import numpy

def load_iris():
    import sklearn.datasets
    return sklearn.datasets.load_iris()['data'].T, sklearn.datasets.load_iris()['target']
def split_db_2to1(D, L, seed=0):
    nTrain = int(D.shape[1] * 2.0 / 3.0)
    numpy.random.seed(seed)
    idx = numpy.random.permutation(D.shape[1])  # Random permutation of the indices 0, ..., D.shape[1] - 1
    idxTrain = idx[0:nTrain]
    idxTest = idx[nTrain:]
    DTR = D[:, idxTrain]
    DVAL = D[:, idxTest]
    LTR = L[idxTrain]
    LVAL = L[idxTest]
    return (DTR, LTR), (DVAL, LVAL)
def vcol(x):
    return x.reshape((x.size, 1))

def compute_mu_C(D):
    mu = vcol(D.mean(1))
    C = ((D - mu) @ (D - mu).T) / float(D.shape[1])
    return mu, C
# Compute a dictionary of ML parameters for each class
def Gau_MVG_ML_estimates(D, L):
    labelSet = set(L)
    hParams = {}
    for lab in labelSet:
        DX = D[:, L == lab]
        hParams[lab] = compute_mu_C(DX)
    return hParams
if __name__ == '__main__':
    DIris, LIris = load_iris()
    (DTR, LTR), (DVAL, LVAL) = split_db_2to1(DIris, LIris)

    # Multivariate Gaussian Models
    hParams_MVG = Gau_MVG_ML_estimates(DTR, LTR)  # Compute model parameters
    for lab in [0, 1, 2]:
        print("MVG - Class", lab)
        print(hParams_MVG[lab][0])
        print(hParams_MVG[lab][1])
        print()
Dataset loading (load_iris)
def load_iris():
    import sklearn.datasets
    return sklearn.datasets.load_iris()['data'].T, sklearn.datasets.load_iris()['target']
sklearn.datasets.load_iris()['data'] has shape (150, 4), where each row is a sample and each column is a feature. The code transposes it, so the returned data D has shape (4, 150) (features x samples). load_iris()['target'] is the array of labels, of length 150.
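As a quick sanity check (a sketch, assuming scikit-learn is installed), the shapes can be verified directly:

D, L = load_iris()
print(D.shape)  # (4, 150): features x samples
print(L.shape)  # (150,): one label per sample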
Dataset split (split_db_2to1)
def split_db_2to1(D, L, seed=0):
    nTrain = int(D.shape[1] * 2.0 / 3.0)
    numpy.random.seed(seed)
    idx = numpy.random.permutation(D.shape[1])  # Random permutation of the indices 0, ..., D.shape[1] - 1
    idxTrain = idx[0:nTrain]
    idxTest = idx[nTrain:]
    DTR = D[:, idxTrain]
    DVAL = D[:, idxTest]
    LTR = L[idxTrain]
    LVAL = L[idxTest]
    return (DTR, LTR), (DVAL, LVAL)
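A minimal usage sketch (assuming DIris, LIris come from load_iris() as in the main script below): with 150 Iris samples, nTrain = int(150 * 2 / 3) = 100, so the split yields 100 training and 50 validation samples.

(DTR, LTR), (DVAL, LVAL) = split_db_2to1(DIris, LIris)
print(DTR.shape, LTR.shape)    # (4, 100) (100,)
print(DVAL.shape, LVAL.shape)  # (4, 50) (50,)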
Helper function to reshape a vector to a column (vcol)
def vcol(x):
    return x.reshape((x.size, 1))
If x has shape (N,), vcol(x) reshapes it to (N, 1).
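A minimal illustration:

x = numpy.array([1.0, 2.0, 3.0])
print(x.shape)        # (3,)
print(vcol(x).shape)  # (3, 1)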
Computing mean and covariance given data from one class (compute_mu_C)
def compute_mu_C(D):
    mu = vcol(D.mean(1))
    C = ((D - mu) @ (D - mu).T) / float(D.shape[1])
    return mu, C
D.shape here is (4, N_c) for the class's data: that is, 4 features by $N_c$ samples belonging to the class. D.mean(1) computes the mean across the columns, returning a 1D array of length 4; vcol then reshapes it into a $(4, 1)$ column vector. The subtraction (D - mu) is broadcast, so it is effectively done column by column. ((D - mu) @ (D - mu).T) is the sum-of-outer-products matrix, giving the unnormalized covariance; dividing by the number of samples D.shape[1] yields the empirical (maximum likelihood) covariance matrix. The function returns mu and C.
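As a sketch of a possible cross-check (not in the original code), the result should match NumPy's biased covariance estimator, since dividing by N rather than N - 1 is exactly the ML normalization:

mu, C = compute_mu_C(DTR)
assert numpy.allclose(mu.ravel(), DTR.mean(1))
assert numpy.allclose(C, numpy.cov(DTR, bias=True))  # bias=True divides by N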
Estimate parameters for each class (Gau_MVG_ML_estimates)
def Gau_MVG_ML_estimates(D, L):
    labelSet = set(L)
    hParams = {}
    for lab in labelSet:
        DX = D[:, L == lab]
        hParams[lab] = compute_mu_C(DX)
    return hParams
For each label lab, the function extracts only the columns of D corresponding to that label (DX), computes that class's ML estimates (compute_mu_C(DX)), and stores the resulting (mean, covariance) pair in hParams using lab as the key. Thus hParams[0] = (mu_0, Sigma_0), and so on.
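The per-class selection relies on NumPy boolean indexing along the column axis; a minimal sketch with made-up data:

D = numpy.arange(8.0).reshape(2, 4)  # 2 features, 4 samples
L = numpy.array([0, 1, 0, 1])
print(D[:, L == 0])  # keeps only columns 0 and 2 (the class-0 samples)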
Putting it all together
if __name__ == '__main__':
    DIris, LIris = load_iris()
    (DTR, LTR), (DVAL, LVAL) = split_db_2to1(DIris, LIris)

    # Multivariate Gaussian Models
    hParams_MVG = Gau_MVG_ML_estimates(DTR, LTR)  # Compute model parameters
    for lab in [0, 1, 2]:
        print("MVG - Class", lab)
        print(hParams_MVG[lab][0])
        print(hParams_MVG[lab][1])
        print()
hParams_MVG[lab][0] is $\boldsymbol{\mu}_{lab}$, and hParams_MVG[lab][1] is $\boldsymbol{\Sigma}_{lab}$.
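As a hedged sketch of how these estimates could be used next (SciPy is an assumption here, not part of the original code), the class-conditional log-density of a validation sample can be evaluated directly from the stored (mu, C) pair:

from scipy.stats import multivariate_normal  # assumed available

mu, C = hParams_MVG[0]
# Log-density of the first validation sample under the class-0 Gaussian
print(multivariate_normal(mean=mu.ravel(), cov=C).logpdf(DVAL[:, 0]))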