Preliminary Definitions

Dataset and Class Labels

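Suppose you have:

  1. $N$ samples $x_1, \dots, x_N$, each a feature vector $x_i \in \mathbb{R}^d$;
  2. $K$ classes, where class $c$ contains $n_c$ samples, so that $\sum_{c=1}^K n_c = N$;
  3. the notation $x_{c,i}$ for the $i$-th sample of class $c$ ($i = 1, \dots, n_c$).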

For Iris, $N=150$ samples, $d=4$ features (sepal length, sepal width, petal length, petal width), and $K=3$ classes (Setosa, Versicolor, Virginica), each with $n_c=50$ samples.


Within-Class and Between-Class Covariance

Recall the two central matrices of Linear Discriminant Analysis (LDA):

  1. Between-Class Covariance $S_B$.
  2. Within-Class Covariance $S_W$.

They are defined (in normalized form) as follows:

$$ S_B=\frac{1}{N}\sum_{c=1}^Kn_c(\mu_c-\mu)(\mu_c-\mu)^T $$

$$ S_W=\frac{1}{N}\sum_{c=1}^K\sum_{i=1}^{n_c}(x_{c,i}-\mu_c)(x_{c,i}-\mu_c)^T $$
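The two definitions can be sketched directly in code. This is a minimal, dependency-free illustration, assuming the samples are given as a list of equal-length feature vectors `X` with a parallel list of class labels `y`; the helper names (`scatter_matrices`, `total_covariance`, etc.) and the toy data are hypothetical, chosen only for illustration. As a sanity check it uses the standard identity that, under the same $1/N$ normalization, the total covariance $S_T=\frac{1}{N}\sum_{c=1}^K\sum_{i=1}^{n_c}(x_{c,i}-\mu)(x_{c,i}-\mu)^T$ equals $S_B+S_W$.

```python
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]


def outer(u, v):
    """Outer product u v^T, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def add_scaled(A, B, s):
    """In-place update A += s * B for square matrices stored as lists of rows."""
    for r in range(len(A)):
        for c in range(len(A[r])):
            A[r][c] += s * B[r][c]


def scatter_matrices(X, y):
    """Return (S_B, S_W), both normalized by 1/N, for samples X with labels y."""
    N, d = len(X), len(X[0])
    groups = defaultdict(list)  # class label -> its samples x_{c,i}
    for x, label in zip(X, y):
        groups[label].append(x)
    mu = mean(X)  # overall mean
    S_B = [[0.0] * d for _ in range(d)]
    S_W = [[0.0] * d for _ in range(d)]
    for samples in groups.values():
        n_c = len(samples)
        mu_c = mean(samples)  # class mean
        diff = [a - b for a, b in zip(mu_c, mu)]
        add_scaled(S_B, outer(diff, diff), n_c / N)  # n_c (mu_c - mu)(mu_c - mu)^T / N
        for x in samples:
            dev = [a - b for a, b in zip(x, mu_c)]
            add_scaled(S_W, outer(dev, dev), 1.0 / N)  # (x - mu_c)(x - mu_c)^T / N
    return S_B, S_W


def total_covariance(X):
    """Total covariance S_T, normalized by 1/N, ignoring class labels."""
    N, d = len(X), len(X[0])
    mu = mean(X)
    S_T = [[0.0] * d for _ in range(d)]
    for x in X:
        dev = [a - b for a, b in zip(x, mu)]
        add_scaled(S_T, outer(dev, dev), 1.0 / N)
    return S_T


# Hypothetical toy data: d = 2 features, K = 2 classes with 3 samples each.
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 8.0], [8.0, 8.0]]
y = [0, 0, 0, 1, 1, 1]

S_B, S_W = scatter_matrices(X, y)
S_T = total_covariance(X)
print("S_B =", S_B)
print("S_W =", S_W)
print("S_T =", S_T)  # should match S_B + S_W entry by entry
```

Plain Python lists keep the sketch self-contained; in practice the same computation is usually written with NumPy arrays, where each sum of outer products becomes a single matrix product over the centered data.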

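where:

  1. $x_{c,i}$ is the $i$-th sample of class $c$, $n_c$ is the number of samples in class $c$, and $N=\sum_{c=1}^K n_c$;
  2. $\mu_c=\frac{1}{n_c}\sum_{i=1}^{n_c}x_{c,i}$ is the mean of class $c$;
  3. $\mu=\frac{1}{N}\sum_{c=1}^K\sum_{i=1}^{n_c}x_{c,i}=\frac{1}{N}\sum_{c=1}^K n_c\mu_c$ is the overall mean.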

Computing Class Means