Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)
Compare PCA and LDA with respect to:
| | PCA | LDA (Fisher’s Linear Discriminant) |
|---|---|---|
| Learning paradigm | Unsupervised | Supervised |
| Main goal | Find an $m$-dimensional linear subspace that preserves as much variance of the data as possible (information compression / noise removal) | Find at most $k-1$ directions (for $k$ classes, with $k-1\le n$) that maximise class separability by spreading the class means apart while keeping each class compact. |
| Data model | Only the centered data matrix $X=[x_1, ..., x_K]$. | Same data plus a class label $c_i\in\{1, ..., k\}$ for each sample |
| Projection | $y=P^\top(x-\bar x),\\ P\in \mathbb{R}^{n\times m},\ P^\top P=I$ | $y=W^\top x,\\ W=[w_1, ..., w_{k-1}]\in\mathbb{R}^{n\times (k-1)}$ |
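A minimal NumPy sketch of the two projection maps in the table, using random placeholder matrices (not fitted ones) purely to show the shapes involved; all names and sizes here are illustrative assumptions:

```python
import numpy as np

# Illustrative shapes only: n = input dim, m = PCA subspace dim,
# k = number of classes, N = number of samples.
rng = np.random.default_rng(0)
n, m, k, N = 10, 3, 4, 200

X = rng.normal(size=(N, n))                      # rows are samples x_i in R^n
x_bar = X.mean(axis=0)

P, _ = np.linalg.qr(rng.normal(size=(n, m)))     # orthonormal columns: P^T P = I
W = rng.normal(size=(n, k - 1))                  # k-1 discriminant directions

Y_pca = (X - x_bar) @ P                          # y = P^T (x - x_bar), shape (N, m)
Y_lda = X @ W                                    # y = W^T x,           shape (N, k-1)
print(Y_pca.shape, Y_lda.shape)                  # (200, 3) (200, 3)
```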
PCA minimises the average squared reconstruction error (the data are assumed centered, so $\bar x=0$):
$$ \underset{P^\top P=I}{\min}\frac{1}{K}\sum^K_{i=1}||x_i-PP^\top x_i||^2 $$
which is equivalent to maximising the retained variance:
$$ \underset{P^\top P=I}{\max}\text{Tr}\left(P^\top CP\right),\ C=\frac{1}{K}\sum_i(x_i-\bar x)(x_i-\bar x)^\top $$
The maximiser is obtained by taking as the columns of $P$ the $m$ eigenvectors of $C$ with the largest eigenvalues.
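A minimal sketch of this eigen-decomposition route, assuming NumPy; the helper names `pca_fit` and `pca_project` are illustrative, not from the original:

```python
import numpy as np

def pca_fit(X, m):
    """Return the data mean and the top-m principal directions (X: rows = samples)."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    C = Xc.T @ Xc / X.shape[0]            # C = (1/K) sum_i (x_i - x_bar)(x_i - x_bar)^T
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending eigenvalues, orthonormal eigenvectors
    P = eigvecs[:, ::-1][:, :m]           # keep the m largest -> columns satisfy P^T P = I
    return x_bar, P

def pca_project(X, x_bar, P):
    """y = P^T (x - x_bar) for every row of X."""
    return (X - x_bar) @ P
```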
LDA maximises the (generalised) Rayleigh quotient:
$$ \underset{w\ne 0}{\max}J(w)=\frac{w^\top S_Bw}{w^\top S_Ww},\ S_B=\sum_cn_c(\mu_c-\mu)(\mu_c-\mu)^\top,\\ S_W=\sum_c\sum_{i\in c}(x_i-\mu_c)(x_i-\mu_c)^\top $$
The maximisers solve the generalised eigenproblem $S_Bw=\lambda S_Ww$, i.e. they are eigenvectors of $S_W^{-1}S_B$; the columns of $W$ are the $k-1$ eigenvectors with the largest eigenvalues (since $\operatorname{rank}(S_B)\le k-1$, at most $k-1$ of them are non-trivial).
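A minimal sketch of this construction, assuming NumPy and SciPy are available; `lda_fit` and the small ridge `reg` (added so that $S_W$ stays invertible) are illustrative choices, not part of the original:

```python
import numpy as np
from scipy.linalg import eigh

def lda_fit(X, y, reg=1e-6):
    """Return W whose columns are the k-1 Fisher discriminant directions."""
    classes = np.unique(y)
    n = X.shape[1]
    mu = X.mean(axis=0)
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        d = (mu_c - mu)[:, None]
        S_B += Xc.shape[0] * d @ d.T               # between-class scatter
        S_W += (Xc - mu_c).T @ (Xc - mu_c)         # within-class scatter
    S_W += reg * np.eye(n)                         # keep S_W positive definite
    # Generalised symmetric eigenproblem S_B w = lambda S_W w (ascending eigenvalues).
    eigvals, eigvecs = eigh(S_B, S_W)
    W = eigvecs[:, ::-1][:, :len(classes) - 1]     # top k-1 directions
    return W

# Projection: Y = X @ W  gives y = W^T x for every sample.
```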