Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)

Problem restatement

Compare PCA and LDA with respect to:

  1. Their goals and mathematical formulation.
  2. The optimisation criterion used in training.
  3. The properties of the resulting projection directions.
  4. Their role inside a classifier.

1. Goals and basic formulation

|  | PCA | LDA (Fisher's Linear Discriminant) |
| --- | --- | --- |
| Learning paradigm | Unsupervised | Supervised |
| Main goal | Find an $m$-dimensional linear subspace that preserves as much of the data variance as possible (information compression / noise removal). | Find at most $k-1$ directions ($k$ = number of classes) that maximise class separability by spreading the class means apart while keeping each class compact. |
| Data model | Only the centred data matrix $X=[x_1,\dots,x_K]$. | The same data plus class labels $c_i\in\{1,\dots,k\}$. |
| Projection | $y=P^\top(x-\bar x)$, with $P\in\mathbb{R}^{n\times m}$, $P^\top P=I$ | $y=W^\top x$, with $W=[w_1,\dots,w_{k-1}]\in\mathbb{R}^{n\times(k-1)}$ |

2. Training objective

PCA

Minimise the average squared reconstruction error (the $x_i$ here are taken to be centred):

$$ \underset{P^\top P=I}{\min}\ \frac{1}{K}\sum_{i=1}^{K}\|x_i-PP^\top x_i\|^2 $$

which is equivalent to maximising the retained variance:

$$ \underset{P^\top P=I}{\max}\ \text{Tr}\left(P^\top CP\right),\qquad C=\frac{1}{K}\sum_i(x_i-\bar x)(x_i-\bar x)^\top $$
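
The equivalence can be checked with a short identity (sketched here for centred data and an orthonormal $P$, i.e. $P^\top P=I$):

$$ \|x_i-PP^\top x_i\|^2=\|x_i\|^2-\|P^\top x_i\|^2,\qquad \frac{1}{K}\sum_{i=1}^{K}\|P^\top x_i\|^2=\text{Tr}\left(P^\top CP\right) $$

Since $\sum_i\|x_i\|^2$ does not depend on $P$, minimising the reconstruction error is the same as maximising the retained variance.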

The maximiser is obtained by taking the $m$ eigenvectors of $C$ with the largest eigenvalues.
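
As a concrete illustration, here is a minimal NumPy sketch of this recipe (names such as `pca_fit` are illustrative, not taken from the text; samples are assumed to be the rows of `X`):

```python
import numpy as np

def pca_fit(X, m):
    """Illustrative PCA: m eigenvectors of the covariance with largest eigenvalues."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                        # centre the data
    C = Xc.T @ Xc / X.shape[0]            # C = (1/K) Σ (x_i - x̄)(x_i - x̄)^T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: symmetric C, ascending eigenvalues
    P = eigvecs[:, ::-1][:, :m]           # keep the m leading eigenvectors
    return x_bar, P

# Projection onto the subspace, y = P^T (x - x̄), applied row-wise:
# x_bar, P = pca_fit(X, m=2); Y = (X - x_bar) @ P
```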

LDA

Maximise the (generalised) Rayleigh quotient:

$$ \underset{w\ne 0}{\max}\ J(w)=\frac{w^\top S_Bw}{w^\top S_Ww},\qquad S_B=\sum_c n_c(\mu_c-\mu)(\mu_c-\mu)^\top,\\ S_W=\sum_c\sum_{i\in c}(x_i-\mu_c)(x_i-\mu_c)^\top $$

where $n_c$ is the number of samples in class $c$, $\mu_c$ is the mean of class $c$, and $\mu$ is the global mean.
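
A minimal NumPy/SciPy sketch of these definitions (again, names such as `lda_fit` are illustrative assumptions; labels are assumed to be integers $0,\dots,k-1$): it builds the two scatter matrices and then extracts the directions from the generalised eigenproblem discussed next.

```python
import numpy as np
from scipy.linalg import eigh

def lda_fit(X, labels, k):
    """Illustrative LDA: within/between-class scatter + generalised eigenproblem."""
    n = X.shape[1]
    mu = X.mean(axis=0)                     # global mean
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for c in range(k):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)              # class mean
        S_W += (Xc - mu_c).T @ (Xc - mu_c)  # within-class scatter
        d = (mu_c - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * (d @ d.T)      # between-class scatter, weighted by n_c
    # Solve S_B w = λ S_W w (assumes S_W is positive definite); eigh returns
    # eigenvalues in ascending order, so keep the last k-1 columns.
    eigvals, eigvecs = eigh(S_B, S_W)
    W = eigvecs[:, ::-1][:, :k - 1]
    return W

# Projection: y = W^T x, applied row-wise as Y = X @ W
```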

Solving the eigenproblem: