Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA)

Problem restatement

Compare PCA and LDA with respect to:

  1. Their goals and mathematical formulation.
  2. The optimisation criterion used in training.
  3. The properties of the resulting projection directions.
  4. Their role inside a classifier.

1. Goals and basic formulation

|  | PCA | LDA (Fisher's Linear Discriminant) |
| --- | --- | --- |
| Learning paradigm | Unsupervised | Supervised |
| Main goal | Find an $m$-dimensional linear subspace that preserves as much of the data variance as possible (information compression / noise removal). | Find at most $k-1$ directions ($k$ = number of classes) that maximise class separability by spreading the class means apart while keeping each class compact. |
| Data model | Only the centred data matrix $X=[x_1,\dots,x_K]$. | The same data plus class labels $c_i\in\{1,\dots,k\}$. |
| Projection | $y=P^\top(x-\bar x)$, with $P\in\mathbb{R}^{n\times m}$, $P^\top P=I$ | $y=W^\top x$, with $W=[w_1,\dots,w_{k-1}]\in\mathbb{R}^{n\times(k-1)}$ |

2. Training objective

PCA

Minimise the average squared reconstruction error (the $x_i$ here are taken to be centred):

$$ \underset{P^\top P=I}{\min}\ \frac{1}{K}\sum_{i=1}^{K}\|x_i-PP^\top x_i\|^2 $$

which is equivalent to maximising the retained variance:

$$ \underset{P^\top P=I}{\max}\ \text{Tr}\left(P^\top CP\right),\qquad C=\frac{1}{K}\sum_i(x_i-\bar x)(x_i-\bar x)^\top $$
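
The equivalence can be checked with a short identity (sketched here for centred data and an orthonormal $P$, i.e. $P^\top P=I$):

$$ \|x_i-PP^\top x_i\|^2=\|x_i\|^2-\|P^\top x_i\|^2,\qquad \frac{1}{K}\sum_{i=1}^{K}\|P^\top x_i\|^2=\text{Tr}\left(P^\top CP\right) $$

Since $\sum_i\|x_i\|^2$ does not depend on $P$, minimising the reconstruction error is the same as maximising the retained variance.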

The maximiser is obtained by taking the $m$ eigenvectors of $C$ with the largest eigenvalues.
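
As a concrete illustration, here is a minimal NumPy sketch of this recipe (names such as `pca_fit` are illustrative, not taken from the text; samples are assumed to be the rows of `X`):

```python
import numpy as np

def pca_fit(X, m):
    """Illustrative PCA: m eigenvectors of the covariance with largest eigenvalues."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                        # centre the data
    C = Xc.T @ Xc / X.shape[0]            # C = (1/K) Σ (x_i - x̄)(x_i - x̄)^T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: symmetric C, ascending eigenvalues
    P = eigvecs[:, ::-1][:, :m]           # keep the m leading eigenvectors
    return x_bar, P

# Projection onto the subspace, y = P^T (x - x̄), applied row-wise:
# x_bar, P = pca_fit(X, m=2); Y = (X - x_bar) @ P
```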

LDA

Maximise the (generalised) Rayleigh quotient:

$$ \underset{w\ne 0}{\max}\ J(w)=\frac{w^\top S_Bw}{w^\top S_Ww},\qquad S_B=\sum_c n_c(\mu_c-\mu)(\mu_c-\mu)^\top,\\ S_W=\sum_c\sum_{i\in c}(x_i-\mu_c)(x_i-\mu_c)^\top $$

where $n_c$ is the number of samples in class $c$, $\mu_c$ is the mean of class $c$, and $\mu$ is the global mean.
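
A minimal NumPy/SciPy sketch of these definitions (again, names such as `lda_fit` are illustrative assumptions; labels are assumed to be integers $0,\dots,k-1$): it builds the two scatter matrices and then extracts the directions from the generalised eigenproblem discussed next.

```python
import numpy as np
from scipy.linalg import eigh

def lda_fit(X, labels, k):
    """Illustrative LDA: within/between-class scatter + generalised eigenproblem."""
    n = X.shape[1]
    mu = X.mean(axis=0)                     # global mean
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for c in range(k):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)              # class mean
        S_W += (Xc - mu_c).T @ (Xc - mu_c)  # within-class scatter
        d = (mu_c - mu).reshape(-1, 1)
        S_B += Xc.shape[0] * (d @ d.T)      # between-class scatter, weighted by n_c
    # Solve S_B w = λ S_W w (assumes S_W is positive definite); eigh returns
    # eigenvalues in ascending order, so keep the last k-1 columns.
    eigvals, eigvecs = eigh(S_B, S_W)
    W = eigvecs[:, ::-1][:, :k - 1]
    return W

# Projection: y = W^T x, applied row-wise as Y = X @ W
```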

Solving the eigenproblem: