| Item | LDA (binary-class form) |
|---|---|
| Generative model | Each class is modeled by a multivariate Gaussian with its own mean but a shared full covariance matrix $\Sigma$: $p(\bold x\mid C=c)=\mathcal N(\bold x;\mu_c,\Sigma),\ c\in\{0,1\}$ |
| Training objective | Maximum-likelihood (ML). With $n_c$ samples in class $c$ and $N$ samples in total: $\hat\mu_c=\frac{1}{n_c}\sum_{i\in c}\bold x_i$, $\hat\Sigma=\frac{1}{N}\sum_c\sum_{i\in c}(\bold x_i-\hat\mu_c)(\bold x_i-\hat\mu_c)^\top$ |
| Discriminant/inference | The log-posterior ratio is linear: $\log\frac{p(C=1\mid\bold x)}{p(C=0\mid\bold x)}=\bold w^\top\bold x+b$, with $\bold w=\Sigma^{-1}(\mu_1-\mu_0)$ and $b=-\frac{1}{2}(\mu_1+\mu_0)^\top\bold w+\log\frac{\pi_1}{\pi_0}$. Decision rule: “predict 1 if $\bold w^\top\bold x>t$”. |
| Fisher view | The same classifier is obtained by the discriminative Fisher criterion: maximise $J(\bold w)=\frac{\bold w^\top S_B\bold w}{\bold w^\top S_W\bold w}$ (between vs. within scatter) giving the same direction $\bold w$. |
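A minimal NumPy sketch of the ML fit and the resulting linear discriminant (function and variable names are ours, chosen for illustration):

```python
import numpy as np

def fit_lda(X, y):
    """ML estimates for binary LDA with a tied (pooled) covariance.

    X: (N, d) data matrix; y: (N,) labels in {0, 1}.
    Returns (w, b) for the linear discriminant s(x) = w @ x + b.
    """
    N = X.shape[0]
    mu = [X[y == c].mean(axis=0) for c in (0, 1)]           # class means
    # Pooled ML covariance: centre each class at its own mean, divide by N
    Sigma = sum((X[y == c] - mu[c]).T @ (X[y == c] - mu[c]) for c in (0, 1)) / N
    pi = [np.mean(y == c) for c in (0, 1)]                  # empirical priors
    w = np.linalg.solve(Sigma, mu[1] - mu[0])               # Sigma^{-1}(mu1 - mu0)
    b = -0.5 * (mu[1] + mu[0]) @ w + np.log(pi[1] / pi[0])
    return w, b
```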
| Item | Tied MVG |
|---|---|
| Model assumptions | Identical to the generative LDA above: two Gaussians that share a full covariance (“tied”). |
| Training objective | The very same ML estimates for $\mu_c$ and the pooled covariance $\Sigma$. |
| Inference | Likelihood-ratio test: $\Lambda(\bold x)=\log\frac{\mathcal N(\bold x;\mu_1,\Sigma)}{\mathcal N(\bold x;\mu_0,\Sigma)}=\bold w^\top\bold x+b$, hence the decision rule is again linear and coincides with that of LDA. |
| Decision function | Same $\bold w, b$ as above; only the threshold moves when application priors/costs change. |
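A quick numerical sanity check of this equivalence (a sketch assuming `scipy` is available; all names are ours): the tied-MVG log-likelihood ratio is exactly the linear score, with $b$ taken without the prior term since $\Lambda$ is a pure likelihood ratio.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu0, mu1 = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)                  # a random SPD tied covariance
x = rng.normal(size=d)

llr = (multivariate_normal.logpdf(x, mean=mu1, cov=Sigma)
       - multivariate_normal.logpdf(x, mean=mu0, cov=Sigma))
w = np.linalg.solve(Sigma, mu1 - mu0)            # Sigma^{-1}(mu1 - mu0)
b = -0.5 * (mu1 + mu0) @ w                       # no log(pi1/pi0) term here
assert np.isclose(llr, w @ x + b)                # identical linear score
```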
For two classes the two procedures are mathematically identical:
$$ s(\bold x)=\bold w^\top\bold x+b,\qquad \bold w=\Sigma^{-1}(\mu_1-\mu_0),\qquad b=-\frac{1}{2}(\mu_1+\mu_0)^\top\bold w+\log\frac{\pi_1}{\pi_0} $$
Predict class 1 if $s(\bold x)>t$ with the Bayes-optimal threshold $t=\log\frac{C_{10}\pi_0}{C_{01}\pi_1}$, where the score is the pure log-likelihood ratio (the $\log\frac{\pi_1}{\pi_0}$ term of $b$ is dropped and the application priors enter through $t$ instead).
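A sketch of the resulting decision rule (the function name, signature, and cost convention $C_{01}$ = cost of a missed class 1, $C_{10}$ = cost of a false class 1 are assumptions for this example):

```python
import numpy as np

def bayes_decision(llr, pi1, C01, C10):
    """Predict class 1 iff the LLR score exceeds the Bayes-optimal threshold."""
    t = np.log((C10 * (1.0 - pi1)) / (C01 * pi1))   # log(C10*pi0 / (C01*pi1))
    return llr > t

# Example: equal costs and priors give t = 0, i.e. "predict 1 if llr > 0"
print(bayes_decision(llr=0.3, pi1=0.5, C01=1.0, C10=1.0))   # True
```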
Objective: for $K$ classes, find a projection $W\in\mathbb R^{d\times(K-1)}$ (at most $K-1$ useful columns, since $S_B$ has rank at most $K-1$) that maximises

$$ J(W)=\frac{\det(W^\top S_BW)}{\det(W^\top S_WW)},\quad\text{or equivalently the trace criterion}\quad \operatorname{tr}\left((W^\top S_WW)^{-1}W^\top S_BW\right), $$

where $S_B$ and $S_W$ are the between-class and within-class scatter matrices. Both criteria are maximised by the top $K-1$ eigenvectors of $S_W^{-1}S_B$.
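A sketch of this eigen-solution (assuming `scipy`; solving the generalized eigenproblem $S_Bv=\lambda S_Wv$ is a standard, numerically safer equivalent of eigendecomposing $S_W^{-1}S_B$):

```python
import numpy as np
import scipy.linalg

def fisher_directions(X, y, K):
    """Top K-1 multiclass-LDA directions (illustrative; names are ours)."""
    d = X.shape[1]
    mu = X.mean(axis=0)                                  # global mean
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in range(K):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        SW += (Xc - mu_c).T @ (Xc - mu_c)                # within-class scatter
        SB += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # between-class scatter
    # eigh solves SB v = lambda SW v; eigenvalues come back in ascending
    # order, so the strongest directions are the last K-1 columns.
    _, V = scipy.linalg.eigh(SB, SW)
    return V[:, ::-1][:, :K - 1]
```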
Limitations: