| Item | LDA (binary-class form) |
|---|---|
| Generative model | Each class is modeled by a multivariate Gaussian with its own mean but a shared full covariance matrix $\Sigma$: $p(\bold x\mid C=c)=\mathcal N(\bold x;\mu_c,\Sigma),\ c\in\{0,1\}$ |
| Training objective | Maximum-likelihood (ML). With $n_c$ samples in class $c$ and $N$ samples in total: $\hat\mu_c=\frac{1}{n_c}\sum_{i\in c}\bold x_i$, $\hat\Sigma=\frac{1}{N}\sum_c\sum_{i\in c}(\bold x_i-\hat\mu_c)(\bold x_i-\hat\mu_c)^\top$ |
| Discriminant/inference | The log-posterior ratio is linear: $\log\frac{p(C=1\mid\bold x)}{p(C=0\mid\bold x)}=\bold w^\top\bold x+b$, with $\bold w=\Sigma^{-1}(\mu_1-\mu_0)$ and $b=-\frac{1}{2}(\mu_1+\mu_0)^\top\bold w+\log\frac{\pi_1}{\pi_0}$. Decision rule: “predict 1 if $\bold w^\top\bold x>t$”. |
| Fisher view | The same classifier is obtained by the discriminative Fisher criterion: maximise $J(\bold w)=\frac{\bold w^\top S_B\bold w}{\bold w^\top S_W\bold w}$ (between vs. within scatter) giving the same direction $\bold w$. |
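A minimal NumPy sketch of the ML fit and the resulting linear discriminant (function and variable names are ours, chosen for illustration):

```python
import numpy as np

def fit_lda(X, y):
    """ML estimates for binary LDA with a tied (pooled) covariance.

    X: (N, d) data matrix; y: (N,) labels in {0, 1}.
    Returns (w, b) for the linear discriminant s(x) = w @ x + b.
    """
    N = X.shape[0]
    mu = [X[y == c].mean(axis=0) for c in (0, 1)]           # class means
    # Pooled ML covariance: centre each class at its own mean, divide by N
    Sigma = sum((X[y == c] - mu[c]).T @ (X[y == c] - mu[c]) for c in (0, 1)) / N
    pi = [np.mean(y == c) for c in (0, 1)]                  # empirical priors
    w = np.linalg.solve(Sigma, mu[1] - mu[0])               # Sigma^{-1}(mu1 - mu0)
    b = -0.5 * (mu[1] + mu[0]) @ w + np.log(pi[1] / pi[0])
    return w, b
```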
| Item | Tied MVG |
|---|---|
| Model assumptions | Identical to the generative LDA above: two Gaussians that share a full covariance (“tied”). |
| Training objective | The very same ML estimates for $\mu_c$ and the pooled covariance $\Sigma$. |
| Inference | Likelihood-ratio test: $\Lambda(\bold x)=\log\frac{\mathcal N(\bold x;\mu_1,\Sigma)}{\mathcal N(\bold x;\mu_0,\Sigma)}=\bold w^\top\bold x+b$, hence the decision rule is again linear and coincides with that of LDA. |
| Decision function | Same $\bold w, b$ as above; only the threshold moves when application priors/costs change. |
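A quick numerical sanity check of this equivalence (a sketch assuming `scipy` is available; all names are ours): the tied-MVG log-likelihood ratio is exactly the linear score, with $b$ taken without the prior term since $\Lambda$ is a pure likelihood ratio.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu0, mu1 = rng.normal(size=d), rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)                  # a random SPD tied covariance
x = rng.normal(size=d)

llr = (multivariate_normal.logpdf(x, mean=mu1, cov=Sigma)
       - multivariate_normal.logpdf(x, mean=mu0, cov=Sigma))
w = np.linalg.solve(Sigma, mu1 - mu0)            # Sigma^{-1}(mu1 - mu0)
b = -0.5 * (mu1 + mu0) @ w                       # no log(pi1/pi0) term here
assert np.isclose(llr, w @ x + b)                # identical linear score
```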
For two classes the two procedures are mathematically identical:
$$ s(\bold x)=\bold w^\top\bold x+b,\qquad \bold w=\Sigma^{-1}(\mu_1-\mu_0),\qquad b=-\frac{1}{2}(\mu_1+\mu_0)^\top\bold w+\log\frac{\pi_1}{\pi_0} $$
Predict class 1 if $s(\bold x)>t$ with the Bayes-optimal threshold $t=\log\frac{C_{10}\pi_0}{C_{01}\pi_1}$, where the score is the pure log-likelihood ratio (the $\log\frac{\pi_1}{\pi_0}$ term of $b$ is dropped and the application priors enter through $t$ instead).
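A sketch of the resulting decision rule (the function name, signature, and cost convention $C_{01}$ = cost of a missed class 1, $C_{10}$ = cost of a false class 1 are assumptions for this example):

```python
import numpy as np

def bayes_decision(llr, pi1, C01, C10):
    """Predict class 1 iff the LLR score exceeds the Bayes-optimal threshold."""
    t = np.log((C10 * (1.0 - pi1)) / (C01 * pi1))   # log(C10*pi0 / (C01*pi1))
    return llr > t

# Example: equal costs and priors give t = 0, i.e. "predict 1 if llr > 0"
print(bayes_decision(llr=0.3, pi1=0.5, C01=1.0, C10=1.0))   # True
```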
Objective: for $K$ classes, find a projection $W\in\mathbb R^{d\times(K-1)}$ (at most $K-1$ useful columns, since $S_B$ has rank at most $K-1$) that maximises

$$ J(W)=\frac{\det(W^\top S_BW)}{\det(W^\top S_WW)},\quad\text{or equivalently the trace criterion}\quad \operatorname{tr}\left((W^\top S_WW)^{-1}W^\top S_BW\right), $$

where $S_B$ and $S_W$ are the between-class and within-class scatter matrices. Both criteria are maximised by the top $K-1$ eigenvectors of $S_W^{-1}S_B$.
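A sketch of this eigen-solution (assuming `scipy`; solving the generalized eigenproblem $S_Bv=\lambda S_Wv$ is a standard, numerically safer equivalent of eigendecomposing $S_W^{-1}S_B$):

```python
import numpy as np
import scipy.linalg

def fisher_directions(X, y, K):
    """Top K-1 multiclass-LDA directions (illustrative; names are ours)."""
    d = X.shape[1]
    mu = X.mean(axis=0)                                  # global mean
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in range(K):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        SW += (Xc - mu_c).T @ (Xc - mu_c)                # within-class scatter
        SB += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # between-class scatter
    # eigh solves SB v = lambda SW v; eigenvalues come back in ascending
    # order, so the strongest directions are the last K-1 columns.
    _, V = scipy.linalg.eigh(SB, SW)
    return V[:, ::-1][:, :K - 1]
```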
Limitations: