S675 class notes


10/26/09

Remarks on the previous lecture

  1. Fisher's best linear discriminator constructs a linear combination of the variables that is usually good for discrimination. This linear combination is not necessarily the linear combination found by PCA.
  2. The rule that assigns an unlabelled u to the class label i for which (u - xbar_i)^T S^{-1} (u - xbar_i) is minimal has an obvious extension from the case of 2 classes (i in {1,2}) to the case of g classes (i in {1,2,...,g}). The general rule is called linear discriminant analysis (LDA).
  3. LDA relies on a pooled sample covariance matrix S.
    • Let Sigma = E[S], where E denotes expectation.
    • It may or may not be the case that each class has population covariance matrix Sigma.
    • What if the classes have different covariance matrices, Sigma_1, ..., Sigma_g?
    • One possibility: estimate each Sigma_i by S_i, then assign to u the label i for which (u - xbar_i)^T S_i^{-1} (u - xbar_i) is minimal.
    • This rule is called quadratic discriminant analysis (QDA); both rules are sketched below.
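
A minimal numpy sketch of both assignment rules as stated above (the function name and interface are my own; this implements exactly the minimum-quadratic-form rules from these notes):

  import numpy as np

  def discriminant_label(u, means, covs=None, pooled_cov=None):
      """Assign u the label i minimizing (u - xbar_i)^T S^{-1} (u - xbar_i).

      With pooled_cov given, the pooled S is used for every class (LDA);
      otherwise each class's own covariance covs[i] is used (QDA).
      """
      dists = []
      for i, xbar in enumerate(means):
          S = pooled_cov if pooled_cov is not None else covs[i]
          diff = u - xbar
          dists.append(diff @ np.linalg.solve(S, diff))
      return int(np.argmin(dists))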

Discriminant Coordinates

Again (for motivation) assume that P_i = Normal(mu_i, Sigma) (common covariance, different means).

Let xbar_i and S_i denote the sample mean vector and sample covariance matrix for sample i in {1,2,...,g}.

Let xbar = sum_{i=1}^{g} (n_i/n) xbar_i   <-- sample grand mean vector

W = (n-g)^{-1} sum_{i=1}^{g} (n_i - 1) S_i   <-- pooled within-groups sample covariance matrix

B = (g-1)^{-1} sum_{i=1}^{g} n_i (xbar_i - xbar)(xbar_i - xbar)^T   <-- between-groups sample covariance matrix
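
A small numpy sketch computing xbar, W, and B from the g data matrices (function name mine; assumes each X_i is an n_i x p array):

  import numpy as np

  def within_between(X_list):
      """Return the grand mean xbar, pooled within-groups W, and
      between-groups B for data matrices X_1, ..., X_g."""
      g = len(X_list)
      ns = np.array([X.shape[0] for X in X_list])
      n = ns.sum()
      means = np.array([X.mean(axis=0) for X in X_list])
      xbar = (ns[:, None] * means).sum(axis=0) / n
      W = sum((ni - 1) * np.cov(X, rowvar=False)
              for ni, X in zip(ns, X_list)) / (n - g)
      centered = means - xbar
      B = (ns[:, None] * centered).T @ centered / (g - 1)
      return xbar, W, B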


Given a in R^p, we perform a univariate analysis of variance to test the null hypothesis H_0: a^T mu_1 = a^T mu_2 = ... = a^T mu_g.

In ANOVA, the generalization of the t statistic is the F statistic. The t statistic compares the difference between the observed and hypothesized means with the variance within the sample.

F(a) = (a^T B a)/(a^T W a), the ratio of the variation between groups to the variation within groups, i.e., (between-groups variance of the linear combination)/(within-groups variance of the linear combination).

Large values of F(a) are evidence against the null hypothesis, i.e., evidence that at least some of the groups are different.
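
As a one-function sketch (naming mine), F(a) is just this ratio of quadratic forms, computable from the W and B above:

  def F(a, W, B):
      """Between-groups variance of a^T x over its within-groups variance."""
      return (a @ B @ a) / (a @ W @ a)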

Let a_1 maximize F(a).

Among a ⊥ a_1, let a_2 maximize F(a). Among a ⊥ a_1, a_2, let a_3 maximize F(a).

Note that this is similar to the interpretation of the eigenvectors of the covariance matrix in PCA, except that instead of the covariance matrix, we take the eigenvectors of the matrix W^{-1}B that comes from the ratio in F(a).

Theorem: Let lambda_1 >= ... >= lambda_r > 0 denote the strictly positive eigenvalues of W^{-1}B and let q_1, ..., q_r denote the corresponding eigenvectors.

Then a_i = q_i are the discriminant coordinates for the data matrices X_1, ..., X_g.

Remarks:

    • r <= min(g-1, p) [usually g-1], so the number of discriminant coordinates is limited. In practice, however, we usually use 2-3 coordinates.
    • Even when the assumption of equal covariances P_i = Normal(mu_i, Sigma) is flawed, discriminant coordinates often provide an effective representation of the data. LDA is optimal under equal covariances, but still works otherwise.
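
A numpy sketch of the theorem (function name mine): take the eigenvectors of W^{-1}B with strictly positive eigenvalues, in decreasing order of eigenvalue.

  import numpy as np

  def discriminant_coordinates(W, B, tol=1e-10):
      """Columns are a_1, ..., a_r: eigenvectors of W^{-1}B whose
      eigenvalues are strictly positive, sorted decreasingly."""
      vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
      vals, vecs = vals.real, vecs.real  # eigenvalues are real for W pos. def., B pos. semi-def.
      order = [i for i in np.argsort(vals)[::-1] if vals[i] > tol]
      return vals[order], vecs[:, order]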


10/28/09

LDA vs QDA

Assume two groups, normally distributed with different means and variances (Seber 1984, Multivariate Observations).

If the two covariance matrices are near each other and p <= 6, then LDA is about as good as QDA.

If n_1, n_2 < 25 and the covariance differences and/or p are large, then LDA outperforms QDA.

If n_1 and n_2 are large, then QDA will outperform LDA, even for large covariance differences and p > 6.
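
A toy simulation in this spirit (a sketch only: the data-generating choices are my own, using scikit-learn's LDA and QDA implementations):

  import numpy as np
  from sklearn.discriminant_analysis import (
      LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

  rng = np.random.default_rng(0)
  p, n1, n2 = 6, 200, 200  # large samples; different covariances below
  X1 = rng.multivariate_normal(np.zeros(p), np.eye(p), n1)
  X2 = rng.multivariate_normal(0.5 * np.ones(p), 4.0 * np.eye(p), n2)
  X, y = np.vstack([X1, X2]), np.array([0] * n1 + [1] * n2)

  for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
      # training accuracy of each rule on the simulated data
      print(type(model).__name__, model.fit(X, y).score(X, y))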

(rare) rules of thumb for sample size (required n_1 = n_2 for a given p):

  p | n_1 = n_2
  ? | 25
  ? | 30 75 100