S675 class notes


10/26/09

Remarks on the previous lecture

  1. Fisher's best linear discriminator constructs a linear combination of the variables that is usually good for discrimination. This linear combination is not necessarily the linear combination found by PCA.
  2. The rule that assigns an unlabelled $u$ to the class label $i$ for which $(u - \bar{x}_i)^T S^{-1} (u - \bar{x}_i)$ is minimal has an obvious extension from the case of 2 classes ($i \in \{1,2\}$) to the case of $g$ classes ($i \in \{1,2,\ldots,g\}$). The general rule is called linear discriminant analysis (LDA).
  3. LDA relies on a pooled sample covariance matrix $S$.
    • Let $\Sigma = E[S]$, where $E$ denotes expectation.
    • It may or may not be the case that each class has population covariance matrix $\Sigma$.
    • What if the classes have different covariance matrices, $\Sigma_1, \ldots, \Sigma_g$?
    • One possibility: estimate each $\Sigma_i$ by $S_i$, then assign to $u$ the label $i$ for which $(u - \bar{x}_i)^T S_i^{-1} (u - \bar{x}_i)$ is minimal.
    • This rule is called quadratic discriminant analysis (QDA); see the sketch just after this list.
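
A minimal numpy sketch of the two assignment rules above. The class means, covariances, sample sizes, and the point $u$ are invented for illustration; the QDA version here is exactly the Mahalanobis-minimizing rule from the notes (a full Gaussian QDA would also include a log-determinant term).

```python
import numpy as np

# Sample means, sample covariances, and class sizes for g = 2 classes in R^2
# (all values invented for this sketch).
xbar = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
S    = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
n    = [50, 50]

# Pooled sample covariance matrix used by LDA.
S_pooled = sum((n[i] - 1) * S[i] for i in range(2)) / (sum(n) - 2)

def mahalanobis2(u, mean, cov):
    """Squared Mahalanobis distance (u - mean)^T cov^{-1} (u - mean)."""
    d = u - mean
    return d @ np.linalg.solve(cov, d)

u = np.array([1.0, 2.0])  # an unlabelled observation

# LDA: every class shares the pooled covariance matrix.
lda_label = min(range(2), key=lambda i: mahalanobis2(u, xbar[i], S_pooled))
# QDA: each class keeps its own covariance estimate S_i.
qda_label = min(range(2), key=lambda i: mahalanobis2(u, xbar[i], S[i]))
print("LDA assigns class", lda_label, "| QDA assigns class", qda_label)
```

The only difference between the two rules is which covariance matrix appears inside the quadratic form.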

Discriminant Coordinates

Again (for motivation) assume that $P_i = \mathrm{Normal}(\mu_i, \Sigma)$ (common covariance, different means).

Let $\bar{x}_i$ and $S_i$ denote the sample mean vector and sample covariance matrix for sample $i \in \{1,2,\ldots,g\}$.

Let $\bar{x} = \sum_{i=1}^{g} (n_i/n)\,\bar{x}_i$, the sample grand mean vector,

$W = (n-g)^{-1} \sum_{i=1}^{g} (n_i - 1) S_i$, the pooled within-groups sample covariance matrix,

$B = (g-1)^{-1} \sum_{i=1}^{g} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T$, the between-groups sample covariance matrix.
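
As a concreteness check, here is a numpy sketch of the three definitions above, computed from simulated data matrices $X_1, X_2, X_3$ (the data, dimensions, and group means are arbitrary).

```python
import numpy as np

# Simulated data matrices X_1, X_2, X_3 (30 observations each in R^2).
rng = np.random.default_rng(0)
X = [rng.normal(loc=m, size=(30, 2)) for m in ([0, 0], [2, 0], [0, 2])]

g = len(X)
n_i = np.array([Xi.shape[0] for Xi in X])
n = n_i.sum()
xbar_i = [Xi.mean(axis=0) for Xi in X]
S_i = [np.cov(Xi, rowvar=False) for Xi in X]

# Grand mean: xbar = sum_i (n_i / n) xbar_i
xbar = sum((n_i[i] / n) * xbar_i[i] for i in range(g))

# W = (n - g)^{-1} sum_i (n_i - 1) S_i   (pooled within-groups)
W = sum((n_i[i] - 1) * S_i[i] for i in range(g)) / (n - g)

# B = (g - 1)^{-1} sum_i n_i (xbar_i - xbar)(xbar_i - xbar)^T   (between-groups)
B = sum(n_i[i] * np.outer(xbar_i[i] - xbar, xbar_i[i] - xbar)
        for i in range(g)) / (g - 1)
```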


Given $a \in \mathbb{R}^p$, we perform a univariate analysis of variance to test the null hypothesis $H_0 : a^T\mu_1 = a^T\mu_2 = \cdots = a^T\mu_g$.

In ANOVA, the F statistic generalizes the t statistic: the t statistic compares the difference between the observed and hypothesized means to the variance within the sample.

$F(a) = \dfrac{a^T B a}{a^T W a}$, the ratio of the variation between groups to the variation within groups, i.e., the between-groups variance of the linear combination divided by its within-groups variance.
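
A tiny sketch of evaluating $F(a)$ for candidate directions $a$; the matrices $W$ and $B$ below are made-up values standing in for the pooled within-groups and between-groups matrices defined earlier.

```python
import numpy as np

# Stand-in values for W (within-groups) and B (between-groups).
W = np.array([[1.0, 0.2], [0.2, 1.5]])
B = np.array([[3.0, 1.0], [1.0, 0.5]])

def F(a):
    """F(a) = (a^T B a) / (a^T W a): between- over within-groups
    variance of the linear combination a^T x."""
    return (a @ B @ a) / (a @ W @ a)

print(F(np.array([1.0, 0.0])))  # ratio along the first coordinate axis
print(F(np.array([1.0, 1.0])))  # ratio along the diagonal direction
```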

Large values of $F(a)$ are evidence against the null hypothesis, i.e., evidence that at least some of the groups are different.

Let $a_1$ maximize $F(a)$.

Among $a \perp a_1$, let $a_2$ maximize $F(a)$. Among $a \perp a_1, a_2$, let $a_3$ maximize $F(a)$, and so on.

Note that this is similar to the interpretation of the eigenvectors of the covariance matrix in PCA, except that instead of the covariance matrix we take the eigenvectors of the matrix $W^{-1}B$ arising from the ratio in $F(a)$.

Theorem: Let $\lambda_1 \ge \cdots \ge \lambda_r > 0$ denote the strictly positive eigenvalues of $W^{-1}B$ and let $q_1, \ldots, q_r$ denote the corresponding eigenvectors.

Then $a_i = q_i$ are the discriminant coordinates for the data matrices $X_1, \ldots, X_g$.
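
A numpy sketch of the theorem: simulate $g = 3$ classes in $\mathbb{R}^4$, form $W$ and $B$ as above, and take the eigenvectors of $W^{-1}B$ with strictly positive eigenvalues; we expect $r \le \min(g-1, p) = 2$ of them. All data here are simulated for illustration.

```python
import numpy as np

# Simulate g = 3 classes of 40 observations each in R^4; only the first
# two coordinates of the means differ (values arbitrary).
rng = np.random.default_rng(1)
means = [np.zeros(4), np.array([3.0, 0, 0, 0]), np.array([0, 3.0, 0, 0])]
X = [rng.normal(loc=m, size=(40, 4)) for m in means]

g = len(X)
n_i = np.array([Xi.shape[0] for Xi in X])
n = n_i.sum()
xbar_i = [Xi.mean(axis=0) for Xi in X]
xbar = sum((n_i[i] / n) * xbar_i[i] for i in range(g))
W = sum((n_i[i] - 1) * np.cov(X[i], rowvar=False) for i in range(g)) / (n - g)
B = sum(n_i[i] * np.outer(xbar_i[i] - xbar, xbar_i[i] - xbar)
        for i in range(g)) / (g - 1)

# Eigendecomposition of W^{-1} B. Its eigenvalues are real and nonnegative
# in exact arithmetic (it is similar to a symmetric PSD matrix), so we keep
# the real parts and sort in decreasing order.
vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
order = np.argsort(vals.real)[::-1]
lam, Q = vals.real[order], vecs.real[:, order]

r = int(np.sum(lam > 1e-10))  # strictly positive eigenvalues
A = Q[:, :r]                  # discriminant coordinates a_1, ..., a_r
Z = [Xi @ A for Xi in X]      # each class projected onto those coordinates
print("r =", r, "eigenvalues:", lam[:r])
```

Here $r$ comes out to $\min(g-1, p) = 2$, and plotting the two columns of each projected $Z_i$ gives the usual 2-coordinate display mentioned in the remark below.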

Remarks: $r \le \min(g-1, p)$ [usually $g-1$]. This means that the number of discriminant coordinates is limited. In practice, however, we usually use 2-3 coordinates. Even when the assumption of equal covariances $P_i = \mathrm{Normal}(\mu_i, \Sigma)$ is flawed, discriminant coordinates often provide an effective representation of the data. LDA is optimal under equal covariance matrices, but still works otherwise.