Notes on Analysing Experiments in R
From enfascination
I've been inferring my knowledge of statistical analysis from the more and less patient attacks that Douglas Bates, the author of R's mixed effects package, unleashes on non-statisticians who need p-values in the language R. It turns out that, for complex enough designs, experimentalists are at the cutting edge of statistics, and a lot isn't known. Specifically, this may be relevant to you if you use random effects (e.g. continuous-ish covariates), nested experimental designs (e.g. involving phrases like "within subject" and "repeated measure"), a mix of ordinal/categorical and continuous variables, or other non-vanilla flavors of analysis of variance.
It gets religious because the professionals know how meaningless the practitioner's favorite number (the p-value) is, and the practitioners are bolstered by the fact that SAS gives just those numbers (apparently SAS mixed-effect implementations give p-values). In the best-known email, Bates shows obvious emotional restraint explaining why it is not sensible to expect p-values generally from mixed effect models. He also offers model constraints under which a p-value can be a meaningful statistic.
But there still isn't enough information available for people who aren't already experts. (The "documentation" section, below, gives all the information you need if you do already know enough to know the right way to do it). I'm trying to collect the useful things I've learned into one spot, mostly for my own sake.
good conversations
- The quickest way to summon a God is to anger It (short)
- the most referenced (and now reified) discussion of significance testing in mixed models[1]
- more on R^2 in mixed models
- Unfortunately, I couldn't find any of the impassioned email pleas for "good enough" p-values in R's mixed models. They have an important enough role in the mailing list ecology, and they make a good enough point: "R should be useful to non-experts. R is open source and can be made by anyone to serve the masses. 'Someone' should add a flag that makes R give SAS-like output (p-values)." But I'm picking up on something. It seems that everyone who knows enough about mixed effect models to make the necessary changes in R's code has too much integrity to do so. Warning sign.
documentation
most of these references are only everything-you-need if you already know what you are doing. Otherwise you will have to supplement them with Wikipedia and the other things I've found. They are still worth reading. Osmosis is a general enough phenomenon that it works on even really inscrutable subjects, even when you have no idea what is going on. You just have to stay awake. My problem is staying awake.
- to google anything about stuff in R, "r stuff" won't work. Try "r cran stuff" or "r help stuff"
- a useful walk through of why mixed models are useful, with examples in R; it is more generally useful too [2]
- http://cran.r-project.org/web/packages/lme4/vignettes/Implementation.pdf
- http://cran.r-project.org/web/packages/lme4/vignettes/
- http://cran.r-project.org/web/packages/lme4/
- An article by Bates in rnews
- The Exegesis: A Talmudic document that Bates refers a lot of people to
- $$$: Jose C. Pinheiro and Douglas M. Bates (2000), “Mixed-Effects Models in S and S-Plus”. Springer, ISBN 0-387-98957-0.
- $$$: Faraway's "Extending the Linear Model with R"
- $$$: Demidenko, "Mixed Models" more general, theoretical treatment
conversation snippets
- """There is now an anova() method for lmer() and lmer2() fits performed using method="ML". You can compare different models and get p-values for p-value obsessed journals using this approach.""" [3]
- This is Bates's answer to people who already know the answer: """With lmer fits I recommend checking a Markov chain Monte Carlo sample from the posterior distribution of the parameters to determine which are significant (although this is not terribly well documented at the present time).""" [4]
- similarly """Try using mcmcsamp() to sample from the posterior distribution of the parameter estimates. You can calculate a p-value from that, if that is your desire."""[5]
- More for people who already know the answer [6]
- What I've been doing (snippet below. limit: the chi-squared approximation is asymptotic, so it only really works with infinite data): """My general advice to those who are required to produce a p-value for a particular fixed-effects term in a mixed-effects model is to use a likelihood ratio test. Fit the model including that term using maximum likelihood (i.e. REML = FALSE), fit it again without the term and compare the results using anova. The likelihood ratio statistic will be compared to a chi-squared distribution to get a p-value and this process is somewhat suspect when the degrees of freedom would be small. However, so many other things could be going wrong when you are fitting complex models to few observations that this may be the least of your worries."""[7]
code snippets
- keep in mind that I'm just a dilettante. I've used these, but that doesn't imply that they are appropriate, or correct
### for two lmer fits, lmerout.basic and lmerout.null
### only use this on models that differ by one fixed effect (so that df = 1).
### the smaller (or closer to null) model should be the second one.
### If your goal is to find significant variables in the system, your procedure is
### to start at the null model, or one with only the controlled variables, and
### incrementally add in the other (invented) explanatory variables.
test.lm <- function(lmerout1, lmerout2) {
  # twice the difference in log likelihoods, compared to a chi-squared with 1 df
  pchisq(as.numeric(2 * (logLik(lmerout1) - logLik(lmerout2))), df = 1, lower.tail = FALSE)
}
test.lm(lmerout.basic, lmerout.null)
### alternatively (and probably preferable) (and, still, preferable with only one variable difference at a time)
anova(lmerout.basic, lmerout.null)
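To make the snippet above concrete, here is a minimal sketch that builds two fits called lmerout.basic and lmerout.null and runs the likelihood ratio test both by hand and through anova(). The data set (sleepstudy, which ships with lme4), the model formulas, and the helper names ll.basic, lr.stat, lr.df, p.byhand and p.anova are my own choices for illustration, not anything from the emails. The same dilettante caveat applies.

```r
library(lme4)  # provides lmer() and the sleepstudy example data

# Two nested fits that differ by one fixed effect (Days), both fit by
# maximum likelihood (REML = FALSE), as the advice quoted above recommends.
lmerout.basic <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
lmerout.null  <- lmer(Reaction ~ 1    + (1 | Subject), data = sleepstudy, REML = FALSE)

# Likelihood ratio test by hand. The degrees of freedom are read off the
# fits (difference in number of estimated parameters) rather than assumed.
ll.basic <- logLik(lmerout.basic)
ll.null  <- logLik(lmerout.null)
lr.stat  <- as.numeric(2 * (ll.basic - ll.null))
lr.df    <- attr(ll.basic, "df") - attr(ll.null, "df")
p.byhand <- pchisq(lr.stat, df = lr.df, lower.tail = FALSE)

# The same test via anova(). Models are listed smallest first, so the
# comparison sits in the second row of the table.
lr.table <- anova(lmerout.basic, lmerout.null)
p.anova  <- lr.table[["Pr(>Chisq)"]][2]
```

If the two models differ by exactly one parameter, lr.df comes out as 1 and the two p-values should agree, which is a cheap check that you typed the models you meant to type.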
appendix if the starting point of this discussion is too advanced
- All the way back: R is a programming language for statistical analysis. It is free in both the bird and pocketbook senses. It is almost identical to a very expensive program called S. It is also very good, and in some ways it may already have eclipsed S.
- Less far back: A t-test is a very simple test of whether two lists of numbers are not really different. Comparing samples gets way more complicated. Analysis of variance (ANOVA) is a step on the way, and is the most popular manifestation of The Linear Model. Besides coursework, the best way that I know to get from knowing nothing to understanding this post is at sportsci.org . It has the most thorough free resource that I've found for explaining it all to dummies [8]. I read the whole thing.
- in R, aov() works most of the time. If not, lm() should work. If not, things get tangled and the learning curve gets worse. There is glm(), lme(), nlme(), and lmer() (of which glmer(), nlmer() and lmer2() are variants), all used with anova() or glht() or MCMC/bootstrapping techniques involving pvals.fnc(), merMCMC-class(), mcmcsamp() or your own code. Relevant libraries are nlme, lme4, languageR, and multcomp. This page refers mostly to the use of lmer() in the package lme4 linked in the documentation section.
- Hypothesis testing gets hard after lm(). You can't just type
summary(yourlm)
like with lm(). You have to either sample the model and infer test statistics from the simulated results (bootstrapping) or calculate the log likelihood ratio of the model against a null model*. You do this by writing your own test, using the one above, or just typing
anova(yourlm, yourotherlm)
Any of these could not-work, depending on the complexity of the model.
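* ideally one that is the same-but-for-one-factor (opaquely, twice the log of the ratio of the two models' likelihoods, i.e. the log of the square of the ratio, is approximately chi-square distributed. I thought it erred conservative, but Pinheiro and Bates report that for fixed effects it can be anti-conservative with small samples. sameness-but-for-one makes df=1, at least when the one factor costs a single parameter, which is the easiest to interpret in this context)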
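The bootstrapping route ("sample the model and infer test statistics from the simulated results") can be sketched as a parametric bootstrap of the likelihood ratio statistic. This is only a sketch under my own assumptions: the sleepstudy data and both formulas are illustrative, the names full, null, lr.obs, lr.sim, n.sim and p.boot are made up, and 200 simulations is far too few for real use. It relies on simulate() and refit() from lme4 rather than mcmcsamp(), which as far as I know is no longer available in current lme4 releases.

```r
library(lme4)  # provides lmer(), simulate(), refit() and sleepstudy

set.seed(1)  # arbitrary seed so the sketch is reproducible

# Full model and a null model that is the same but for one fixed effect.
full <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
null <- lmer(Reaction ~ 1    + (1 | Subject), data = sleepstudy, REML = FALSE)

# Observed likelihood ratio statistic.
lr.obs <- as.numeric(2 * (logLik(full) - logLik(null)))

# Simulate new responses from the null model, refit both models to each
# simulated response, and collect the likelihood ratio statistics.
n.sim  <- 200  # kept small for speed
sims   <- simulate(null, nsim = n.sim)
lr.sim <- vapply(sims, function(y) {
  as.numeric(2 * (logLik(refit(full, y)) - logLik(refit(null, y))))
}, numeric(1))

# Bootstrap p-value: the share of simulated statistics at least as large
# as the observed one (with the usual +1 so it is never exactly zero).
p.boot <- (1 + sum(lr.sim >= lr.obs)) / (n.sim + 1)
```

The point of doing it this way is that the reference distribution comes from the model itself instead of from the chi-square approximation, so it does not lean on having lots of data. The price is run time, and the refits can still complain on awkward models.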