Notes on Analysing Experiments in R
From enfascination
I've been inferring my knowledge of statistical analysis from the attacks that Douglas Bates unleashes on non-statisticians who need p-values in the language R. It gets religious because the innocents are bolstered by the SAS mixed-effect implementations that do give p-values. In the most well known email, Bates show obvious emotional restraint explaining why that value is not sensible.
Contents |
good conversations
- dragon hunting (short)
- the most referenced (and now reified) discussion[1]
- R^2 in mixed models
- unf. I couldn't find any of the empassioned email pleas for "good enough" p-values in R's mixed models. They have an important enough role in the mailing list ecology, and they make a good enough point. "Someone" should add a flag that gives SAS-like output. But I'm picking up on something. It seems that if you know enough about mixed effect models to make the necessary changes in R's code, you have too much integrity to do so.
documentation
- to google anything about stuff in R, "r stuff" won't work. Try "r cran stuff" or "r help stuff"
- http://cran.r-project.org/web/packages/lme4/vignettes/Implementation.pdf
- http://cran.r-project.org/web/packages/lme4/vignettes/
- http://cran.r-project.org/web/packages/lme4/
- article by Bates in rnews
- A Talmudic document that Bates refers a lot of ppl to
- Jose C. Pinheiro and Douglas M. Bates (2000), “Mixed-Effects Models in S and S-Plus”. Springer, ISBN 0-387-98957-0.
conversation snippets
- """There is now an anova()
method for lmer() and lmer2() fits performed using method="ML". You can compare different models and get p-values for p-value obsessed journals using this approach.[2]"""
- This is Bate's answer to people who already know the answer: "With lmer fits I recommend checking a Markov chain Monte Carlo sample from the posterior distribtuion of the parameters to determine which are signification (although this is not terribly well documented at the present time).[3]"
- similarly """Try using mcmcsamp() to sample from the posterior distribution of the
parameter estimates. You can calculate a p-value from that, if that is your desire."""[4]
- More for people who already know the answer [5]
- What I've been doing (limit: works well asymptotically with amount of data): """My general advice to those who are required to produce a p-value for a
particular fixed-effects term in a mixed-effects model is to use a likelihood ratio test. Fit the model including that term using maximum likelihood (i.e. REML = FALSE), fit it again without the term and compare the results using anova. The likelihood ratio statistic will be compared to a chi-squared distribution to get a p-value and this process is somewhat suspect when the degrees of freedom would be small. However, so many other things could be going wrong when you are fitting complex models to few observations that this may be the least of your worries."""[6]
code snippets
- keep in mind that I'm just a dilettante. I've used these, but that doesn't imply that they are appropriate, or correct
### for two lmer fits lmerout.basic and lmerout.null ### only use this on models that differ by one fixed effect. the smaller (or closer to null) model should ### be the second one. If your goal is to find significant variables in the system, your procedure is ### to start at null model or one with only the controlled variables and incrementally add in ### dependent/invented explanatory variables . test.lm<- function(lmerout1, lmerout2) { pchisq(as.numeric(2*(logLik(lmerout1)-logLik(lmerout2)), lower=FALSE))} test.lm(lmout.basic, lmout.null) ### alternatively (and probably preferable) (and, still, preferable with only one variable difference at a time) anova(lmerout.basic, lmerout.null)