Notes on Analysing Experiments in R

'''NOTE:''' ''I wrote this in 2010 and touched it up in 2012. I can use bootstrapping now, but I still prefer the very simple <pre>anova(mymodel, mymodelminusoneparameter)</pre>.  There is one especially exciting approach, completely different, still very technical, that I'm not at all facile with, called [[Weblog:Bayesian_data_analysis_in_English|Bayesian analysis]]. ''
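For concreteness, here is a minimal sketch of what that comparison might look like (the data frame d and its columns y, x, and subject are hypothetical, invented for illustration):

<pre>
library(lme4)

## Fit with maximum likelihood (REML = FALSE) so the two likelihoods are comparable
mymodel                  <- lmer(y ~ x + (1 | subject), data = d, REML = FALSE)
mymodelminusoneparameter <- lmer(y ~ 1 + (1 | subject), data = d, REML = FALSE)

anova(mymodel, mymodelminusoneparameter)   # likelihood ratio test of the dropped term x
</pre>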
  
I've been inferring my knowledge of statistical analysis from the more and less patient attacks that Douglas Bates unleashes on non-statisticians who need p-values in the language R.  Douglas Bates is the author of R's mixed effects package, which lets you do tricky things. It turns out that, for complex enough designs, experimentalists are at the cutting edge of statistics, and a lot isn't known.  Specifically, this article may be relevant to you if you use random effects (e.g. continuous-ish covariates), nested experimental designs (e.g. involving phrases like "within subject" and "repeated measure"), a mix of ordinal/categorical and continuous variables, or other non-vanilla flavors of analysis of variance.
  
It gets religious because the professionals are only too aware of the meaninglessness of the p-value, which most practitioners swear by.  The practitioners want R's mixed-effects code to give p-values, and they are bolstered by the fact that SAS gives just those numbers (apparently SAS mixed-effect implementations give p-values).  In the most well-known email, Bates shows obvious emotional restraint while explaining why it is not sensible to expect p-values generally from mixed effect models.  He also offers model constraints under which a p-value can be a meaningful statistic.
  
But there still isn't enough information available for people who aren't already experts.  (The "Documentation" section, below, gives all the information you need if you already know enough to know the right way to do it.)  I'm trying to collect the useful things I've learned into one spot, mostly for my own sake.
==Good conversations on the threads==
*[http://yihui.name/en/2010/04/rules-of-thumb-to-meet-r-gurus-in-the-help-list/  The quickest way to summon a God is to anger it (short)]
*[http://rwiki.sciviews.org/doku.php?id=guides:lmer-tests The most referenced (and now reified) discussion of significance testing in mixed models][https://stat.ethz.ch/pipermail/r-help/2006-May/094765.html]
*[http://permalink.gmane.org/gmane.comp.lang.r.lme4.devel/3334 More on R^2 in mixed models]
*Unfortunately, I couldn't find any of the great impassioned email pleas for p-values in R's mixed models that are "good enough."  They have an important enough role in the mailing-list ecology, and they make a good enough point: "R should be useful to non-experts.  R is open source and can be made by anyone to serve the masses.  'Someone' should add a flag that makes R give SAS-like output (p-values)."  But I'm picking up on something.  It seems that everyone who knows enough about mixed effect models to implement the necessary changes has too much integrity to do so.  After all, this discussion has been going on for years.  You might take that as a warning sign.
==Documentation==
Most of these references are only everything-you-need if you already know what you are doing (two years after writing this I still don't follow everything).  Otherwise you will have to supplement them with Wikipedia and the other things I've found.  They are still worth reading. Osmosis is a general enough phenomenon that it works on even really inscrutable subjects, even when you have no idea what is going on.  You just have to stay awake.  My problem is staying awake.  
*To find anything about "foo" in R, Googling "r foo" won't work, because "r" is just a letter of the alphabet. Try "r foo help" or "r foo cran."
*[http://zoonek2.free.fr/UNIX/48_R/14.html A useful walk through why mixed models are handy, with examples in R]; also [http://zoonek2.free.fr/UNIX/48_R/].
*[http://worldwideweb.unconventionallylonguniformresourcelocator.com/forever/wikirefs/R_lmer_Dummies_Supplement.pdf A few slides on lmer], a simple thing I made.  More to document my confusion, but might help you with lmer syntax.
*http://cran.r-project.org/web/packages/lme4/vignettes/Implementation.pdf
*http://cran.r-project.org/web/packages/lme4/vignettes/
*http://cran.r-project.org/web/packages/lme4/
*[http://www.r-project.org/doc/Rnews/bib/Rnewsbib.html#Rnews:Bates:2004 An article by Bates in Rnews]
 
*[http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf The Exegesis: a Talmudic document that Bates refers a lot of people to]
*$$$: Jose C. Pinheiro and Douglas M. Bates (2000), “Mixed-Effects Models in S and S-Plus”. Springer, ISBN 0-387-98957-0.
*[http://www.maths.bath.ac.uk/~jjf23/ELM/ $$$: Faraway's "Extending the Linear Model with R"]
*[http://www.dartmouth.edu/~eugened/ $$$: Demidenko, "Mixed Models"], a more general, theoretical treatment.
  
==Good conversation snippets from the threads==
*"""There is now an anova() method for lmer() and lmer2() fits performed using method="ML".  You can compare different models and get p-values for p-value obsessed journals using this approach.""" [https://stat.ethz.ch/pipermail/r-help/2007-August/138747.html]
+
* There is now an anova() method for lmer() and lmer2() fits performed using method="ML".  You can compare different models and get p-values for p-value obsessed journals using this approach. [https://stat.ethz.ch/pipermail/r-help/2007-August/138747.html]
*This is Bates's answer to people who already know the answer:
With lmer fits I recommend checking a Markov chain Monte Carlo sample from the posterior distribution of the parameters to determine which are significant (although this is not terribly well documented at the present time). [http://tolstoy.newcastle.edu.au/R/e2/help/06/10/3565.html]
*Similarly (a rough sketch of this approach follows the list):
Try using mcmcsamp() to sample from the posterior distribution of the parameter estimates. You can calculate a p-value from that, if that is your desire. [http://www.opensubscriber.com/message/r-help@stat.math.ethz.ch/5544126.html]
*More for people who already know the answer: [http://permalink.gmane.org/gmane.comp.lang.r.lme4.devel/736?set_blog_all=yes]
*Here is what I've been doing (the snippet below; its limitation: it only really works with infinite data): "My general advice to those who are required to produce a p-value for a particular fixed-effects term in a mixed-effects model is to use a likelihood ratio test.  Fit the model including that term using maximum likelihood (i.e. REML = FALSE), fit it again without the term and compare the results using anova. The likelihood ratio statistic will be compared to a chi-squared distribution to get a p-value and this process is somewhat suspect when the degrees of freedom would be small. However, so many other things could be going wrong when you are fitting complex models to few observations that this may be the least of your worries." [http://markmail.org/message/56c4ck4mmjyouqfo]
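For the mcmcsamp() route, a rough sketch (the model fm and data frame d are hypothetical; note also that mcmcsamp() shipped with the lme4 of this era and was removed from later versions, so treat this as historical rather than current advice):

<pre>
library(lme4)

## NB: mcmcsamp() existed in the lme4 of this era; later versions removed it.
fm   <- lmer(y ~ x + (1 | subject), data = d)  # d, y, x, subject are invented names
samp <- mcmcsamp(fm, n = 1000)                 # MCMC sample from the posterior of the parameters
HPDinterval(samp)                              # credible intervals; a parameter whose interval
                                               #  excludes zero is "significant" in this sense
</pre>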
  
==Code snippets==
*Keep in mind that I'm just a dilettante.  I've used these, but that doesn't imply that they are appropriate, or correct:
  
 
<pre>
### For two lmer fits, lmerout.basic and lmerout.null.
### Only use this on models that differ by one fixed effect.  The smaller (or closer-to-null) model
###  should be the second one.  If your goal is to find significant variables in the system, your
###  procedure is to start at the null model, or one with only the controlled variables, and
###  incrementally add in dependent/invented explanatory variables.
test.lm <- function(lmerout1, lmerout2) {
    pchisq(as.numeric(2 * (logLik(lmerout1) - logLik(lmerout2))), df = 1, lower.tail = FALSE)
}
test.lm(lmerout.basic, lmerout.null)

### Alternatively (and probably preferably) (and, still, with only one variable difference at a time):
anova(lmerout.basic, lmerout.null)
</pre>
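(When the anova() comparison works, the output ends with a Chisq statistic and a Pr(>Chisq) column; that last number is the p-value everyone above is arguing about.  At least, that is how the output read in the lme4 of this era; check your version's documentation.)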
  
==Appendix if the starting point of this discussion is too advanced==
*''All the way back:'' R is a programming language for statistical analysis.  It is free in both the bird and lunch senses.  It is almost identical to a very expensive program called S.  It is also very good, and in some ways it may have eclipsed S.   
*''Less far back:'' A t-test is a very simple test, taken on samples from two big lists of numbers, of whether those two lists are not ''really'' different.  Comparing samples gets way more complicated.  Analysis of variance (ANOVA) is a step on the way, and is the most popular manifestation of The Linear Model.  Besides coursework, the best way that I know to get from knowing nothing to understanding this post is at sportsci.org.  It has the most thorough free resource that I've found for explaining it all to dummies [http://www.sportsci.org/resource/stats/].  I read the whole thing. (Note: this was in 2010, whatever that implies.)
*In R, aov() works most of the time.  If not, lm() should work.  If not, things get tangled and the learning curve gets worse.  There is glm(), lme(), nlme(), and lmer() (of which glmer(), nlmer() and lmer2() are variants), all used with anova() or glht() or MCMC/bootstrapping techniques involving pvals.fnc(), the merMCMC class, mcmcsamp() or your own code.  Relevant libraries (or packages, whatever) are nlme, lme4, languageR, RLRsim, and multcomp.  This page refers mostly to the use of lmer() in the package lme4 linked in the documentation section.  For a quick feel of how these escalate, see the sketch at the bottom of the page.
*Hypothesis testing gets hard after lm().  With lm() you can just type <pre>summary(yourlm)</pre> but with the mixed models you can't.  You have to either sample simulated data from the statistical model you fit, and infer test statistics from the simulation (bootstrapping), or calculate the log likelihood ratio of your model against a null model.* You do this by writing your own test, using the one above, or just typing <pre>anova(yourlm, yourotherlm)</pre>. Any of these could not-work, depending on the complexity of the model.
  
<nowiki>*</nowiki> Ideally one that is the same-but-for-one-factor.  (Opaquely: twice the difference in log-likelihoods, i.e. the log of the square of the likelihood ratio of the two models, is approximately chi-square distributed, and I think it errs conservative.  Sameness-but-for-one makes df=1, which is the easiest to interpret in this context.)
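To get a quick feel for the escalation described in the list above, here is a hypothetical sequence from the easy cases to the tangled ones (the data frame d and its columns are invented for illustration):

<pre>
## Hypothetical data: response y, a two-level factor group, a covariate, repeated subjects
t.test(y ~ group, data = d)                      # two samples: really different?
summary(aov(y ~ group, data = d))                # classic analysis of variance
summary(lm(y ~ group + covariate, data = d))     # the linear model; summary() gives p-values

library(lme4)                                    # past here, summary() stops giving p-values
fm0 <- lmer(y ~ 1 + (1 | subject), data = d, REML = FALSE)      # null model
fm1 <- lmer(y ~ group + (1 | subject), data = d, REML = FALSE)  # same-but-for-one-factor
anova(fm1, fm0)                                  # the likelihood ratio test from the footnote

## The footnote's arithmetic by hand, with invented log-likelihoods:
pchisq(2 * (-100.0 - -103.2), df = 1, lower.tail = FALSE)       # ~0.011
</pre>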
