The unexpected importance of publishing unreplicable research

There was a recent attempt to replicate 100 results out of psychology. It succeeded in replicating less than half. Is Psychology in crisis? No. Why would I say that? Because unreplicable research is only half of the problem, and we’re ignoring the other half. As with most pass/fail decisions by humans, a decision to publish after peer review can go wrong in two ways:

  1. Accepting work that “shouldn’t” be published (perhaps because it will turn out to have been unreplicable; a “false positive” or “Type I” error)
  2. Rejecting work that, for whatever reason, “should” be published (a “false negative” or “Type II” error).

It is impossible to completely eliminate both types of error, and I’d even conjecture that it’s impossible for any credible peer review system to completely eliminate either type of error: even the most cursory quality peer review will occasionally reject good work, and even the most conservative will accept crap. It is naïve to think that error can ever be eliminated from peer review. All you can do is change the ratio of false positives to false negatives, according to your own relative preference for the competing values of skepticism and credulity.
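A toy signal-detection sketch can make that trade-off concrete. It assumes, purely for illustration, that a reviewer perceives each submission's quality as a single noisy number: work that "shouldn't" be published looks like a draw from a standard normal distribution, work that "should" be published looks like a draw shifted up by a hypothetical separation `d_prime`, and the reviewer accepts anything above a chosen threshold. The names `error_rates` and `d_prime` are illustrative, not part of any real review process.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_rates(threshold: float, d_prime: float = 1.0) -> tuple[float, float]:
    """Return (type_i_rate, type_ii_rate) for a reviewer who accepts any
    submission whose perceived quality exceeds `threshold`.

    Hypothetical model (an assumption for illustration only):
      perceived quality of work that shouldn't be published ~ N(0, 1)
      perceived quality of work that should be published    ~ N(d_prime, 1)
    """
    # Type I: bad work whose perceived quality lands above the threshold.
    type_i = 1.0 - normal_cdf(threshold)
    # Type II: good work whose perceived quality lands below the threshold.
    type_ii = normal_cdf(threshold - d_prime)
    return type_i, type_ii


# Sweep from a permissive reviewer (low threshold) to a conservative one.
for threshold in (-1.0, 0.0, 0.5, 1.0, 2.0):
    type_i, type_ii = error_rates(threshold)
    print(f"threshold={threshold:+.1f}  Type I={type_i:.3f}  Type II={type_ii:.3f}")
```

Under this model, raising the threshold lowers the Type I rate and raises the Type II rate, and at no finite threshold do both reach zero; moving the threshold only changes which kind of error you get more of.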

So now you’ve got a choice, one that every discipline makes in a different way: you can build a conservative scientific culture that changes slowly, especially w.r.t. its sacred cows, or you can foster a faster and looser discipline with lots of exciting, tenuous, untrustworthy results getting thrown about all the time. Each discipline’s decision ends up nested within a whole system of norms that develop to accommodate the corresponding glut: of excellent anathematic work in the conservative case, and of awful published work in the loose one. It is hard to make general statements about whole disciplines, but peer review in economics tends to be more conservative than in psychology. So young economists, who are unlikely to have gotten anything through the scrutiny of their peer review processes, can get hired on the strength of totally unpublished working papers (which is crazy). And young psychologists, who quickly learn that they can’t always trust what they read, find themselves running many pilot experiments for every few they publish (which is also crazy). Different disciplines have different ways of doing science that are determined, in part, by their tolerances for Type I relative to Type II error.

In short, the importance of publishing unreplicable research is that it helps keep all replicable research publishable, no matter how controversial. So if you’re prepared to make a judgement call and claim that one place on the error spectrum is better than another, that really says more about your own high or low tolerance for ambiguity, or about the discipline that trained you, than it does about Science And What Is Good For It. And if you like this analysis, thank psychology, because the concepts of false positives and negatives come out of signal detection theory, an important math-psych formalism that was developed in early human factors research.

Because a lot of attention has gone toward the “false positive” problem of unreplicable research, I’ll close with a refresher on what the other kind of problem looks like in practice. Here is a dig at the theory of plate tectonics, which struggled for over half a century before it finally gained a general, begrudging acceptance:

It is not scientific but takes the familiar course of an initial idea, a selective search through the literature for corroborative evidence, ignoring most of the facts that are opposed to the idea, and ending in a state of auto-intoxication in which the subjective idea comes to be considered an objective fact.*

Take that, plate tectonics.