Really reproducible research

I think a lot about the related notions of reproducibility and replicability in research. I intend for these to be major recurring themes. Today I saw a big special issue of Perspectives on Psychological Science on the topic of reproducibility, so I thought it would be good to break up the Bayesian stuff with a short rant of my own on the subject.

First, I’d like to propose a distinction between replicability and reproducibility. Replicability is about results and reproducibility is about methods. These are not independent, though; they are more like two sides of the same coin. If your results are replicable, that means that someone else can find basically the same pattern. This is a fundamental goal of science. If someone finds that standing on your head while practicing a foreign language leads to better retention of new vocabulary, great. But if no one else can produce that result, then the initial result is called into question.

This is a pretty big issue these days. The special issue I mentioned above is just one of the many signs. There’s been a lot of recent discussion about how the publishing system in the social sciences (psychology has been getting the most attention, but I think this is true in virtually every field I’m acquainted with) actually discourages replication. Sensational results end up getting into more “important” journals, and from a purely statistical point of view, sensational results really ought to be rare things. The result is over-publication of results that may not hold up well when people try to replicate them. And then there’s the problem that there’s no viable venue in which to publish “failed” studies, which do not show any effects. I’ll make separate posts for each of these topics at some point. The point here is that those of us who like to call ourselves “scientists” should hold ourselves to a better standard of replicability than the current publishing system encourages. Otherwise, we are just flooding the market with results that no one can really trust, and how can one formulate effective or accurate theories when trying to explain false results?

But how do you get replication? That’s where reproducible methods come in. What if I run an experiment, get some interesting results, and then someone from a competing theoretical point of view says they tried to replicate them but couldn’t, thereby casting doubt on my results? Having truly reproducible methods is my safeguard. This means that my methods are transparent enough that other people can perform the experiment in a similar enough way, or that if someone tries to replicate and can’t, they can compare their methods to mine to see what might be different. There are some current standards that people try to follow in how they report their methods, to facilitate this, but I think we’re still a long way from true reproducibility. The key morpheme here is the -able part of reproducible. That is, the more difficult it is to reproduce a set of methods, the less reproducible they are. It can often be difficult to reproduce even your own methods, because there are so many small decision points along the way, and there is often considerable work at each step.

The good news is that there has been a recent rise in tools to make reproducible research easier, and to improve on the -able part, especially in reproducing statistical analysis. But I believe that we need to extend beyond reproducible stats. Sharing your data and the R code you used for your analysis is a great first step, but that’s all it is: a first step.

So ultimately, improving reproducible methods is one of the most important parts of addressing the replication problem. Even if we “fix” the publishing system so that failed replications are more visible, without better means to understand (1) how to replicate someone’s work and (2) what potential differences between studies could be explaining replication failures, we’ll still be in the dark when it comes to actually evaluating reproducibility.

If this is an interesting issue to you, stay tuned to this blog, as I plan to work through some of the concepts involved, as well as some of the practical means to actually achieve better reproducibility.

Shoestring Psycholing

A language science blog
