In my latest Pacific Standard column, I take a look at the recent hand-wringing over the reproducibility of published science. A lot of people are worried that poorly done, non-reproducible science is ending up in the peer-reviewed literature.
Many of these worries are misguided. Yes, as researchers, editors, and reviewers we should do a better job of filtering out bad statistical practices and poor experimental designs; we should also make sure that data, methods, and code are thoroughly described and freely shared. To the extent that sloppy science is causing a pervasive reproducibility problem, we absolutely need to fix it.
But I’m worried that the recent reproducibility initiatives are going beyond merely sloppy science, and instead are imposing a standard on research that is not particularly useful and completely ahistorical. When you see a hot new result published in Nature, should you expect other experts in the field to be able to reproduce it exactly? Continue reading “Why reproducibility initiatives are misguided”
We all know how gravity is supposed to work. Without air resistance, a feather and a bowling ball (the standardized materials for all gravitational tests) should accelerate toward the center of the Earth at the same rate, thus striking the ground at the same time. Humans have tested this. It works.
Although we know this, it is so far removed from our daily experience that it is still stunning to watch it happen. This fundamental principle is nicely illustrated in this video from the BBC. The video also shows how amazed a room full of people can be when an experiment they already understand works exactly as expected.
That is why we need the scientific method to rigorously test hypotheses and incrementally build our knowledge of how the universe works. Our day-to-day experience of and intuition about the world is extremely valuable, but also extremely deceptive.
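The equal-fall claim above can be sketched numerically. This is a minimal illustration, not part of the original post: under constant gravitational acceleration and no air resistance, the time to fall a given height is t = sqrt(2h/g), and mass never enters the formula, which is why the feather and the bowling ball land together.

```python
import math

def fall_time(height_m, g=9.81):
    """Time (seconds) for any object to fall `height_m` meters in a vacuum.

    Note that mass does not appear anywhere: t = sqrt(2h / g).
    """
    return math.sqrt(2 * height_m / g)

# A feather and a bowling ball dropped from 10 m both land after ~1.43 s.
print(round(fall_time(10), 2))  # → 1.43
```

The function name and the 10 m drop height are illustrative choices; the physics (t independent of mass) is the point.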
For the record, the tortoise-vs-hare race in a vacuum that I alluded to in the title would be incredibly inhumane and disappointing, in addition to having no winner – unless, UNLESS we had the tortoise and hare race in spacesuits. Why aren’t we racing animals in spacesuits?
HT: Jared Heidinger
On the 537th episode of the WTF with Marc Maron Podcast, Marc Maron has an interesting conversation with Rivers Cuomo* of Weezer about his method for songwriting, particularly in the gap between Pinkerton (1996) and The Green Album (2001).
What I find so captivating is Cuomo’s application of a scientific mindset to “solving” his creative process in the hopes of working more efficiently and effectively. He fails, but does not conclude that his art cannot be understood by science. He faced a classic scientific problem: too many variables, too small a sample size (i.e., n = 1), and too little time. Cuomo also defies Maron’s efforts to portray his analytical quest as potentially maddening. It simply wasn’t productive enough.
I’m going to recommend the whole interview, but the segment I have described starts at about the 34:50 mark.
*Promoting Weezer’s new album Everything Will Be Alright in the End.
The results of a small survey of graduate students and post-docs suggest that our research trainees don’t really know what research misconduct is below the level of flat-out fabrication.
However, we were dismayed that only 54 per cent gave a three to “knowingly selecting only those data that support a hypothesis” and 42 per cent to “deleting some data to make trends clearer”. The naivety is staggering. – Tim Birkhead & Tom Montgomerie
They also note that these individuals face considerable barriers to reporting misconduct when they believe it has occurred.
I recall the mandatory ethics class we took at Washington University in St. Louis. It was worthless. We spent a great deal of time talking about “salami science”: the practice of parceling your work out into as many papers as possible, each with as little unique content as possible. This is bad behavior that games some of the systems used to evaluate researchers. It does not, however, corrupt the scientific record with inaccurate data and results.
While I received my training in proper scientific conduct in my thesis lab, that is not a sustainable solution. The future of scientific investigation should not depend on the efforts of individual thesis mentors – they are simply too inconsistent. Ethics education is key to training in the proper practice of the scientific method, and it should be central to graduate training, including quality courses that provide real instruction in ethics and in identifying misconduct.
This is too good not to share, from a preprint by Andrew Gelman and Eric Loken, “The garden of forking paths: Why multiple comparisons can be a problem, even when there is no ‘fishing expedition’ or ‘p-hacking’ and the research hypothesis was posited ahead of time”
Without modern statistics, we find it unlikely that people would take seriously a claim about the general population of women, based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.
The paper is here (PDF).
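Gelman and Loken’s point about forking paths can be made concrete with a small simulation (my own sketch, not from their paper): even when every hypothesis is null, giving yourself several analytical choices inflates the chance that *something* comes out “significant” at p < 0.05.

```python
import random

def false_positive_rate(n_tests, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated studies in which at least one of `n_tests`
    independent tests of a true null hypothesis reads as 'significant'
    at level `alpha`. Under the null, each p-value is uniform on [0, 1].
    """
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(trials)
    )
    return hits / trials

# One pre-registered test: ~5% false positives, as advertised.
# Ten forks in the analytical path: roughly 1 - 0.95**10 ≈ 40%.
print(false_positive_rate(1), false_positive_rate(10))
```

The parameter names and the figure of ten forks are illustrative assumptions; the paper’s deeper argument is that this inflation happens even when only one analysis is ever run, because the choice of analysis is contingent on the data.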