Statistical Power! It sounds like something a math-textbook superhero would exclaim while collecting data points. I’ll be honest: even though I have a PhD, my stats background is very weak. My college major required all sorts of delightful calculus and differential equations, but I’ve never taken a statistics course. My graduate work required only the most basic statistical analysis (which, luckily for me, our software could handle without my input). It turns out that I am not alone, and this is a major problem.
In Nature Reviews Neuroscience there was a really interesting study on the low statistical power of many types of neuroscience research. The definition of statistical power is “the probability that a test will correctly reject the null hypothesis when the null hypothesis is false”. More simply, low statistical power means that the likelihood of detecting effects that are actually real is low. Less obviously, it also means that a larger share of the “significant” results that do turn up are false positives: the rate of chance findings stays the same while fewer true effects are detected, so the chance findings make up more of the total. You can get low statistical power from a small sample size, from a small effect, or from a combination of both. Even if you perform your analysis perfectly otherwise, a low-powered study that does reach significance will tend to give an exaggerated estimate of the magnitude of the effect, because only the overestimates are large enough to clear the significance threshold. This can result in the “winner’s curse”, where a lucky scientist who publishes a study observing a phenotype could be doomed to have no other lab be able to repeat that work.
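Since I find this easier to believe with numbers in front of me, here is a minimal Python sketch of both ideas. The first function uses the standard positive predictive value formula, PPV = (power × R) / (power × R + α), where R is the pre-study odds that a tested effect is real. The second simulates many small two-group studies and looks only at the ones that reach significance. The function names, the pre-study odds of 0.25, the true effect of 0.3 standard deviations and the 11 animals per group are all my own illustrative assumptions, not numbers from the study.

```python
import random
from statistics import NormalDist, mean


def ppv(power, prior_odds, alpha=0.05):
    """Share of statistically significant results that reflect a real effect.

    prior_odds is the pre-study odds that a tested effect is real
    (an assumption you have to supply; it is not known in practice).
    """
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


def winners_curse(true_d, n_per_group, alpha=0.05, n_studies=20000, seed=1):
    """Simulate two-group studies of one true effect (in SD units).

    Returns (fraction of studies reaching significance,
             mean absolute effect estimate among those significant studies).
    Simplification: the SD is treated as known (equal to 1), so each study's
    estimated difference in means is drawn directly from its sampling
    distribution and tested with a two-sided z-test.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5                 # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    significant = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_d, se)
        if abs(estimate) / se > z_crit:
            significant.append(abs(estimate))
    return len(significant) / n_studies, mean(significant)


# Hypothetical scenario: 1 in 5 tested effects is real (odds 0.25).
print("PPV at 20% power:", ppv(0.20, 0.25))
print("PPV at 80% power:", ppv(0.80, 0.25))

observed_power, mean_significant = winners_curse(true_d=0.3, n_per_group=11)
print("fraction significant:", observed_power)
print("mean |estimate| when significant:", mean_significant)
```

Under these assumed odds, dropping from 80% to 20% power takes the share of significant findings that are real from 0.8 down to 0.5. In the simulation, every significant estimate has to exceed roughly 1.96 standard errors, which with 11 animals per group is far larger than the true effect of 0.3, so the "winning" studies overstate it by construction.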
The study draws on meta-analyses of neuroscience research published in 2011. I will just touch on the highlights of what they discovered. The median statistical power in the neuroscience studies evaluated was 21%. Many studies fell below 20%, but a small number were above 90%. The small number of high-powered studies were meta-analyses that combined data from many studies and effectively increased their sample size. One of the pieces of information that hit closest to home (being a model organism devotee) was their evaluation of animal model studies. They chose a specific type of experiment looking at mice navigating a water maze or a radial maze. The median statistical power found for these experiments was 18% for the water maze and 31% for the radial maze, with an average sample size of 22 animals for the water maze and 24 for the radial maze. In order to achieve 80% statistical power in those experiments (which is great for a single study) you would need to test 134 animals in the water maze and 68 animals in the radial maze.
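For anyone who wants to see where sample-size numbers like these come from, here is a rough Python sketch of the usual calculation for comparing two groups. It uses the normal approximation, n per group ≈ 2 × ((z for α/2 + z for power) / d)², where d is the effect size in standard deviation units. The function names and the effect size of 0.5 are my own illustrative assumptions; I am not reproducing the study's exact method, and a proper t-test based calculation gives slightly larger numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    d is the true effect size in SD units; normal approximation with
    equal group sizes (a simplification of the usual t-test).
    """
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    # Probability that the test statistic lands beyond either critical value.
    return _norm.cdf(shift - z_crit) + _norm.cdf(-shift - z_crit)


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Smallest group size that reaches the target power (same approximation)."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    z_power = _norm.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


# Hypothetical effect size of half a standard deviation.
d = 0.5
print("power with 11 per group:", approx_power(d, 11))
print("animals per group for 80% power:", n_per_group_for_power(d))
print("same, if the effect is half as large:", n_per_group_for_power(d / 2))
```

With this assumed effect size the approximation asks for 63 animals per group, and halving the effect size roughly quadruples that to 252. The required sample grows with the inverse square of the effect size, which is why small effects get expensive so quickly.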
This review article focuses on neuroscience studies, but this type of problem is very likely prevalent well beyond neuroscience. By trying to use as few animals as possible in our experiments, we handcuff ourselves when it comes to finding true, reproducible effects. Sometimes data from many labs can be combined in a meta-analysis to increase the statistical power and confirm findings, but more often this is not possible. Are low-powered animal studies responsible for the large number of failures when treatments are moved to clinical trials? How can we plan our experiments to improve our statistical power, and make our data available to help other scientists plan their studies appropriately? As a Drosophila researcher, I generally have no problem getting a very large number of individuals for analysis, but I was struck by the implications of this review. There is a fine balance between minimizing the use of animals and performing experiments that are reliable and repeatable.