Sean Eddy explains why sequencing is replacing many older assays, and why biologists need to learn to analyze their own data.
If we were talking about a well-defined resource like a genome sequence, where the problem is an engineering problem, I’m fine with outsourcing or building skilled teams of bioinformaticians. But if you’re a biologist pursuing a hypothesis-driven biological problem, and you’re using a sequencing-based assay to ask part of your question, generically expecting a bioinformatician in your sequencing core to analyze your data is like handing all your gels over to some guy in the basement who uses a ruler and a lightbox really well.
Data analysis is not generic. To analyze data from a biological assay, you have to understand the question you’re asking, you have to understand the assay itself, and you have to have enough intuition to anticipate problems, recognize interesting anomalies, and design appropriate controls. If we were talking about gels, this would be obvious. You don’t analyze Northerns the same way you analyze Westerns, and you wouldn’t hand both your Westerns and your Northerns over to the generic gel-analyzing person with her ruler in the basement. But somehow this is what many people seem to want to do with bioinformaticians and sequence data.
It is true that sequencing generates a lot of data, and it is currently true that the skills needed to do sequencing data analysis are specialized and in short supply. What I want to tell you, though, is that those data analysis skills are easily acquired by biologists, that they must be acquired by biologists, and that they will be. We need to rethink how we’re doing bioinformatics.
I would add this: it takes some time to learn, but in the end it’s not that hard, people. Students in chemistry and physics routinely learn the requisite skills. We need to educate biologists who expect to do programming, math, and statistics.