Sean Eddy explains why sequencing is replacing many older assays, and why biologists need to learn to analyze their own data.
“High throughput sequencing for neuroscience”:
If we were talking about a well-defined resource like a genome sequence, where the problem is an engineering problem, I’m fine with outsourcing or building skilled teams of bioinformaticians. But if you’re a biologist pursuing a hypothesis-driven biological problem, and you’re using a sequencing-based assay to ask part of your question, generically expecting a bioinformatician in your sequencing core to analyze your data is like handing all your gels over to some guy in the basement who uses a ruler and a lightbox really well.
Data analysis is not generic. To analyze data from a biological assay, you have to understand the question you’re asking, you have to understand the assay itself, and you have to have enough intuition to anticipate problems, recognize interesting anomalies, and design appropriate controls. If we were talking about gels, this would be obvious. You don’t analyze Northerns the same way you analyze Westerns, and you wouldn’t hand both your Westerns and your Northerns over to the generic gel-analyzing person with her ruler in the basement. But somehow this is what many people seem to want to do with bioinformaticians and sequence data.
It is true that sequencing generates a lot of data, and it is currently true that the skills needed to do sequencing data analysis are specialized and in short supply. What I want to tell you, though, is that those data analysis skills are easily acquired by biologists, that they must be acquired by biologists, and that they will be. We need to rethink how we’re doing bioinformatics.
I would add this: it takes some time to learn, but in the end it’s not that hard, people. Students in chemistry and physics routinely learn the requisite skills. We need to educate biologists who expect to do programming, math, and statistics.
My sense is that a lot of the people worried about this are like me: postdocs not trained extensively in stats/programming/etc. and getting caught up in the sudden, necessary wave of big data (answer a hypothesis-driven question, sure, but it seems like to get funding you have to incorporate one of these high-throughput methods). I’m trying to learn on the fly, but it’s by no means something I feel expert in. I sometimes seek out bioinformaticians to ask questions, but I do try not to bother them. The other thing I would say is that culturally, it seems that learning curves are unacceptable. Time spent learning is not time spent producing data. And that attitude needs to change. Back to my Codecademy Python lesson…
It definitely needs to be integrated with the training. We teach new students how to properly pipette and run gels; you’re not expected to learn that on your own, without supervision.
The same should hold for computational skills, but the problem is most older PIs don’t have them. So bugging the local bioinformaticists should be an acceptable thing to do!
Agreed! Luckily they’re computer people and have created some really great tutorials to get individuals started with it all.