Aspiring scientists need to know that a science career is no exception to the routine drudgery that comes with all real jobs:
Back in my freshman year of college, I was planning to be a biochemist. I spent hours and hours of time in the lab: mixing chemicals in test tubes, putting samples in different machines, and analyzing results. Over time, I grew frustrated because I found myself spending weeks in the lab doing manual work and just a few minutes planning experiments or analyzing results. After a year, I gave up on chemistry and became a computer scientist, thinking that I would spend less time on preparation and testing and more time on analysis.
Unfortunately for me, I chose to do data mining work professionally. Everyone loves building models, drawing charts, and playing with cool algorithms. Unfortunately, most of the time you spend on data analysis projects is spent on preparing data for analysis. I’d estimate that 80% of the effort on a typical project is spent on finding, cleaning, and preparing data for analysis. Less than 5% of the effort is devoted to analysis. (The rest of the time is spent on writing up what you did)…
In practice, data is almost never stored in the right form for analysis. Even when data is in the right form, there are often surprises in the data. It takes a lot of work to pull together a usable data set.
– Joseph Adler, R in a Nutshell (2010)
And this doesn’t include time spent on administrative and fund-raising tasks. A word to the wise: it is well worth taking the time to really learn the tools for processing data. Otherwise you’re doing the data equivalent of attacking a tree with a pocket knife instead of a chainsaw.