Finding function in the genome with a null hypothesis

Last September, there was a wee bit of a media frenzy over the Phase 2 ENCODE publications. The big story was supposed to be that ‘junk DNA is debunked’ – ENCODE had allegedly shown that instead of being filled with genetic garbage, our genomes are stuffed to the rafters with functional DNA. In the backlash against this storyline, many of us pointed out that the problem with this claim is that it conflates biochemical and organismal definitions of function: ENCODE measured biochemical activities across the human genome, but those biochemical activities are not by themselves strong proof that any particular piece of DNA actually does something useful for us.

The claim that ENCODE results disprove junk DNA is wrong because, as I argued back in the fall, something crucial is missing: a null hypothesis. Without a null hypothesis, how do you know whether to be surprised that ENCODE found biochemical activities over most of the genome? What do you really expect non-functional DNA to look like?

In our paper in this week’s PNAS, we take a stab at answering this question with one of the largest sets of randomly generated DNA sequences ever included in an experimental test of function. Continue reading “Finding function in the genome with a null hypothesis”
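
To make the null-hypothesis idea concrete, here is a minimal sketch of how a random-sequence baseline might be generated. The sequence length, GC content, and set size below are illustrative assumptions, not the parameters used in our study.

```python
import random

def random_dna(length, gc_content=0.5, rng=None):
    """Return a random DNA sequence with the given length and expected GC content."""
    rng = rng or random.Random()
    return "".join(
        rng.choice("GC") if rng.random() < gc_content else rng.choice("AT")
        for _ in range(length)
    )

# Build a null set of random sequences to compare against putative functional
# elements; the length, GC content, and number of sequences are illustrative only.
rng = random.Random(42)
null_set = [random_dna(length=150, gc_content=0.45, rng=rng) for _ in range(1000)]
print(null_set[0][:60])
```

In practice you would match such sequences to the elements being tested (for length, base composition, and so on) before putting them through the same functional assay; that matching step is assumed here, not a description of the paper's actual design.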

ENCODE is devouring the rest of biomedical science

A new NIH RFA:

PsychENCODE: Identification and Characterization of Non-coding Functional Elements in the Brain, and their Role in the Development of Mental Disorders (R01)

The Encyclopedia of DNA Elements (ENCODE) project, by systematically cataloging transcribed regions, transcription factor binding sites, and chromatin structure, has recently found that a larger fraction of the human genome may be functional than was previously appreciated. However, our understanding of the role of these functional genomic elements in neurodevelopment and mental disorders is at an early stage. This funding opportunity will support studies that identify non-coding functional genomic elements and elucidate their role in the etiology of mental disorders.

Suddenly, the ENCODE model is the way to do science. It’s hard to disagree with Dan Graur on what the consequences are: Continue reading “ENCODE is devouring the rest of biomedical science”

Doolittle disagrees, politely

The rebuttal by Graur et al. to the ENCODE project’s claim to have vanquished junk DNA got a lot of attention for its scathing rhetoric. If you already have enough trouble in your life, W. Ford Doolittle has penned a cogent but polite rebuttal of the claim in PNAS.

…what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational)…A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.

Unfortunately, you need a subscription to read the full-length article, which I do not have. I’m therefore not endorsing all of Doolittle’s arguments, but I do like that he seems to agree with my assertion from “Decoding ENCODE” that evolutionary theory expects junk DNA in species with the population and genomic characteristics of humans.

*Hat tip to Leonid Kruglyak.

Decoding ENCODE

On Sunday, I participated in a panel discussion of the ENCODE project and related issues with the folks from ScienceSunday via Google+ Hangouts. Ian Bosdet and I joined hosts Rajini Rao, Buddhini Samarasinghe, and Scott Lewis to talk about ENCODE and make it accessible to those without a decade of post-graduate training in genomics. If you have a spare 78 minutes, the discussion can be viewed on YouTube.

Retraction rate increases with impact factor – is this because of professional editors?

Folks have long noted the strong positive correlation between impact factor and retraction rate. There are three primary theories I’ve run across that attempt to explain why Nature, Science, Cell, etc. have substantially higher retraction rates than other journals:

1) Acceptable risk/fame and glory theory: high-impact-factor journals are willing to publish riskier but potentially higher-impact claims ASAP – more retractions are the price of getting high-impact science out early. The more negative version of this theory is that high-impact-factor journals care more about a high impact factor than about the integrity of what they publish.

2) Heightened scrutiny theory: papers published in high-visibility journals get more scrutiny, and thus flaws and fraud are more likely to be detected, even though fraud and errors happen roughly equally everywhere.

3) High-stakes fraud theory (closely related to the above): if you’re going to commit fraud, you need to make the payoff worth the risk, so you’re going to submit to Nature and not BBA.

Anthony Bretscher, in an MBoC commentary on editors, proposes a new theory, which, based on personal experience, I believe accounts for most of the correlation between retraction rate and impact factor:

Continue reading “Retraction rate increases with impact factor – is this because of professional editors?”