Finding function in the genome with a null hypothesis

Last September, there was a wee bit of a media frenzy over the Phase 2 ENCODE publications. The big story was supposed to be that ‘junk DNA is debunked’ – ENCODE had allegedly shown that instead of being filled with genetic garbage, our genomes are stuffed to the rafters with functional DNA. In the backlash against this storyline, many of us pointed out that the problem with this claim is that it conflates biochemical and organismal definitions of function: ENCODE measured biochemical activities across the human genome, but those biochemical activities are not by themselves strong proof that any particular piece of DNA actually does something useful for us.

The claim that ENCODE results disprove junk DNA is wrong because, as I argued back in the fall, something crucial is missing: a null hypothesis. Without a null hypothesis, how do you know whether to be surprised that ENCODE found biochemical activities over most of the genome? What do you really expect non-functional DNA to look like?

In our paper in this week’s PNAS, we take a stab at answering this question with one of the largest sets of randomly generated DNA sequences ever included in an experimental test of function.

ENCODE, Astronomy, & the Future of Genomics

The ENCODE media fail was epic enough that it totally dominated the discussion when the results were released to the public. Now that our collective fury has abated, I’d like to talk not about what ENCODE did, but about what it might mean for how we conduct genomic research in the future.

ENCODE produced an unprecedented amount of data with unprecedented levels of reproducibility between labs. This data will be useful to researchers around the world for years to come. To do so, however, it commanded tremendous resources and marginalized the concerns of independent researchers. Can we harness the data collection power of these collective projects without destroying the creativity and risk-taking of individual scientists in the crucible of collaborative compromise?

Has ENCODE redefined the meaning of ‘gene’?

While I’ve been criticizing how ENCODE has been hyped and spun, it’s useful to take a look at the situation from the perspective of someone within the consortium. Why are the ENCODE findings supposed to be so revolutionary?

John Stamatoyannopoulos, who has made some of what I see as the most unjustified statements to the press on the topic of ENCODE, lays out his views on the significance of ENCODE in this piece (Genome Res. 2012. 22: 1602-1611).

He argues that the view of the genome emerging from ENCODE (and, I must emphasize, from the work of other scientists who have used and developed similar technologies, but are not part of the consortium), thanks to its unprecedented detail and global perspective, has radically changed our understanding of just what a gene is. (But have we ever settled on what a gene is?)

The non-functional concept of genome function

This month there has been a bit of a dust-up over the question of how much of our genome is functional. ENCODE results say 80% – or do they? Is it 20%? Or more like 8%?

Did ENCODE scientists play fast and loose with the definition of function, or is genome function legitimately defined as those activities the consortium measured? Is functional DNA something that has an effect on phenotype? (Does that include damaging gain-of-function mutations?) Is functional DNA only that DNA present in your genome because of natural selection? (Then what about hitchhiker alleles?) Is a novel mutation existing in only a single individual functional if that mutation is ultimately destined to become fixed in the population by natural selection?

We have to face the fact that, like much else in biology, boundaries between categories are fluid. It makes no sense to try to cleanly divide the genome into functional and non-functional elements. Even what seems like an obvious boundary line, the boundary between protein-coding and non-coding DNA, is blurry: many coding regions have cis-regulatory sites with a non-coding, functional role. To divide the genome into categories of coding and non-coding function, or function and non-function, may satisfy our insatiable desire to classify for our own cognitive comfort, but from the perspective of the cell there is no such distinction.

Your genome is an ecosystem

I’m not sure how many of the people writing Science news features, press releases for ENCODE*, or completely uninformed and baseless rants on the idea of junk DNA are familiar with the work discussed in this review, none of which is refuted by the ENCODE results:

“The ecology of the genome — mobile DNA elements and their hosts”, John F. Y. Brookfield, Nature Reviews Genetics 6, 128-136 (February 2005):

One activity of evolutionary biologists involves looking at features of organisms and seeking to explain them in adaptive ways — demonstrating that the feature to be explained will confer on its bearer a higher inclusive FITNESS than an alternative would. However, as applied to phenotypic features, this approach is not always intellectually rigorous — only knowledge of the ways in which genes influence the phenotype can allow the identification of realistic alternatives to observed traits. This approach is more valid when applied to genomic components — an explanation of the presence of a DNA sequence consists of demonstrating that an organism with that sequence is fitter than one that lacks it or one in which the sequence is mutated. The methodology is straightforward — we make mutations and observe the reduction in fitness that is created. All parts of the genome could therefore potentially be seen in this same light — every sequence present is there because its removal or replacement would cause a reduction in the organism’s fitness. In discussing microorganisms, such a view might be tenable. However, the genomes of multicellular eukaryotes possess sequences, which could perhaps form the majority, that are not there for reasons related to their present use.

Why does a simplistic view of an entirely functional genome fail? In essence, it does so because some genomic components, notably interspersed repetitive DNA sequences, are indistinguishable from parasites…

This paper develops the ecosystem analogy of the genome. Later this week, I’ll develop the analogy of your genome as a post-apocalyptic wasteland.

*Sadly, a significant number of ENCODE scientists seem completely unaware of this literature as well.
