The non-functional concept of genome function

This month there has been a bit of a dust-up over the question of how much of our genome is functional. ENCODE results say 80% – or do they? Is it 20%? Or more like 8%?

Did ENCODE scientists play fast and loose with the definition of function, or is genome function legitimately defined as those activities the consortium measured? Is functional DNA something that has an effect on phenotype? (Does that include damaging gain-of-function mutations?) Is functional DNA only that DNA present in your genome because of natural selection? (Then what about hitchhiker alleles?) Is a novel mutation existing in only a single individual functional if that mutation is ultimately destined to become fixed in the population by natural selection?

We have to face the fact that, like much else in biology, boundaries between categories are fluid. It makes no sense to try to cleanly divide the genome into functional and non-functional elements. Even what seems like an obvious boundary line, the boundary between protein-coding and non-coding DNA, is blurry: many coding regions have cis-regulatory sites with a non-coding, functional role. To divide the genome into categories of coding- and non-coding function, or function and non-function, may satisfy our insatiable desire to classify for our own cognitive comfort, but from the perspective of the cell there is no such distinction.

Your genome is an ecosystem

I’m not sure how many of the people writing Science news features, press releases for ENCODE*, or completely uninformed and baseless rants on the idea of junk DNA are familiar with the work discussed in this review, none of which is refuted by the ENCODE results:

“The ecology of the genome — mobile DNA elements and their hosts”, John F. Y. Brookfield, Nature Reviews Genetics 6, 128-136 (February 2005):

One activity of evolutionary biologists involves looking at features of organisms and seeking to explain them in adaptive ways — demonstrating that the feature to be explained will confer on its bearer a higher inclusive FITNESS than an alternative would. However, as applied to phenotypic features, this approach is not always intellectually rigorous — only knowledge of the ways in which genes influence the phenotype can allow the identification of realistic alternatives to observed traits. This approach is more valid when applied to genomic components — an explanation of the presence of a DNA sequence consists of demonstrating that an organism with that sequence is fitter than one that lacks it or one in which the sequence is mutated. The methodology is straightforward — we make mutations and observe the reduction in fitness that is created. All parts of the genome could therefore potentially be seen in this same light — every sequence present is there because its removal or replacement would cause a reduction in the organism’s fitness. In discussing microorganisms, such a view might be tenable. However, the genomes of multicellular eukaryotes possess sequences, which could perhaps form the majority, that are not there for reasons related to their present use.

Why does a simplistic view of an entirely functional genome fail? In essence, it does so because some genomic components, notably interspersed repetitive DNA sequences, are indistinguishable from parasites…

This paper develops the ecosystem analogy of the genome. Later this week, I’ll develop the analogy of your genome as a post-apocalyptic wasteland.

*Sadly, a significant number of ENCODE scientists seem completely unaware of this literature as well.

My last thoughts on the media coverage of ENCODE

I’m interested in moving on to the science of ENCODE and putting the media coverage behind us. My final thoughts on the subject are up at the Huffington Post: “A Genome-sized Media Failure”:

This was a fantastic opportunity for scientists and science journalists to explain to the public some of the exciting and important research findings in genome biology that are changing how we think about health, disease, and our evolutionary past. But we blew it, in a big way…

[The media] stories failed us all in three major ways: they distorted the science done before ENCODE, they obscured the real significance of the ENCODE project, and most crucially, they misled the public on how science really works.

A few supplemental points:

1) You’ve got to read John Timmer’s excellent discussion of the media coverage, filled with more details.

2) The ENCODE consortium was well-run, produced high-quality data, and measured the right biochemical activities; and I’m very interested in seeing the results.

3) However, I’m not convinced that big science was the way to go here, nor am I convinced that this will become the one dataset to rule them all as the technology rapidly changes… which means you can justify an open-ended project that has no concrete end point.

4) My opinion in point #3 could of course be wrong, but it will take time for that to become clear.

Keeping genomes small

We read this paper in my Eukaryotic Genomes class (more than 10 years ago…sigh). The paper suggests that you need to be proactive about getting rid of pseudogenes and transposable elements if you want to keep your genome small:

High intrinsic rate of DNA loss in Drosophila

DMITRI A. PETROV, ELENA R. LOZOVSKAYA & DANIEL L. HARTL

Nature 384, 346 – 349 (28 November 1996)

Differences in deletion rate may also contribute to the divergence in genome size among taxa, the so-called ‘C-value paradox’. Two reports find a positive correlation between genome size and intron size in a variety of taxa. In addition, the reduction in the intron size in birds, whose genome size is smaller than that of other tetrapods, has been inferred to be due to multiple separate deletions scattered along the introns. It is noteworthy that pseudogenes are much rarer in birds than in mammals. These results argue that differences in genome size among related organisms may be determined primarily by the variation in the genome-wide deletion rate, and not, for instance, by different rates of insertion of transposable elements.

Genome PR is OK

There was some criticism of this video out there, but I liked it. Given how little attention the average news reader/online browser is going to devote to genomics, I think this kind of thing is just right (except for the misleading throwaway line about junk DNA).

Sure, the video hypes ENCODE as biology’s latest, greatest development, but nobody outside the scientific community is going to know the difference between ENCODE and all of the rest of us genome biologists anyway. So basically, the video is hyping all of us.