This month there has been a bit of a dust-up over the question of how much of our genome is functional. ENCODE results say 80% – or do they? Is it 20%? Or more like 8%?
Did ENCODE scientists play fast and loose with the definition of function, or is genome function legitimately defined as those activities the consortium measured? Is functional DNA something that has an effect on phenotype? (Does that include damaging gain-of-function mutations?) Is functional DNA only that DNA present in your genome because of natural selection? (Then what about hitchhiker alleles?) Is a novel mutation existing in only a single individual functional if that mutation is ultimately destined to become fixed in the population by natural selection?
We have to face the fact that, like much else in biology, boundaries between categories are fluid. It makes no sense to try to cleanly divide the genome into functional and non-functional elements. Even what seems like an obvious boundary line, the boundary between protein-coding and non-coding DNA, is blurry: many coding regions have cis-regulatory sites with a non-coding, functional role. To divide the genome into categories of coding- and non-coding function, or function and non-function, may satisfy our insatiable desire to classify for our own cognitive comfort, but from the perspective of the cell there is no such distinction. Continue reading “The non-functional concept of genome function”
I’m not sure how many of the people writing Science news features, press releases for ENCODE*, or completely uninformed and baseless rants on the idea of junk DNA are familiar with the work discussed in this review, none of which is refuted by the ENCODE results:
“The ecology of the genome — mobile DNA elements and their hosts”, John F. Y. Brookfield, Nature Reviews Genetics 6, 128-136 (February 2005):
One activity of evolutionary biologists involves looking at features of organisms and seeking to explain them in adaptive ways — demonstrating that the feature to be explained will confer on its bearer a higher inclusive FITNESS than an alternative would. However, as applied to phenotypic features, this approach is not always intellectually rigorous — only knowledge of the ways in which genes influence the phenotype can allow the identification of realistic alternatives to observed traits. This approach is more valid when applied to genomic components — an explanation of the presence of a DNA sequence consists of demonstrating that an organism with that sequence is fitter than one that lacks it or one in which the sequence is mutated. The methodology is straightforward — we make mutations and observe the reduction in fitness that is created. All parts of the genome could therefore potentially be seen in this same light — every sequence present is there because its removal or replacement would cause a reduction in the organism’s fitness. In discussing microorganisms, such a view might be tenable. However, the genomes of multicellular eukaryotes possess sequences, which could perhaps form the majority, that are not there for reasons related to their present use.
Why does a simplistic view of an entirely functional genome fail? In essence, it does so because some genomic components, notably interspersed repetitive DNA sequences, are indistinguishable from parasites…
This paper develops the ecosystem analogy of the genome. Later this week, I’ll develop the analogy of your genome as a post-apocalyptic wasteland.
*Sadly, a significant number of ENCODE scientists seem completely unaware of this literature as well.
We read this paper in my Eukaryotic Genomes class (more than 10 years ago…sigh). The paper suggests that you need to be proactive about getting rid of pseudogenes and transposable elements if you want to keep your genome small:
High intrinsic rate of DNA loss in Drosophila
DMITRI A. PETROV, ELENA R. LOZOVSKAYA & DANIEL L. HARTL
Nature 384, 346 – 349 (28 November 1996)
Differences in deletion rate may also contribute to the divergence in genome size among taxa, the so-called ‘C-value paradox’. Two reports find a positive correlation between genome size and intron size in a variety of taxa. In addition, the reduction in the intron size in birds, whose genome size is smaller than that of other tetrapods, has been inferred to be due to multiple separate deletions scattered along the introns. It is noteworthy that pseudogenes are much rarer in birds than in mammals. These results argue that differences in genome size among related organisms may be determined primarily by the variation in the genome-wide deletion rate, and not, for instance, by different rates of insertion of transposable elements.
On Saturday, my former Center for Genome Sciences colleague Sean Eddy brought up the idea of a Random Genome Project: let’s create a random genome to serve as a null model of genome function. With this random genome, we can determine how much supposedly functional biochemical activity we would expect to see just by chance. Among other things, we might also use a random genome to explore how new functions evolve by “repurposing” (Eddy’s great term) non-functional DNA. In the comments to that post, you can read some discussion of how you might go about making a random genome.
An easier task would be to implement the random genome computationally, an idea I’ve been exploring recently, using a genome-wide binding model along the lines of the one by Wasson and Hartemink.
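To give a feel for what a computational random genome might look like, here is a toy sketch. It is not the Wasson and Hartemink binding model, just a much simpler stand-in: draw a sequence of independent bases and count exact matches to a single binding motif, then compare that count with the number expected by chance. The base frequencies, the motif `TGACTCA`, the genome length, and all function names are assumptions made up for illustration.

```python
import random


def random_genome(length, base_freqs, seed=0):
    """Draw an i.i.d. random sequence with the given base frequencies."""
    rng = random.Random(seed)
    bases = list(base_freqs)
    weights = [base_freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def count_sites(genome, motif):
    """Count exact matches to the motif on one strand, overlaps included."""
    count = 0
    start = genome.find(motif)
    while start != -1:
        count += 1
        start = genome.find(motif, start + 1)
    return count


def expected_sites(length, motif, base_freqs):
    """Expected number of exact matches in an i.i.d. random sequence."""
    p_match = 1.0
    for base in motif:
        p_match *= base_freqs[base]
    return (length - len(motif) + 1) * p_match


# Hypothetical inputs: an AT-rich composition and an arbitrary 7-bp motif.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
motif = "TGACTCA"

genome = random_genome(1_000_000, freqs, seed=42)
observed = count_sites(genome, motif)
expected = expected_sites(len(genome), motif, freqs)
print(f"observed {observed} sites, expected about {expected:.1f} by chance")
```

Even this crude version makes the point of a null model: a short motif turns up dozens of times per megabase in sequence that has no function at all. A more realistic sketch would score both strands with a position weight matrix and let factors compete for sites, which is where a genome-wide binding model comes in.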
Why do this? Because we could explore two kinds of null models – the random genome described by Sean Eddy, and the naked genome. Continue reading “Random Genome, Naked Genome”
From “Selfish genes, the phenotype paradigm, and genome evolution,” W. Ford Doolittle & Carmen Sapienza, Nature 284:601-3 (1980), here is one of the original definitions of selfish DNA:
What we propose here is that there are classes of DNA for which a ‘different kind of explanation’ may well be required. Natural selection does not operate on DNA only through organismal phenotype. Cells themselves are environments in which DNA sequences can replicate, mutate, and so evolve. Although DNA sequences which contribute to organismal phenotypic fitness or evolutionary adaptability indirectly increase their own chances of preservation, and may be maintained by classical phenotypic selection, the only selection pressure which DNAs experience directly is the pressure to survive within cells. If there are ways in which mutation can increase the probability of survival within these cells without effect on the organismal phenotype, then sequences whose only ‘function’ is self-preservation will inevitably arise and be maintained by what we call ‘non-phenotypic selection’. Furthermore, if it can be shown that a given gene (region of DNA) or class of genes (regions) has evolved a strategy which increases its probability of survival within cells, then no additional (phenotypic) explanation for its origin or continued existence is required.