On Saturday, my former Center for Genome Sciences colleague Sean Eddy brought up the idea of a Random Genome Project: let’s create a random genome to serve as a null model of genome function. With this random genome, we can determine how much supposedly functional biochemical activity we expect to see just by chance, and, among other things, we might use a random genome to explore how new functions evolve by “repurposing” (Eddy’s great term) non-functional DNA. In the comments to that post, you can read some discussion of how you might go about making a random genome.
An easier task would be to implement the random genome computationally, an idea I’ve been exploring recently, using a genome-wide binding model along the lines of the one by Wasson and Hartemink.
Why do this? Because we could explore two kinds of null models – the random genome described by Sean Eddy, and the naked genome. Continue reading “Random Genome, Naked Genome”
From “Selfish genes, the phenotype paradigm, and genome evolution,” W. Ford Doolittle & Carmen Sapienza, Nature 284:601-3 (1980), here is one of the original definitions of selfish DNA:
What we propose here is that there are classes of DNA for which a ‘different kind of explanation’ may well be required. Natural selection does not operate on DNA only through organismal phenotype. Cells themselves are environments in which DNA sequences can replicate, mutate, and so evolve. Although DNA sequences which contribute to organismal phenotypic fitness or evolutionary adaptability indirectly increase their own chances of preservation, and may be maintained by classical phenotypic selection, the only selection pressure which DNAs experience directly is the pressure to survive within cells. If there are ways in which mutation can increase the probability of survival within these cells without effect on the organismal phenotype, then sequences whose only ‘function’ is self-preservation will inevitably arise and be maintained by what we call ‘non-phenotypic selection’. Furthermore, if it can be shown that a given gene (region of DNA) or class of genes (regions) has evolved a strategy which increases its probability of survival within cells, then no additional (phenotypic) explanation for its origin or continued existence is required.
… at least from my perspective. I’ll now stop ranting about the hype and media coverage of ENCODE, and extend my compliments to the consortium for an amazingly well-coordinated effort that achieved an impressive level of consistency and quality for such a large group. Whatever else you might want to say about the idea of ENCODE, you cannot say that ENCODE was poorly executed.
It’s time to get into the interesting stuff – what’s actually in the papers. Among the results I’ve been most eager to see in print are the DNase hypersensitivity results now published in Thurman et al. (Nature 489, 75–82 (06 September 2012) doi:10.1038/nature11232).
Why is this interesting? Because it raises provocative and possibly disturbing questions regarding how transcription factors navigate and read out information from the genome. Continue reading “The truly provocative and disturbing stuff in ENCODE”
The latest round of ENCODE papers is out, accessible via a handy ENCODE explorer gateway at Nature. I know what I’ll be doing for the next week. Stay tuned for more Finch & Pea coverage of what all this means, but I can’t resist a few brief comments about function.
First, you can immediately dismiss the NY Times’s misleading headline that suggests much, much more of the genome is functional than we previously thought. Being an intron counts as ‘function’ here, which is a pretty low bar to meet. The ENCODE results indicate that much of the genome is represented within introns, which I find fascinating, but that’s not something that forces us to dramatically revise our ideas about function in the genome.
Second, I’m going to claim (without any proof whatsoever) the title of world record holder for “the largest number of randomly generated DNA sequences tested for function in an enhancer assay.” Hopefully in the not-too-distant future you can read in print about the 1000+ random sequences (plus several thousand genomic sequences) we tested in our new, smokin’ hot, high-throughput enhancer assay, but here’s the punch line: it’s not that difficult to randomly generate a DNA sequence that will drive substantial tissue-specific transcription.
In other words, whether it’s been selected for function or not, DNA is generally not biochemically inert.
P.S. This seems to be consistent with Ewan Birney’s comment, “It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be.”
P.P.S. Brief methods: We took sequences under ChIP-seq peaks, thoroughly scrambled them while preserving the original di-nucleotide frequencies, and dropped them upstream of a basal promoter to test for enhancer activity.
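For readers wondering what "scrambling while preserving di-nucleotide frequencies" can look like in practice, here is a minimal sketch. It is not the code used for the experiments described above; it is one standard way to do this kind of shuffle, an Euler-walk approach in the style of Altschul and Erickson. The function names (`dinucleotide_shuffle`, `_reaches`) and the example sequence are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def _reaches(v, target, last_edge):
    """Follow the chosen 'last edges' from v; True if we arrive at target."""
    seen = set()
    while v != target:
        if v in seen:  # ran into a cycle that never reaches target
            return False
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Return a shuffled copy of seq with exactly the same dinucleotide counts.

    Illustrative sketch only (hypothetical helper, not the authors' pipeline).
    The first and last letters of the sequence are preserved as a side effect.
    """
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq

    # successors[x] holds every letter that follows x somewhere in seq.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)

    last = seq[-1]
    others = [v for v in successors if v != last]

    # Pick one outgoing edge per letter to be used last; accept the choice
    # only if those edges lead every letter to the final letter.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in others}
        if all(_reaches(v, last, last_edge) for v in others):
            break

    # Shuffle the remaining edges of each letter, keeping the chosen edge last.
    ordered = {}
    for v, succ in successors.items():
        succ = list(succ)
        if v in last_edge:
            succ.remove(last_edge[v])
            rng.shuffle(succ)
            succ.append(last_edge[v])
        else:
            rng.shuffle(succ)
        ordered[v] = succ

    # Walk the edges in their new order to spell out the shuffled sequence.
    out = [seq[0]]
    used = {v: 0 for v in ordered}
    cur = seq[0]
    for _ in range(len(seq) - 1):
        nxt = ordered[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


if __name__ == "__main__":
    original = "ACGTACGGTTAACCGGATATCGCGTTAGC"  # made-up example sequence
    scrambled = dinucleotide_shuffle(original, random.Random(0))
    same = Counter(zip(original, original[1:])) == Counter(zip(scrambled, scrambled[1:]))
    print(scrambled, "dinucleotide counts preserved:", same)
```

The point of preserving dinucleotide counts, rather than just base composition, is that the scrambled sequence keeps features like CpG content while losing the specific arrangement of any binding sites in the original.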
After five grueling but interesting days, the Cold Spring Harbor Labs Biology of Genomes meeting has wrapped up. So where is genomics heading?
A few lessons: Continue reading “CSHL Biology of Genomes Wrap-Up”