While I’ve been criticizing how ENCODE has been hyped and spun, it’s useful to take a look at the situation from the perspective of someone within the consortium. Why are the ENCODE findings supposed to be so revolutionary?
John Stamatoyannopoulos, who has made some of what I see as the most unjustified statements to the press on the topic of ENCODE, lays out his views on the significance of ENCODE in this piece (Genome Res. 2012. 22: 1602-1611).
He argues that the view of the genome emerging from ENCODE (and, I must emphasize, from the work of other scientists who have used and developed similar technologies, but are not part of the consortium), thanks to its unprecedented detail and global perspective, has radically changed our understanding of just what a gene is. (But have we ever settled on what a gene is?)
I think Stamatoyannopoulos makes a very good point in the following passage. It’s a bit DNA-centric, most likely overestimating the influence of all this transcriptional variation on the structures and quantities of the proteins that carry out most of the cellular jobs. But this idea of a gene as a polyfunctional coalescence point for various types of transcripts is an important emerging idea, despite the fact that essentially all of the detailed biochemical mechanisms involved have been understood, at least in outline, for a long time.
Although the gene has conventionally been viewed as the fundamental unit of genomic organization, on the basis of ENCODE data it is now compellingly argued that this unit is not the gene but rather the transcript (Washietl et al. 2007; Djebali et al. 2012a). On this view, genes represent a higher-order framework around which individual transcripts coalesce, creating a polyfunctional entity that assumes different forms under different cellular states, guided by differential utilization of regulatory DNA.
In a sense, we’ve taken our cartoon models too seriously in the past; they are biochemically unrealistic. As I’ve noted before, biological categories are fluid. The machinery of transcription does not follow the highly orchestrated, deterministic motions of an assembly line, and in fact the term ‘machinery’ is misleading. This is extremely complex chemistry, and surprisingly precise and reproducible function emerges from the jiggling, random walks of the players. And evolution works with this chemistry, not with the tools of a machine shop.
In our day-to-day lives, we’re very familiar with how machines behave, but not very familiar with the behavior of chemical systems. And so, when it comes to the complex features of biology, we cling to mechanical analogies rather than chemical ones, and this is a mistake.
Given how biochemically active the genome can be, it’s crucial that we hold in mind a null model of how we expect non-functional DNA to behave. For example, do we now have to consider functional any transposable elements that show correlated, cell-type-specific chromatin marks or transcription? Should we really believe that most cell-type-specific DNase I hypersensitive sites indicate genuine regulatory function, or should our default hypothesis be that, given how dramatically the transcriptional regulatory machinery varies among cell types, much non-functional DNA will show cell-type-specific behavior?
Here is where I part ways with Stamatoyannopoulos. Given what we know about variation in genome size among closely related species, the likely role of genetic drift in establishing genome size, and the evolutionary history of transposable element expansion, I think this claim is completely unjustified, even absurd in its implicit human exceptionalism:
It is still widely believed that functional elements, from exons to regulatory DNA, are relatively rare features of the genomic landscape. In the case of regulatory DNA, this is certainly true within the context of an individual cell type, where DNase I hypersensitive sites and associated transcription factor occupancy sites mapped by ChIP-seq encompass on the order of 1%–2% of the genome—a compartment roughly the size of the exome. However, because the majority of regulatory DNA regions are highly cell type-selective (The ENCODE Project Consortium 2012; Thurman et al. 2012), the genomic landscape rapidly becomes crowded with regulatory DNA as the number of cell types and states assayed increases. Even after assaying more than 120 distinct cell types, this trend shows little evidence of saturation (The ENCODE Project Consortium 2012). It is thus not unreasonable to expect that 40% and perhaps more of the genome sequence encodes regulatory information—a number that would have been considered heretical at the outset of the ENCODE project.
And in fact, this statement is in conflict with an argument made later in this piece:
However, it has also given rise to a broad tendency to think of all elements of a biochemically defined class as having the same functional properties. For example, genomic occupancy by the poly-zinc finger transcriptional regulator CTCF is a prominent feature of experimentally defined enhancer blockers and chromatin boundary elements, as well as bifunctional elements (Gaszner and Felsenfeld 2006). Yet it has now become commonplace to find any CTCF occupancy sites obtained by ChIP-seq referred to as “insulators” without any further specification—and without regard to the well-documented involvement of promoter-bound CTCF in transcriptional control (Klenova et al. 1993). Compounding this complexity, ENCODE has now made available data sets encompassing CTCF occupancy across large numbers of cell types (The ENCODE Project Consortium 2012), revealing substantial diversity in occupancy patterns that reflect important differences in regulation and likely in function (H Wang et al. 2012). Both the sheer number and diversity of these elements argue strongly against ascribing a monolithic functional activity.
You could replace the term CTCF with DNase I hypersensitive site, and make the same argument.
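The null-model worry can be made concrete with a back-of-the-envelope sketch. This is a toy, not anything taken from the ENCODE analyses. It assumes that each cell type marks a random 1.5% of the genome (the midpoint of the 1%–2% range quoted above) and that cell types are fully independent of one another, which real cell types are not. The function names, the bin count, and the seed are all made up for illustration.

```python
import random


def expected_union_fraction(p, n):
    """Expected fraction of the genome marked in at least one of n cell types,
    assuming each cell type independently marks a random fraction p.
    (Toy null model; independence between cell types is an assumption.)"""
    return 1.0 - (1.0 - p) ** n


def simulated_union_fraction(p, n, bins=20_000, seed=0):
    """Monte Carlo check of the same null model on a genome cut into `bins`
    equal-sized bins (a hypothetical resolution chosen for speed)."""
    rng = random.Random(seed)
    per_cell_type = round(p * bins)  # bins marked in each cell type
    marked = set()
    for _ in range(n):
        # Each cell type marks a fresh random set of bins.
        marked.update(rng.sample(range(bins), per_cell_type))
    return len(marked) / bins


if __name__ == "__main__":
    p = 0.015  # assumed per-cell-type fraction of the genome
    for n in (1, 10, 40, 120):
        print(n,
              round(expected_union_fraction(p, n), 3),
              round(simulated_union_fraction(p, n), 3))
```

Under these assumptions the union of marked sequence keeps growing as cell types are added and passes 80% by 120 cell types, with no function built into the model at all. Real hypersensitive sites overlap heavily between cell types, so the true accumulation is slower. The only point of the sketch is that a cumulative total that keeps growing and shows little saturation is also what random, cell-type-specific noise would produce, so the trend alone cannot separate the two hypotheses.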
POSTSCRIPT: I can’t resist including one last quote that illustrates the difficulties we’ve had talking about function:
ENCODE is thus in a unique position to promote clearer terminology that separates the identification of functional elements per se from the ascription of specific functional activities using historical experimentally defined categories, and also to dissuade the ascription of very specific functions based on a biochemical signature in place of a deeper mechanistic understanding.
In terms of media presentation, that clarification clearly did not happen.