“What is your quest?”

That thing where you inadvertently facilitate a polite-ish discussion between Michael Eisen and Ewan Birney about ENCODE’s claims regarding “biochemical function” in the genome using modified Monty Python and the Holy Grail* quotes:

[Three screenshots of the exchange, captured 2014-12-05]

With a “wafer thin” side of Open Access:

[Screenshot, captured 2014-12-05]

You can check out the Storify of the #MontyPythonidae edition of #SCInema here.

*An apropos and overused scientific metaphor itself.

How many genes were we supposed to have?

In my Pacific Standard column this week, I note that over the course of the 20th century our concept of the gene went from being an abstract unit of heredity to an increasingly restrictive molecular definition. The advantage of this molecular definition is that it made genes countable; the drawback is that it is ill-suited to describe the heterogeneous collection of DNA elements that make up our genome. We’re now in the somewhat ironic situation where more function in our genome falls outside of these conventional genes than within them. As others have noted before, the physical and ‘genetic’ definitions of a gene are in tension.

Before we sequenced the human reference genome, how many genes did people expect us to have? Most estimates made in the 1990s put the number between 60,000 and 100,000. One group in 1994 reviewed the estimates then in the literature, which ranged from 20,000 to 100,000, and ultimately favored a prediction in the 60,000-70,000 range. In 1998, Deloukas et al. published a physical map of 30,000 human genes (PDF) and figured that they had captured nearly half of the complement of human genes. In 1999, Francis Collins was using a number of “80,000 or so.” My molecular cell biology textbook, the third edition of Lodish et al. (2000), stated that our genomes were expected to contain 60,000-100,000 genes. One estimate, made less than a year before the draft genome sequence was published, noted that “Early estimates suggested that there might be 60,000−100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes.” They worked out their own estimate of the total genes in the genome: “Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating the human genome contains approximately 120,000 genes.”

Having your cake and eating it: more arguments over human genome function

My fellow F&P publican Josh Witten has drawn my attention to a rebuttal (PDF) of Graur et al.’s rebuttal of claims made by ENCODE.

The authors, John Mattick and Marcel Dinger of the University of New South Wales, advance various claims to dispute the idea that most of the genome is non-functional, but here I’ll just focus on one:

We also show that polyploidy accounts for the higher than expected genome sizes in some eukaryotes, compounded by variable levels of repetitive sequences of unknown significance.

Uh, yeah. That’s the resolution to the C-value paradox, and it’s one reason why people argue that repetitive sequences, i.e. transposable elements, are, contra claims about ENCODE data, largely non-functional – because their numbers vary greatly between species with a similar biology. As Doolittle writes:

A balance between organism-level selection on nuclear structure and cell size, cell division times and developmental rate, selfish genome-level selection favoring replicative expansion, and (as discussed below) supraorganismal (clade-level) selective processes—as well as drift— must all be taken into account.

Reading into the paper, how is it possible that the following claims by Mattick and Dinger don’t contradict each other?

Function and another failure to consider the null hypothesis

Somehow, the following kind of illogic creeps into so many discussions of genomic function:

In terms of pathological functions, somatic mosaicism of terminally differentiated cells has long been known to cause cancer. Recent work shows that somatic mosaicism of nervous system tissues underlies a host of neurodevelopmental and perhaps neuropsychiatric diseases (17). However, the extent of somatic mosaicism that is now being reported in a variety of healthy tissues and cell types suggests that it also has physiological functions.

– James R Lupski, “Genome Mosaicism—One Human, Multiple Genomes” Science 26 July 2013: Vol. 341 no. 6144 pp. 358-359

This paragraph comes after the author carefully describes why extensive mosaicism is unavoidable, given the number of cell divisions we undergo during development from a zygote into a fully adult human.

So explain to me why extensive mosaicism “suggests that it also has physiological functions”? Why should we think that most of the mosaicism being observed is anything like the deliberate hypermutation that happens in the immune system? Isn’t the default hypothesis that mosaicism is the expected, non-functional by-product of trillions of cell divisions?
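To make the null hypothesis concrete, here is a back-of-the-envelope sketch in Python. It only shows the shape of the argument: if copying a genome has any nonzero error rate, new mutations accumulate with every division whether or not they do anything. The per-base error rate and the number of divisions along a lineage are placeholder assumptions I picked for illustration, not measurements, and the function names are mine, not from the Lupski paper.

```python
import math


def expected_new_mutations(genome_bp: float, rate_per_bp_per_division: float) -> float:
    """Expected number of new mutations in one daughter cell from one division."""
    return genome_bp * rate_per_bp_per_division


def prob_at_least_one(expected: float) -> float:
    """Poisson probability of at least one new mutation, given the expected count."""
    return 1.0 - math.exp(-expected)


def expected_along_lineage(per_division: float, divisions: int) -> float:
    """Expected mutations accumulated along one cell lineage from the zygote."""
    return per_division * divisions


# Diploid human genome size, roughly 6 billion base pairs.
GENOME_BP = 6e9
# ASSUMPTION: illustrative per-base, per-division error rate (placeholder, not a measurement).
RATE = 1e-10
# ASSUMPTION: illustrative number of divisions separating a cell from the zygote.
DIVISIONS = 40

per_division = expected_new_mutations(GENOME_BP, RATE)
p_any = prob_at_least_one(per_division)
lineage_total = expected_along_lineage(per_division, DIVISIONS)

print(f"expected new mutations per division: {per_division:.2f}")
print(f"chance a daughter cell carries at least one: {p_any:.2f}")
print(f"expected mutations along a {DIVISIONS}-division lineage: {lineage_total:.1f}")
```

Plug in whatever rate you prefer; the qualitative conclusion does not depend on the placeholder values. Under this null model, widespread mosaicism in healthy tissue is exactly what you would expect, so observing it is not by itself evidence of physiological function.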

Finding function in the genome part 2: All function is local (almost)

Yesterday I wrote about why negative controls are important in a genome-scale search for functional DNA. Today, I’ll discuss the main focus of our recent work: understanding what makes a piece of DNA functional.

The particular DNA I’m interested in is known by the not very functional term ‘cis-regulatory’ DNA – a term that requires six syllables, an italicized Latin prefix, and a hyphen. This is DNA that is crucial in gene decisions: cis-regulatory DNA helps to control when, where, and how much genes are expressed. This happens because cis-regulatory DNA serves as a landing pad for ‘transcription factors’, proteins that land on cis-regulatory DNA and control the expression of nearby (or sometimes not so nearby) genes.

The question that haunts me is this: why don’t transcription factors get lost? My worry follows from these three observations: