Sometimes you wanna go…

…where everybody knows their genomics. Bum bum bum.

Which is as far as I’m taking that, because I have the bad feeling that y’all would suggest that I’m the Cliff Clavin around here (I’m so the Carla).

Technology willing (let’s all take a long, suggestive look at Mike for a moment), we will be doing a live Google Hangout to talk about the ENCODE project tonight (Tuesday, 11 September) at 9PM Eastern. We’ll chat about what it means for science, “junk DNA”, and who (if anyone) actually knows what they are talking about.

Oh yeah, it is BYOB until we get that whole virtual liquor license thing sorted out.

*Leave a comment here or tweet @joshwitten or @finchandpea if you are interested and need a hangout invite.

Random Genome, Naked Genome

On Saturday, my former Center for Genome Sciences colleague Sean Eddy brought up the idea of a Random Genome Project: let’s create a random genome to serve as a null model of genome function. With this random genome, we can determine how much supposedly functional biochemical activity we expect to see just by chance. Among other things, we might also use a random genome to explore how new functions evolve by “repurposing” (Eddy’s great term) non-functional DNA. In the comments to that post, you can read some discussion of how you might go about making a random genome.

An easier task would be to implement the random genome computationally, an idea I’ve been exploring recently, using a genome-wide binding model along the lines of the one by Wasson and Hartemink.
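To make the null-model idea concrete, here is a minimal sketch of the simplest possible computational random genome. It is not the Wasson and Hartemink binding model, just a toy under loud assumptions: bases are drawn independently and uniformly, "binding" means an exact forward-strand match, and the motif `GATTACA` is an arbitrary string picked for illustration. The function names are hypothetical.

```python
import random

def random_genome(length, seed=0):
    """Return a random DNA string with i.i.d. uniform bases (an assumption;
    real genomes have skewed base composition and local structure)."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))

def count_motif(genome, motif):
    """Count exact, possibly overlapping, forward-strand occurrences of motif."""
    count = 0
    start = genome.find(motif)
    while start != -1:
        count += 1
        start = genome.find(motif, start + 1)
    return count

def expected_count(length, motif):
    """Expected number of exact matches by chance under the uniform model:
    (number of start positions) * (1/4)^(motif length)."""
    return (length - len(motif) + 1) * 0.25 ** len(motif)

genome = random_genome(1_000_000, seed=42)
motif = "GATTACA"  # arbitrary illustrative site, not a real factor's motif
observed = count_motif(genome, motif)
expected = expected_count(len(genome), motif)
print(f"observed {observed} chance sites, expected about {expected:.1f}")
```

Even this crude version makes the point: a short recognition site turns up dozens of times per megabase with no function at all, and that chance count is the baseline any claim of "functional" binding has to beat.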

Why do this? Because we could explore two kinds of null models – the random genome described by Sean Eddy, and the naked genome. Continue reading “Random Genome, Naked Genome”

The truly provocative and disturbing stuff in ENCODE

… at least from my perspective. I’ll now stop ranting about the hype and media coverage of ENCODE, and extend my compliments to the consortium for an amazingly well-coordinated effort to achieve an impressive level of consistency and quality for such a large consortium. Whatever else you might want to say about the idea of ENCODE, you cannot say that ENCODE was poorly executed.

It’s time to get into the interesting stuff – what’s actually in the papers. Among the results I’ve been most eagerly waiting to see in print are the DNase hypersensitivity results now published in Thurman et al. (Nature 489, 75–82 (06 September 2012) doi:10.1038/nature11232).

Why is this interesting? Because it raises provocative and possibly disturbing questions regarding how transcription factors navigate and read out information from the genome. Continue reading “The truly provocative and disturbing stuff in ENCODE”

Polling junk DNA

I missed this poll by Chris Gunter yesterday, asking, “If you are a non-genomicist, can you tell us if you thought/were taught much of the genome was ‘junk’?”

Well, I’m 1) a day late and 2) not a non-genomicist, but I’ll reply anyway, because we need a little history review.

In my Eukaryotic Genomes course in grad school (in the year the draft Human Genome sequence came out), I was taught by Tom Eickbush, not so much about ‘junk DNA’, but about ‘selfish DNA’. The point is largely the same regardless of what we call it. Among the first papers we read in Eickbush’s class were the classic Doolittle and Sapienza and Orgel and Crick papers on selfish DNA.

The key argument of these papers was this: parasitic DNA that can replicate itself within the genome requires no explanation for its existence other than its ability to replicate, period. It does not need to be functional, from the perspective of the organism. It may acquire a useful function. But in general, absent evidence of such a useful function, we don’t need to ask the question, ‘what is the function of this DNA?’ There’s no mystery why it’s there – because it can replicate. Continue reading “Polling junk DNA”

ENCODE Media FAIL (or, Where’s the Null Hypothesis?)

I’m ready to drink myself into a stupor, and not because it’s my birthday. This week we’re seeing a science reporting fail on a massive scale. And just to be clear, I’m not only (or even mostly) blaming reporters.

We’ve known for a long time that protein-coding genes are regulated by non-coding DNA sequences, ‘gene switches’, if you will. We’ve known for decades that the genome contains many ‘gene switches’. (See the references in this review.) That’s uncontested.

ENCODE is significant because they’ve provided a very useful data set, and not because they’ve shown that a) non-coding DNA is important (we knew that), b) most of the genome has phenotypically important regulatory function (it does not), or c) most of the genome is evolutionarily conserved (not true either). What they have shown is that much of the genome is covered by introns, and that it is hard to find biochemically inert DNA, which those of us who’ve tried to generate random, ‘neutral’ DNA sequences (for, say, spacers in synthetic promoter experiments) will agree with.

Now, let’s see how major media stories are handling the significance of ENCODE (h/t to Ryan Gregory for compiling the list of stories): Continue reading “ENCODE Media FAIL (or, Where’s the Null Hypothesis?)”