My fellow F&P publican Josh Witten has drawn my attention to a rebuttal (PDF) of Graur et al’s rebuttal of claims made by ENCODE.
The authors, John Mattick and Marcel Dinger of the University of New South Wales, advance various claims to dispute the idea that most of the genome is non-functional, but here I’ll just focus on one:
We also show that polyploidy accounts for the higher than expected genome sizes in some eukaryotes, compounded by variable levels of repetitive sequences of unknown significance.
Uh, yeah. That’s the resolution to the C-value paradox, and it’s one reason why people argue that repetitive sequences, i.e. transposable elements, are, contra claims about ENCODE data, largely non-functional – because their numbers vary greatly between species with a similar biology. As Doolittle writes:
A balance between organism-level selection on nuclear structure and cell size, cell division times and developmental rate, selfish genome-level selection favoring replicative expansion, and (as discussed below) supraorganismal (clade-level) selective processes—as well as drift— must all be taken into account.
Reading into the paper, how is it possible that the following claims by Mattick and Dinger don’t contradict each other?
1. Claims about the non-functionality of the genome are based on a “questionable assumption” of transposable element non-functionality:
…the substantive scientific argument of Graur et al. is based primarily on the apparent lack of sequence conservation of the vast majority (~90%) of the human genome, suggesting that this indicates lack of selective constraint (and therefore function). The fundamental flaw, however, in this argument is that conservation is relative, and its estimation in the human genome is largely based on the questionable proposition that transposable elements, which provide the major source of evolutionary plasticity and novelty (Brosius 1999), are largely non-functional.
2. The C-value paradox (or the Onion Test) is not an argument against function in most of the human genome because transposable elements (i.e. repetitive sequences) don’t add genetic complexity:
…more explicitly discussed by Doolittle (Doolittle 2013), is the so-called ‘C-value enigma’, which refers to the fact that some organisms (like some amoebae, onions, some arthropods, and amphibians) have much more DNA per cell than humans, but cannot possibly be more developmentally or cognitively complex, implying that eukaryotic genomes can and do carry varying amounts of unnecessary baggage. That may be so, but the extent of such baggage in humans is unknown. However, where data is available, these upward exceptions appear to be due to polyploidy and/or varying transposon loads (of uncertain biological relevance), rather than an absolute increase in genetic complexity (Taft et al. 2007).
Finally, whenever you read that developmental complexity correlates with genome size, run for the hills:
Moreover, there is a broadly consistent rise in the amount of non-protein-coding intergenic and intronic DNA with developmental complexity, a relationship that proves nothing but which suggests an association that can only be falsified by downward exceptions, of which there are none known…
Definitions of upward and downward exceptions seem to be a bit circular in this piece, and anyway, Ford Doolittle provides an example of a downward exception (the pufferfish, with its 400 Mb genome). On the more general point of the relationship between developmental complexity and genome size, I’ll refer you to Ryan Gregory’s discussion of the issue (be sure to follow the links therein). Finally, where in this paper is any discussion of the appropriate null hypothesis?
UPDATE: For another exceptionally small genome, go read about the carnivorous bladderwort.
UPDATE 2: Larry Moran has a much more detailed dissection of this paper over at Sandwalk – don’t miss it.
26 thoughts on “Having your cake and eating it: more arguments over human genome function”
A major flaw in their reasoning is the assumption that transposable elements are functional. Assumption of function is an assumption that something other than chance is required, which is fundamentally a more complex hypothesis than the assumption of non-function. In the absence of evidence to the contrary, we should favor the hypothesis (non-function) that does not require positing anything other than chance effects.
Notwithstanding their oversimplified claim of “conservation being relative”, we are not entirely without evidence on these matters. In addition to experimental evidence showing that individual elements do not contribute measurably to fitness in non-human animals, the time-tested principles of quantitative genetics and evolutionary theory give us no reason to believe that the presence of massive quantities of transposable element remnants in the human genome is based on any “function”.
I would also add that, given what we know about TE expansion mechanisms, etc., the default hypothesis should be that they’re non-functional, as per the selfish DNA ideas of Orgel & Crick and Doolittle & Sapienza.
Also true. But even if the details were different, in terms of the types of TEs or their mechanisms, we would expect to see such an expansion of non-functional sequence in species with population & life history characteristics like those of humans.
Contra Mattick, there are numerous examples of “downward exceptions”, besides the much-discussed pufferfish, and good cases where wide variation exists within close relatives, which offer the best prospects for experimental investigation.
The Drosophila genome is much reduced in comparison to those of other flies. It can be directly compared to other fly genomes that, presumably, carry more junk.
In bladderworts [Genlisea-Utricularia] there is wide variation. Utricularia gibba is now famous for its fully sequenced, tiny genome, but the beauty part is, there are other bladderworts that are closely related but with far larger or even smaller genomes, that just beg to be sequenced. Utricularia prehensilis has 4.56 times as much DNA as U. gibba, while Genlisea hispidula with 1510 Mbp is much bigger still, far far larger than U. gibba. Meanwhile, G. margaretae and G. aurea are even smaller than U. gibba! I have been going about saying someone should write an NIH grant to sequence Utricularia prehensilis or Genlisea hispidula.
The turkey has 1.1 Gbp, about 1/3 x human.
The frog Hyla nana has 1.89 pg C = 55% of human.
The sea urchin has 814 Mbp = 1/4 x Human.
I have read that the cheetah and the hummingbird have small genomes, but have been unable to confirm this.
And on that topic, let’s not forget the 100-fold variation within amphibians, from genomes much smaller (less than 1/3) than human, to Necturus lewisi with 34 times as much as human.
“An extraordinary range of C values is found in amphibians where the smallest genomes are just below 10^9 bp while the largest are almost 10^11 [100 billion basepairs, compared to 3.2 billion in humans]. It is hard to believe that this could reflect a 100-fold variation in the number of genes needed to specify different amphibians.” [Lewin, Genes II]
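For readers who want to check these ratios themselves, here is a minimal sketch. It assumes the standard conversion of 1 pg of double-stranded DNA ≈ 978 Mbp and a human genome of ~3.2 Gbp (the figure in the Lewin quote); the species values are the ones cited in the comment above.

```python
# A quick sanity check on the C-value comparisons cited above.
# Assumptions: 1 pg of DNA ~ 978 Mbp (a standard conversion),
# and a human genome of ~3.2 Gbp, as in the Lewin quote.
PG_TO_MBP = 978
HUMAN_MBP = 3200

def pg_to_mbp(pg):
    """Convert a C-value in picograms to megabase pairs."""
    return pg * PG_TO_MBP

def fraction_of_human(mbp):
    """Express a genome size as a fraction of the human genome."""
    return mbp / HUMAN_MBP

# Values from the comment above (Hyla nana converted from 1.89 pg).
genomes_mbp = {
    "turkey": 1100,
    "Hyla nana": pg_to_mbp(1.89),
    "sea urchin": 814,
}

for name, mbp in genomes_mbp.items():
    print(f"{name}: {mbp:.0f} Mbp, {fraction_of_human(mbp):.0%} of human")
```

With these round numbers Hyla nana comes out near 58% of human rather than the quoted 55%; the exact fraction depends on the human C-value one assumes (roughly 3.2 to 3.5 pg).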
Great stuff. I don’t know how anyone who considers themselves a genome biologist can make the “no downward exceptions” claim.
Run for the hills, indeed! When Mattick writes:
“Moreover, there is a broadly consistent rise in the amount of non-protein-coding intergenic and intronic DNA with developmental complexity, a relationship that proves nothing but which suggests an association that can only be falsified by downward exceptions, of which there are none known…”
Here Mattick is referring to his infamous “Dog’s Ass Plot”, which was so effectively skewered by T. Ryan Gregory. That plot of Mattick’s was based on fake data: the diagonal slopes on the bars of the graph were made-up numbers, and it bordered on scientific fraud. But, having made a fake plot based on fake numbers, Mattick now treats his fake data as real.
His statement that it “can only be falsified by downward exceptions” is a non sequitur: why not UPWARD exceptions!? It seems to me UPWARD exceptions falsify the Dog’s Ass Plot, and his assertion that there are no downward exceptions is false. Why not Hyla nana?
He doesn’t consider upward exceptions a problem because those are just due to TE expansions, which are non-func…. um, rather, not adding genetic complexity.
Yes, that’s the weird and self-contradictory part of his argument that you were correct to focus on. He says that repetitive DNA sequences explain the C-value paradox, but, in order to wave away the C-value paradox, he also says they DON’T add to the complexity of the genome. On the other hand, he also says they’re functional, all DNA is functional (!), and he explains the supposed superior complexity of the human as compared with, say, the waterdog Necturus lewisi, which has 34.5 times as much DNA as a human (I know, I know, but that’s his logic, not mine).
I can’t believe he would write a paragraph like this:
Mattick: “That may be so, but the extent of such baggage in humans is unknown. However, where data is available, these upward exceptions appear to be due to polyploidy and/or varying transposon loads (of uncertain biological relevance), rather than an absolute increase in genetic complexity“
So he just admitted that HALF THE HUMAN GENOME, HALF OF IT!! does not increase genetic complexity! Yes, you can say that HALF of the human genome does not add to genetic complexity, but you are not allowed to say it does not add to function!
He does not describe his metric for genetic complexity. If he means Kolmogorov complexity, he’s right– repetitive sequences add very little to Kolmogorov or algorithmic complexity– but while that’s true, it appears to contradict his assertion that the whole genome is functional, and adds to developmental or anatomical complexity.
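The Kolmogorov point is easy to demonstrate with off-the-shelf compression, a crude computable stand-in for algorithmic complexity (a toy illustration, not anyone’s published metric): a sequence padded with a repeated “transposon” compresses to a tiny fraction of the size of a random sequence of equal length.

```python
import random
import zlib

random.seed(0)
N = 100_000

# A "unique" random sequence: close to the 2-bits-per-base entropy floor.
unique = "".join(random.choice("ACGT") for _ in range(N)).encode()

# The same length filled with one repeated 12-bp element, standing in
# for a transposon-derived repeat expansion.
repeat = (b"ACGTACGTTGCA" * (N // 12 + 1))[:N]

unique_size = len(zlib.compress(unique, 9))
repeat_size = len(zlib.compress(repeat, 9))

# The repetitive sequence carries almost no algorithmic information:
# it compresses orders of magnitude better than the random one.
print(unique_size, repeat_size)
```

The random sequence stays above ~10 kB compressed, while the repetitive one shrinks to a few hundred bytes, which is exactly the sense in which repeats add next to nothing to algorithmic complexity.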
To wave away the C value paradox, Mattick is saying that HALF the genome, the transposons, don’t add to genetic complexity, but somehow DO add to developmental or anatomical complexity. You are correct to emphasize the self-contradiction of that.
But I want to also add that besides being self-contradictory, it’s gobbledygook in terms of Mattick’s own allegedly relevant measure of complexity, which he calls “developmental complexity” without defining a metric for it! But supposedly we humans are superior to all other organisms by a metric Mattick won’t define! Like here:
Mattick: “Moreover, there is a broadly consistent rise in the amount of non-protein-coding intergenic and intronic DNA with developmental complexity“
Mattick doesn’t define “developmental complexity” and he CANNOT without looking like a fool, and sucking himself into a whirlpool of self-contradiction! Consider the following facts:
All salamanders, newts, axolotls, caecilians and waterdogs have far larger genomes than human beings, SFAIK. That’s the rule, not the exception. Are they more developmentally complex than humans?
Suppose Mattick were to argue that indeed, all salamanders are more developmentally complex than humans. He’d then have a huge problem: axolotls don’t fully develop into the adult form of salamanders (they keep their gills), so they are certainly less “developmentally complex” than humans and other salamanders, but the axolotl Ambystoma mexicanum has 13.7 times as much DNA as a human! SFAIK, this is true of all axolotls studied!
Is that not enough of a contradiction for you? OK, consider the caecilians, which are legless like snakes but are amphibians. They never grow legs, and are certainly less “developmentally complex” than humans, yet the legless caecilian Siphonops annulatus has 4 times as much DNA as a human.
Want more of a contradiction? The two-toed amphiuma, Amphiuma means, aka “Conger eel”, like an axolotl, never undergoes full development, has small vestigial legs, no eyelids, and no tongue. But it has 27.4 times as much DNA as a human. According to Mattick’s logic, it is more “developmentally complex” than a human.
Mattick cannot dismiss these facts as “flukes” or exceptions to his imaginary “rule” that he faked in his Dog’s Ass Plot. Again I repeat: All salamanders, newts, axolotls, caecilians and waterdogs have far larger genomes than human beings, SFAIK. In addition, all lungfishes have more DNA than humans.
Those are the rules. Those are not the exceptions.
Here’s another rule: marsupials on AVERAGE have 22% more DNA than the average placental mammal. The AVERAGE for marsupials is 16% higher than human genome size. Bennett’s wallaby, for one, has 60% more DNA than a human. That’s not a fluke, that’s the rule. Are marsupials more “developmentally complex” than humans?
There are also many frogs, many sharks, many crustaceans, some insects, some annelid worms, many flatworms and many plants with more DNA than humans.
According to my recollection of Inside Nature’s Giants, female kangaroos are not only constantly pregnant (they put the embryo in stasis until the previous joey is weaned), but they also have three vaginas. Marsupials are inarguably more developmentally complex than humans. 🙂
The talk I saw Mattick give made the basic assumption that the observation of low-level genome-wide transcription was proof that most of the genome was functional. After all, why would our cells transcribe something that wasn’t useful? This is the level of thinking with which we are wrestling.
If this is the case, Mattick should admit that his metric for “superior species” is a count of how many vaginas you have.
If marsupials are more “developmentally complex” than humans, then why doesn’t he put Bennett’s wallaby at the far right in the Dog’s Ass Plot? It’s the summit of creation.
Although many of the points and arguments made by Mattick and Dinger in their paper “The extent of functionality in the human genome” are rather vague, which makes it difficult to fully expose their weaknesses, Mike’s critique is highly relevant, and I hope Mattick and Dinger will respond to it.
In his critique, Mike quotes from Doolittle’s PNAS paper “Is junk DNA bunk? A critique of ENCODE”. It just happens that I brought forward the same quote several times at Sandwalk in my comments on the evolution of genome size, the C-value enigma, and the potential roles of the so-called ‘junk DNA’ (jDNA). In his PNAS paper Doolittle also says that by developing “a larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon …much that we now call junk could then become functional” (emphasis added).
So, I would like to ask Mike and Josh: what do you think about Doolittle’s statement?
Also, what do you think about the nucleoskeletal and nucleotypic hypotheses on the biological roles of jDNA promoted by Thomas Cavalier-Smith and by Ryan Gregory, which have been discussed in dozens of scientific papers and books? Do they pass the ‘onion test’ in your opinion?
I’ll quote Doolittle’s next sentence, which applies to Mattick:
I’m on board with the idea of ‘constructive neutral evolution’ that Doolittle discusses, and which is similar to the pop-gen-informed genome evolution model Michael Lynch has long been arguing for.
I also think ‘junk’ is a vague and not so useful term.
And of course I think the onion test is consistent with Ryan Gregory’s answer to the C-value paradox, but let me ask you this: do you think Ryan Gregory believes that differences in genome size for various onion species evolved primarily because of selection for different, non-sequence-based ‘nucleoskeletal’ functional needs?
Here’s his justification for the test:
The burden of proof is on those who are claiming function.
Indeed, there are few theoretical or conceptual frameworks that can account for adaptive differential evolution of genome size in similar organisms, or highly related species such as onions, and none of them have been adopted by ENCODE or Mattick.
However, apparently, Doolittle built his vision for the prospect of debunking jDNA primarily on the nucleoskeletal and nucleotypic hypotheses.
Regarding your question, here are two relevant quotes from Ryan’s papers:
“Although some researchers continue to characterize much variation in genome size as a mere by-product of an intragenomic selfish DNA “free-for-all” there is increasing evidence for the primacy of selection in molding genome sizes via impacts on cell size and division rates” (Gregory TR, Hebert PD. 1999. The modulation of DNA content: proximate causes and ultimate consequences. Genome Res; 9(4):317-24).
“These are the “nucleoskeletal” and “nucleotypic” theories which, though differing substantially in their specifics, both describe genome size variation as the outcome of selection via the intermediate of cell size” (Gregory TR. 2004. Insertion-deletion biases and the evolution of genome size. Gene, 324:15-34).
What do you think?
“However, apparently, Doolittle built his vision for the prospect of debunking jDNA primarily on the nucleoskeletal and nucleotypic hypotheses.”
I can’t agree with that characterization of Doolittle’s paper – I didn’t see any ‘vision for the prospect of debunking jDNA’ in there; what I saw was one of the original authors of the selfish DNA hypothesis explaining how his fundamentally correct idea has been modified and placed in context – it’s not the whole story, but a big part of it.
As far as nucleoskeletal hypotheses go, you have to be careful about mixing up cause and consequence. Ryan Gregory describes the correlation:
But you can’t draw the conclusion from this correlation alone that large genomes were produced by selection pressure for, say, slowly dividing cells; as Michael Lynch has shown, the causality can work the other way. Population genetic constraints can lead to neutral genome expansion, which can then, in turn, drive a slower cell division rate and a larger nucleus.
If ENCODE thinks they have something to add to this debate, then they should say it; but the reckless statements about genome function suggest that many are not even aware of it.
This made me realize that our commenting system lacks a “+1” or “Like” function. This is a weakness.
@josh … There’s a like button on everything these days; commenting systems should definitely have it.
I should clarify Lynch’s model – it’s not that genome expansions are neutral; it’s that they are non-adaptive, and in the direction of mutational bias. These mutations can be deleterious, but weak enough to escape selection against them in small populations.
Also known as “effectively neutral”. Relative fitness only has meaning in real organisms and real populations. In this sense, something is “neutral” not because its fitness effect is precisely zero (at all levels of increasingly high resolution), but because selection is unable to overcome other non-selective forces.
I would agree that Doolittle did not write a full seven-page article in PNAS just to criticize ENCODE’s conclusion that, based on its data, 80% of the human genome is functional (most bloggers only needed a paragraph or two to showcase the flaws and deceptive nature of ENCODE’s conclusion). And, although the Abstract of his paper is dedicated mostly to the ENCODE story, I think the last sentence embraces the author’s main message:
“A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed”
Moreover, in the conclusion section (“So Is Junk Bunk?”) of the paper, after outlining four reasons why “we might want to come up with new definitions of function and junk,” Doolittle concludes that by building this larger theoretical framework: “Much that we now call junk could then become functional.”
In regard to the ‘nucleotypic function’ (not to be confused with the ‘nucleoskeletal function’ proposed by Cavalier-Smith) of jDNA developed by Ryan Gregory, who is justifiably described by Doolittle as “now the principal C-value theorist”, I think his statements (see the 2 quotes above from Ryan’s papers) about the nucleotypic hypothesis are as clear as they can be: “there is increasing evidence for the primacy of selection in molding genome sizes via impacts on cell size and division rates” (vs. as a by-product of selfish DNA), and the nucleotypic hypothesis “describe[s] genome size variation as the outcome of selection via the intermediate of cell size”.
Unfortunately, at least among bloggers, Ryan is known primarily for his powerful metaphor for the C-value enigma: the ‘onion test.’ I say unfortunately, because this has eclipsed his nucleotypic hypothesis on the evolution of genome size, to which he has dedicated numerous scientific articles. What is not clear, however, is how he reconciles his nucleotypic hypothesis with the onion test.
Mike, I concur with your caution about “mixing up cause and consequence”. As a matter of fact, almost a quarter century ago, in a very short paper about the evolution of genome size, the C-value enigma, and a putative biological role for jDNA (also referred to as ‘secondary DNA’), I specifically made this point:
“In addition to its hypothetical protective function, secondary DNA has probably influenced the evolution of cells and their genome. For instance, to accommodate secondary DNA, cells have changed their metabolism (e.g. nucleotides metabolisms) and structure (e.g. nuclear volume). Secondary DNA may have also influenced the rate of evolution by increasing genome fluidity (3)” (http://www.ncbi.nlm.nih.gov/pubmed/2156137).
I also agree with the points you and Josh made about the evolution of genome size and ‘neutral evolution’, but that’s for another discussion.
You seem to be suggesting that Ryan’s blog doesn’t represent his work, that his very explicit criticisms of Mattick on the blog don’t reflect what Ryan has written in his papers, and that Ryan’s onion test is inconsistent with his work on C-values.
If you want to contribute something constructive, instead of cherry-picking only your favorite bits from Doolittle or Gregory, why don’t you explain:
1) What specific evidence shows that variation in genome size within a genus is adaptive rather than non-adaptive? In other words, how do you show there has been selection for different ‘nucleotypic’ function in two members of a genus?
2) What is the point of the onion test if the difference between Allium species is driven by selection for ‘nucleotypic’ function?
First, let me comment on your statement about “cherry-picking only your favorite bits from Doolittle or Gregory”. When evaluating or discussing other people’s work or ideas, it is preferable to use their “words” rather than one’s own interpretation, which can be wrong.
So, when you ask me: “do you think Ryan Gregory believes that differences in genome size for various onion species evolved primarily because of selection for different, non-sequence-based ‘nucleoskeletal’ functional needs?” (emphasis added), instead of posting my interpretation of his ideas, which might be wrong, I brought forward two quotes from his papers, in which he addressed the question about selection for genome size, so you can decide for yourself. In my interpretation, according to Ryan, the nucleotypic hypothesis postulates that the genome size variation is the outcome of selection via the intermediate of cell size. What’s your interpretation?
Regarding the relevance of quotes, let me bring up another example, this one addressing ENCODE’s conclusion that 80% of the human genome is functional. Just like you and many other people, I have criticized ENCODE’s presumed conclusion. But, if you ask me for a quote from their papers with their conclusion that 80% of the human genome is functional, I can’t give you one. Can you?
Very interestingly, in his peer-reviewed paper in PNAS (http://www.ncbi.nlm.nih.gov/pubmed/23479647) criticizing ENCODE’s conclusion that 80% of the human genome is functional, apparently, even Doolittle could not come up with a quote from the ENCODE paper outlining this presumed conclusion. Instead, in order to make his case against ENCODE, he starts his critique with 4 quotes from secondary articles written by science-news writers.
Back to your points, I’ll address them in the order you made them in your last comment:
(i) “You seem to be suggesting that Ryan’s blog doesn’t represent his work, that his very explicit criticisms of Mattick on the blog don’t reflect what Ryan has written in his papers, and that Ryan’s onion test is inconsistent with his work on C-values.”
I think Ryan’s blog does represent much of his work supporting the notion that the so-called jDNA could not have informational functions. However, for whatever reason he chose not to showcase his nucleotypic hypothesis, which he has discussed in a dozen or so papers, and in which he attributes selected, non-informational functions to jDNA. I also think that Ryan’s explicit criticisms of Mattick’s hypothesis reflect what Ryan has written in his papers, and I agree with him and with other people who have criticized Mattick’s hypothesis. Regarding his onion test, as I mentioned before, I don’t know how he reconciles it with his nucleotypic function.
In regard to your request for explanations, here they are:
“(1) What specific evidence shows that variation in genome size within a genus is adaptive rather than non-adaptive? In other words, how do you show there has been selection for different ‘nucleotypic’ function in two members of a genus?”
I don’t know how to show that there has been selection for different ‘nucleotypic’ functions in two members of a genus. I think we should ask Ryan this question.
(2) “What is the point of the onion test if the difference between Allium species is driven by selection for ‘nucleotypic’ function?”
I think there is a conflict, but again we should ask Ryan for an answer, as he is the promoter of both the ‘onion test’ and the ‘nucleotypic function’.
Mike, I have tried to address your points as well as I could, so I would like to ask your opinion on one specific point: how do you interpret the following statement by Doolittle (http://www.ncbi.nlm.nih.gov/pubmed/23479647):
“It is nevertheless true that a distinction between structural and informational roles has long been part of the C-value argument for junk DNA. This line of reasoning has held that high C-value might be necessary for cellular function, but the nongenic DNA that fills the requirement is informationally junk. ENCODE’s claim is that much more of the DNA is, in fact, informational (especially regulatory) than we had thought, and indeed ENCODE’s focus is on sites likely to be involved directly or indirectly in transcription—on the “myriad elements that determine gene expression” to quote The Lancet (3). Therefore, the structure–information distinction informs the interpretation of the project’s results, and without it there would be nothing novel or newsworthy in the assertion that all of the human genome has some sort of role in human biology. We have known that since the mid-1980s” (emphasis added).
I think he says that the high C-value of genomes, such as that of humans, might be necessary for cellular functions in some kind of non-informational roles, and that we have known that since the mid-1980s. What do you think, does he say that, or not? I know, it would be better to ask him, but, from my experience, he does not answer questions or issues that challenge his perspectives.
The statement “all of the human genome has some sort of role in human biology” is not the same as a claim that high C-values are necessary for cellular function. Transposable elements have a role in human biology – often a detrimental one, but by the mere fact of their huge presence in the genome, and the necessity for a mechanism to suppress them, there is no question that TEs play a role in our biology. That does not mean they encode necessary informational function, or that they are there to bulk up the genome to promote slow cell cycle times or whatever. As Ryan Gregory has said, and as Doolittle explained in his lengthy section on ‘constructive neutral evolution’, non-functional does not mean inconsequential.
You think Ryan is being inconsistent. I maintain that you are selectively quoting his work.
The problem was not so much in their papers – I agree with Sean Eddy that the 80.4% number in the main paper was pro forma reporting of basic statistics that’s part of any genomics project. The problem was public statements made by ENCODE scientists, and in particular the specific, repeated claim that their results debunked junk DNA. Ryan has a compilation of quotes. The Nature News & Views was filled with statements like this one:
Check out Ryan’s compilation of quotes, and you find ENCODE members making similar claims.
You might want to read Doolittle’s statement again, as he specifically refers to the “line of reasoning [that] has held that high C-value might be necessary for cellular function”. Also, apparently, Doolittle’s perspective on Ryan’s hypothesis is similar to my interpretation, as he writes:
Cavalier-Smith (13, 20) called DNA’s structural and cell biological roles “nucleoskeletal,” considering C-value to be optimized by organism-level natural selection (13, 20). Gregory, now the principal C-value theorist, embraces a more “pluralistic, hierarchical approach” to what he calls “nucleotypic” function (11, 12, 17).
I think that if you read Ryan’s papers on nucleotypic function of jDNA, you might arrive at a similar understanding. Unfortunately, some of Ryan’s dozen or so papers on the subject are not freely available, but if you ask him, I think he will forward them to you.
On another subject, you probably did not see my last comment and question on our discussion at Sandwalk (http://sandwalk.blogspot.com/2013/07/the-dark-matter-rises.html#comment-form) about your interesting paper “Finding function in the genome with a null hypothesis” (http://www.ncbi.nlm.nih.gov/pubmed/23818646):
“I still maintain that in order to *show* that “the cis-regulatory potential of TF-bound DNA is determined largely by highly local sequence features and not by genomic context” you need to place these elements at various sites in the genome. Maybe we are disagreeing about the meanings of the term *show* vs. *suggest*, which I think would have been more appropriate.
Nevertheless, I would guess that you and your colleagues have investigated the local sequence features of the TFs *bound* and *unbound* DNA elements in the genome. What do the results show?”
Correction: The title of Mike’s interesting PNAS paper is “Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks”.
“Finding function in the genome with a null hypothesis” is the title of the post in which he discusses the paper (https://thefinchandpea.com/2013/07/17/using-a-null-hypothesis-to-find-function-in-the-genome/).
Again, you’re cherry-picking the nucleotypic bits from the rest of the context of Doolittle’s paper, including his thought experiment. Both Doolittle and Gregory, when you read them in context, show that 1) the burden of proof is on those who argue that jDNA is functional and 2) no single explanation by itself accounts for variation in jDNA content between species.
As far as my paper goes, you missed the point. It’s this: there are millions of recognition sites for any given transcription factor in the genome. Only a small fraction of these are actually bound. Why?
If it’s genomic context, then different bound and unbound genomic sequences with the same motif content should show the same function on a plasmid (removed from their genomic context). Our results show this prediction is false. The difference between bound and unbound regions of the genome is not context, but locally encoded information.
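The “millions of recognition sites” premise is easy to check with a back-of-envelope calculation (a sketch with assumed round numbers, not figures from the paper): for a short motif, chance occurrences alone put the count in the millions.

```python
# Expected chance occurrences of a transcription factor motif in a
# 3.2 Gbp genome (round numbers; both strands counted; base
# composition assumed uniform for simplicity).
GENOME_BP = 3.2e9

def expected_sites(k, degenerate_positions=0):
    """Expected matches for a k-bp motif; each fully degenerate
    two-letter position doubles the match probability."""
    per_position = (2 ** degenerate_positions) / 4 ** k
    return 2 * GENOME_BP * per_position

# An exact 6-mer is expected ~1.6 million times by chance alone;
# an 8-mer with two degenerate positions still lands near 400,000.
print(f"{expected_sites(6):.2e}")
print(f"{expected_sites(8, degenerate_positions=2):.2e}")
```

Against counts of that magnitude, the small fraction of sites actually bound in vivo is exactly the puzzle the plasmid experiment was designed to probe.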
To continue my quote-ranting from Doolittle’s intriguing paper “Is junk DNA bunk? A critique of ENCODE” (http://www.ncbi.nlm.nih.gov/pubmed/23479647), here is how he ends it:
“In the end, of course, there is no experimentally ascertainable truth of these definitional matters other than the truth that many of the most heated arguments in biology are not about facts at all but rather about the words that we use to describe what we think the facts might be. However, that the debate is in the end about the meaning of words does not mean that there are not crucial differences in our understanding of the evolutionary process hidden beneath the rhetoric.”
It only seems appropriate that Doolittle finishes his paper with this ‘disclaimer’, as the paper itself is a remarkable example of skillful use of words that are likely to induce heated arguments for a long time to come.
Regarding your ‘cherry picking’ statement: how can I bring forward the ‘nucleotypic function’ of jDNA and the associated selection forces as promoted by Gregory and Doolittle without quoting from their papers? If you find a more appropriate, specific description of the ‘nucleotypic function’ in the many papers written about it, please bring it forward.
Like you and many other people interested in the evolution of genome size, including Gregory and Doolittle, I agree with the notion that (i) “the burden of proof is on those who argue that jDNA is functional”. And, it is likely that, as you suggested, they believe that (ii) “no single explanation by itself accounts for variation in jDNA content between species”. That explains Doolittle’s note about Gregory’s “nucleotypic function”, which apparently embraces a more “pluralistic, hierarchical approach” compared to the “nucleoskeletal function” proposed by Cavalier-Smith and, therefore, might better explain the evolution of genome size.
The problem with the ‘nucleotypic function’ is that very few people understand it, and I don’t think it passes the ‘onion test.’
(Parenthetically, I do like Doolittle’s “thought experiment”, as this kind of investigation is relatively inexpensive to perform and it can be very revealing).
Regarding your study, have you and your colleagues investigated the local sequence features of the TF-bound and unbound DNA elements in the genome? If not, why not? If yes, were the results similar to those of your study?