How many genes were we supposed to have?

In my Pacific Standard column this week, I note that over the course of the 20th century our concept of the gene went from being an abstract unit of heredity to an increasingly restrictive molecular definition. The advantage of this molecular definition is that it made genes countable; the drawback is that it is ill-suited to describe the heterogeneous collection of DNA elements that make up our genome. We’re now in the somewhat ironic situation where more function in our genome falls outside of these conventional genes. As others have noted before, the physical and ‘genetic’ definitions of a gene are in tension.

Before we sequenced the human reference genome, how many genes did people expect us to have? Most estimates made in the 1990s put the number between 60,000 and 100,000. One group in 1994 reviewed the estimates then in the literature, which ranged from 20,000 to 100,000, and ultimately favored a prediction in the 60,000-70,000 range. In 1998, Deloukas et al. published a physical map of 30,000 human genes (PDF) and figured that they had captured nearly half of the complement of human genes. In 1999, Francis Collins was using a number of “80,000 or so.” My molecular cell biology textbook, the third edition of Lodish et al. (2000), stated that our genomes were expected to contain 60,000-100,000 genes. One estimate, made less than a year before the draft genome sequence was published, noted that “Early estimates suggested that there might be 60,000−100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes.” The authors then worked out their own estimate of the total genes in the genome: “Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating the human genome contains approximately 120,000 genes.”

So, many people were surprised to discover that we have only ~21,000 protein-coding genes and a slightly smaller number of RNA genes.

Larry Moran, who doesn’t like my piece, argues that all of these people had no idea what they were talking about, and that we should therefore dismiss them. All the smart kids, who knew the literature of the 1970s, were not surprised:

There may have been researchers who speculated about the number of genes in the human genome but surely the only estimates that count are those from scientists who were knowledgeable about the subject. … No great surprises there unless you count those people who made speculative guesses without knowing the data from the 60s and 70s.

So all of the estimates I linked to above are clearly mere speculative guesses made by people who had no business talking about gene numbers, while the estimates made two or three decades earlier clearly weren’t speculative at all… because of course back in the 60s and 70s, scientists never engaged in speculation on subjects they weren’t knowledgeable about.


Lewin, Genes VI (1997) p. 710:

The total number of genes is likely to be >2000 for bacteria, >6000 for yeast, >9000 for insects, and >125,000 for mammals.

Author: Mike White

Genomes, Books, and Science Fiction

6 thoughts on “How many genes were we supposed to have?”

  1. “Nothing in evolution makes sense except in the light of population genetics.”
    -Michael Lynch

    Unfortunately many molecular biologists are totally ignorant of this field (population genetics/molecular evolution). That’s the point that Larry is making.

    1. I’m a big fan of Michael Lynch but I don’t buy Larry’s argument that the calculations made in the 70s were more principled or less speculative than the work I cited above. There is no reason to expect a priori that a pop gen argument would get you a more accurate number than say, extrapolation from the incomplete sequencing data available in 2000.

  2. “There is no reason to expect a priori that a pop gen argument would get you a more accurate number than say, extrapolation from the incomplete sequencing data available in 2000.”

    In some ways that misses the point. An estimate of the number of genes in the human genome should at least be consistent with what we know from pop gen/molecular evolution. This includes mutation rates and genetic load. These principles were simply ignored by those making guesses.

    Scratch that last thought – my hunch is that they were ignorant that they were ignorant. They did not even know that their estimates should not conflict with these basic principles.

    1. Not all of the people making those estimates were ignorant of pop gen – go read their other papers. Besides, what’s the error on estimates of gene count using genetic load? I haven’t seen convincing evidence that the estimates made in the 1990s were inconsistent with pop gen principles. The fact that some people decades earlier made different estimates based on theory and the data they had at the time in no way indicates that the later estimates were inconsistent with the theory. They had more data.

      You and Larry are perpetuating the bullshit notion that the whole genomics community is ignorant of pop gen. That’s simply not true, nor is it true that you can deduce the empirical facts of molecular biology & genomics from pop gen. Eventually both fields should be consistent, but pop gen isn’t the answer to everything.

  3. Mike, the point is that pop gen sets an upper limit on how much DNA can be under selection pressure, and thus be functional. But besides that, there are other lines of evidence.

    1. And my point is that pop gen estimates from the 60s weren’t precise enough, given the data available, to rule out the evidence from ESTs and other sources that we had 60k to 100k genes. Keep in mind that while the gene estimates varied greatly, the estimates of total DNA under selection did not.
