In my Pacific Standard column this week, I note that over the course of the 20th century our concept of the gene went from being an abstract unit of heredity to an increasingly restrictive molecular definition. The advantage of this molecular definition is that it made genes countable; the drawback is that it is ill-suited to describe the heterogeneous collection of DNA elements that make up our genome. We’re now in the somewhat ironic situation where more function in our genome falls outside of these conventional genes. As others have noted before, the physical and ‘genetic’ definitions of a gene are in tension.
Before we sequenced the human reference genome, how many genes did people expect us to have? Most estimates made in the 1990s put the number between 60,000 and 100,000. One group in 1994 reviewed the estimates in the literature, which ranged from 20,000 to 100,000, and ultimately favored a prediction in the 60,000-70,000 range. In 1998, Deloukas et al. published a physical map of 30,000 human genes (PDF) and figured that they had captured nearly half of the complement of human genes. In 1999, Francis Collins was using a number of “80,000 or so.” My molecular cell biology textbook, the third edition of Lodish et al. (2000), stated that our genomes were expected to contain 60,000-100,000 genes. One estimate, made less than a year before the draft genome sequence was published, noted that “Early estimates suggested that there might be 60,000−100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes.” The authors worked out their own estimate of the total number of genes in the genome: “Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating the human genome contains approximately 120,000 genes.”
So, many people were surprised to discover that we have only ~21,000 protein-coding genes and a slightly smaller number of RNA genes.
Larry Moran, who doesn’t like my piece, argues that all of these people had no idea what they were talking about, and we should therefore dismiss them. All the smart kids, who knew the literature of the 1970s, were not surprised:
There may have been researchers who speculated about the number of genes in the human genome but surely the only estimates that count are those from scientists who were knowledgeable about the subject. … No great surprises there unless you count those people who made speculative guesses without knowing the data from the 60s and 70s.
So all of the estimates I linked to above are clearly mere speculative guesses made by people who had no business talking about gene numbers, while the estimates made two or three decades earlier clearly weren’t speculative at all… because of course back in the ’60s and ’70s, scientists never engaged in speculation on subjects they weren’t knowledgeable about.
Lewin, Genes VI (1997) p. 710:
The total number of genes is likely to be >2000 for bacteria, >6000 for yeast, >9000 for insects, and >125,000 for mammals.