In my Pacific Standard column this week, I note that over the course of the 20th century our concept of the gene went from being an abstract unit of heredity to an increasingly restrictive molecular definition. The advantage of this molecular definition is that it made genes countable; the drawback is that it is ill-suited to describe the heterogeneous collection of DNA elements that make up our genome. We’re now in the somewhat ironic situation where more of the function in our genome falls outside these conventional genes than within them. As others have noted before, the physical and ‘genetic’ definitions of a gene are in tension.
Before we sequenced the human reference genome, how many genes did people expect us to have? Most estimates made in the 1990s put the number between 60,000 and 100,000. One group in 1994 reviewed the estimates in the literature, which ranged from 20,000 to 100,000, and ultimately favored a prediction in the 60,000-70,000 range. In 1998, Deloukas et al. published a physical map of 30,000 human genes and figured that they had captured nearly half of the complement of human genes. In 1999, Francis Collins was using a number of “80,000 or so.” My molecular cell biology textbook, the third edition of Lodish et al. (2000), stated that our genomes were expected to contain 60,000-100,000 genes. One estimate, made less than a year before the draft genome sequence was published, noted that “Early estimates suggested that there might be 60,000−100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes.” The authors then worked out their own estimate of the total number of genes in the genome: “Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating the human genome contains approximately 120,000 genes.”