Back in May, the St. Louis Post-Dispatch held a mother-daughter look-alike contest. In their write-up of the results, they turned to a geneticist, Barak Cohen, for some expert commentary on why daughters look like their mothers:
We asked Dr. Barak Cohen, professor of genetics at Washington University Medical School, to explain this phenomenon.
“They are just the ones, who in a sense of the word, won the genetic lottery,” he said. In these cases, most of the mother’s genes are dominant.
(Barak tells me this quote was the outcome of a 30-minute conversation.)
The real truth is, we still don’t understand why children look like their parents, or rather, we don’t understand how DNA builds complex traits. Over at Pacific Standard this week, I discuss the case of the missing heritability and recent evidence that genetic variants with small effects might be a big deal. Go check it out. (And please don’t come back and talk to me about epigenetics.)*
A few more (largely personal) thoughts on genetic variation below the fold:
The concept of genetic variation blew my mind when I started my postdoc. Those of you who’ve lived and breathed genetics since the start of your scientific training may not understand how utterly absent the idea of genetic variation is from the world of molecular cell biology (or at least it was before next-gen sequencing).
I came to biology via chemistry & physics (and music too), and did my PhD in a hard-core biochemistry and biophysics department. I certainly knew that differences in our DNA made us different from each other, but for me the question of the relationship between genotype and phenotype was addressed by knocking out genes to see which ones were essential for a particular pathway.
When I came to Barak Cohen’s lab and encountered this work, I had my genetic variation epiphany: understanding the relationship between genotype and phenotype is not so much about knowing what genes are essential to a pathway (that’s molecular biology), it’s about knowing which genes vary in a population and cause variation in a phenotype.
I think geneticists underestimate how many biologists have never, ever thought about this question. It’s clear many haven’t thought of the issue, because a common question our lab used to get is ‘if you want to know what genes are involved in sporulation, why don’t you just do a knock-out screen?’ But the point of the project wasn’t to explain sporulation – it was to explain variation in sporulation.
This was largely before GWAS, so the situation has probably improved – more biologists are aware of questions about variation. But until you start thinking about it, and start reading the navel-gazing and head-scratching prompted by the less-than-exciting GWAS results, it’s easy to miss the fact that the question of why children resemble their parents is non-trivial, unanswered, and raises some profound issues that geneticists have actively struggled with since Darwin highlighted how little scientists knew about heredity.
*OK, we can talk about epigenetics only if you can explain to me, with at least some sort of model, how epigenetics could account for the so-far unexplained additive genetic variance that goes into the narrow-sense heritability calculations that are the basis of the case of the missing heritability.
Image: American Gothic (1930), Grant Wood, Art Institute of Chicago
[they] turned to a geneticist (not their)
Thanks – I suck at copy editing.
Grammar rules are like stars – via Emily Horne & Joey Comeau
http://www.asofterworld.com/index.php?id=1003
After thinking about this all day, I’m starting to think that we do know “why” kids look like their parents. Fisher pretty much figured this out. Kids are more likely to have the same genetic variants as their parents than random children are due to the mechanics of reproduction. What we don’t know is “how” kids look like their parents.
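To make the "why" half concrete, here is a minimal simulation sketch of Mendelian transmission under a purely additive model. Everything in it is an assumption chosen for illustration: 100 loci of equal effect, allele frequency 0.5, random mating, and a narrow-sense heritability of 0.8. Under those assumptions the regression of child on mid-parent phenotype should come out near the heritability, while the regression on an unrelated couple should come out near zero.

```python
import math
import random

N_LOCI = 100   # hypothetical number of causal loci, each adding 1 per variant copy
H2 = 0.8       # narrow-sense heritability assumed for this toy trait

# With allele frequency 0.5, each locus contributes additive variance 2*p*q = 0.5.
V_A = 0.5 * N_LOCI
# Environmental noise is scaled so that V_A / (V_A + V_E) equals H2.
ENV_SD = math.sqrt(V_A * (1 - H2) / H2)


def make_parent(rng):
    """Two alleles (0 or 1) per locus, drawn independently."""
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(N_LOCI)]


def make_child(mother, father, rng):
    """Mendelian transmission: one random allele from each parent per locus."""
    return [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]


def phenotype(genotype, rng):
    """Additive genetic value plus Gaussian environmental noise."""
    return sum(a + b for a, b in genotype) + rng.gauss(0.0, ENV_SD)


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_families, seed=0):
    """Return (slope on own mid-parent, slope on an unrelated mid-parent)."""
    rng = random.Random(seed)
    midparents, children = [], []
    for _ in range(n_families):
        mother, father = make_parent(rng), make_parent(rng)
        child = make_child(mother, father, rng)
        midparents.append((phenotype(mother, rng) + phenotype(father, rng)) / 2)
        children.append(phenotype(child, rng))
    # Pair each child with the next family's parents to get unrelated pairs.
    shifted = midparents[1:] + midparents[:1]
    return slope(midparents, children), slope(shifted, children)


if __name__ == "__main__":
    related, unrelated = simulate(2000, seed=1)
    print(f"child on own mid-parent:       {related:.2f}")
    print(f"child on unrelated mid-parent: {unrelated:.2f}")
```

Note that the sketch only covers the "why": the resemblance falls out of the mechanics of transmission. It says nothing about the "how", since the mapping from alleles to phenotype is simply assumed to be a sum.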
Very true, but I’ve always been a bit fuzzy on the distinction between ‘why’ and ‘how’ explanations, except that ‘why’ usually sounds more provocative (e.g., why is the sky blue?).
Following up on your point – there was a paper, can’t find the ref now, showing exactly that – much of the missing heritability for height vanishes if you take into account all 300k genotyped SNPs in the study, rather than limiting yourself to significant associations.
I’ll have to think about that more, but my gut reaction is that with 300K SNPs you are bound to explain most of the heritability one way or another, just based on the way the associations are calculated. There have also been a few papers arguing that imperfect linkage between the genotyped SNPs and rare variants of large effect could explain missing heritability also.
Here’s the ref: Common SNPs explain a large proportion of the heritability for human height. They conclude “most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests.”
And then there’s this: Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results.
But I think the best answer seems to be that different traits will be different – we need to be careful about generalizations without many detailed case studies:
“In general, the understanding of the causes of the genetic variation affecting any trait, in any population, be it human or yeast, must come from understanding the population genetic history. Mutation, genetic drift and selection in the past have combined to create the standing genetic variation and the genetic architecture of traits.”
(Quantitative Genetics: Heritability Is Not Always Missing)
There is another paper on the rare variant/synthetic association issue from a different group. There are some issues with the Wray paper in terms of being definitive on this issue. First, they are essentially saying that rare variants of not tiny, but not huge effect would probably go undetected in terms of a SNP peak. Second, they do not address a moderate position where a few associations are due to rare, causal variants in a pool of many common, causal variants.
They are mainly showing, against the Dickson et al. argument, that most GWAS SNP peaks are not likely to be marking loci with multiple, rare variants.
The door still seems to be open for some GWAS peaks being markers for rare variants of large effect (these are probably, but not exclusively ones that don’t replicate between populations), some rare variants of moderate effect going undetected, and a lot of non-additive effects.
Like you said, generalizations are, at this time, probably inappropriate.
I do like the call for thinking about evolutionary models in the paper linked to above, and in the Chabris et al paper, whether one agrees with the particular models invoked or not. If we do make generalizations, they’ll likely be made in an evolutionary/pop gen. history context.
I will note, from the abstract of the Yang et al. paper, that they are essentially making the same argument I have just made, but with semantic differences around definitions of rare & common:
“We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.”
I just finished reading the Yang et al. Nature Genetics paper, but I am somewhat uncomfortable accepting their result without doing the calculation myself. Have you guys tried it and know for sure that everything is done right?
I have not done it and the inclusion of ~300K non-significant SNPs worries me. This is the same concern I have any time a result relies on redefining statistical significance or dismissing it because it is so darn strict. Lighten up man.
Given the modeling done in a GWAS, I’m more comfortable interpreting their result as a limit to how much variance the GWAS results could possibly explain, not how much it is explaining.
Those concerns do not call into question the technical observation that genetic variants that are rarer than the genotyped SNPs with moderate effects and not tightly linked to a genotyped SNP will likely be undetected by a GWAS.
Well, they start with “the heritability of height has been estimated to be ~0.8”. I do not know whether even that is correctly done. Then they build a model to explain that 80%. Unlike your comment about 40% being an upper limit, to them it seems like 40% is the lower limit, and they go ahead and invent reasons for the remaining 40%!
The details do matter quite a bit here. Under certain modeling conditions, with so many variables, you are going to explain almost all the variance possible in your data. Given 300K variables, it is pretty easy to fit a model well to the data. Assuming the values calculated are “true”, especially non-significant ones, is problematic for me.
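A rough illustration of that worry, under loudly stated assumptions: plain least squares, as many random "SNPs" as individuals, and a phenotype that is pure noise. This is a toy of the general statistical point only. It does not reproduce the model or the data in the Yang et al. paper, and the sample sizes and function names are made up for the example.

```python
import random


def solve(A, y):
    """Solve the square system A b = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("genotype matrix is singular; try another seed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b


def predict(X, b):
    return [sum(x * c for x, c in zip(row, b)) for row in X]


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1 - ss_res / ss_tot


def random_genotypes(n_people, n_snps, rng):
    """Genotypes coded 0/1/2, drawn at random with no real signal."""
    return [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_people)]


def overfit_demo(n=30, n_test=200, seed=0):
    """Fit n SNPs to n noise phenotypes, then score on fresh individuals."""
    rng = random.Random(seed)
    X_train = random_genotypes(n, n, rng)
    y_train = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = solve(X_train, y_train)
    X_test = random_genotypes(n_test, n, rng)
    y_test = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    return (r_squared(y_train, predict(X_train, b)),
            r_squared(y_test, predict(X_test, b)))


if __name__ == "__main__":
    r2_in, r2_out = overfit_demo()
    print(f"in-sample R^2:     {r2_in:.3f}")
    print(f"out-of-sample R^2: {r2_out:.3f}")
```

With as many free coefficients as data points, the in-sample fit is essentially perfect even though the phenotype contains no genetic signal at all, and the fit falls apart on new individuals. Whether this concern carries over to any particular published analysis depends on the details of the model actually used.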
My first concern is that the reasoning is circular.
Let me try to understand this from a physicist’s point of view. Say you have the equation 4*x^2 + 4*y^2 = 10. When you plot it, you get a circle in the (x, y) plane. If you replace the 10 with 11, you still get a circle, that is, the same shape. If you change the first 4 to 5, you get an ellipse that is close to a circle. If you manage to change the equation to 4*x + 4*y = 10, you get a straight line, which is a distinctly different shape.
Given that our shape is controlled by a large number of biochemical equations, whose coefficients are controlled by nucleotide sequences, the shapes of the children should be similar to those of the parents, shouldn’t they? Of course, nature throws in quite a bit of perturbation to change the equations, and they are never the same between two persons.
When it comes to two unrelated persons, with 400,000 knobs to tune, we can go from anywhere to anywhere in terms of shape, given that we have no clue what the underlying equations are even in approximate form.
How do I understand what you are saying within the above framework?
Within the physics equation context, we know that nonlinear systems can be very unstable (butterfly effect). What that means is that a small perturbation can produce a small effect, but another small perturbation in the same direction can produce a huge effect. GWAS, on the other hand, uses some kind of linear model, if I understand the math correctly. How can we model a nonlinear system with a linear model and expect everything to be explained?
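A toy sketch of the question, assuming a haploid organism with two loci at allele frequency 0.5, so that the four two-locus genotypes are equally common. The two trait functions are hypothetical and deliberately extreme: one is a pure sum of the two loci, the other depends only on their interaction (an XOR). The code fits the best additive model to each and reports how much of the variance it explains.

```python
from itertools import product


def additive_r2(phenotype_of):
    """R^2 of the best model y ~ b0 + b1*x1 + b2*x2 over the four
    equally frequent two-locus haploid genotypes."""
    genotypes = list(product((0, 1), repeat=2))
    ys = [phenotype_of(x1, x2) for x1, x2 in genotypes]
    n = len(genotypes)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)

    # The design is balanced, so x1 and x2 are uncorrelated and each
    # least-squares slope is simply cov(x_i, y) / var(x_i).
    slopes = []
    for i in range(2):
        xs = [g[i] for g in genotypes]
        mean_x = sum(xs) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
        var = sum((x - mean_x) ** 2 for x in xs) / n
        slopes.append(cov / var)
    intercept = mean_y - 0.5 * sum(slopes)  # mean of each x_i is 0.5

    fitted = [intercept + slopes[0] * x1 + slopes[1] * x2 for x1, x2 in genotypes]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1 - ss_res / ss_tot


def additive_trait(x1, x2):
    """Hypothetical trait: each variant adds one unit."""
    return x1 + x2


def interaction_trait(x1, x2):
    """Hypothetical trait: present only when exactly one variant is carried."""
    return x1 ^ x2


if __name__ == "__main__":
    print("additive trait, additive model R^2:   ", additive_r2(additive_trait))
    print("interaction trait, additive model R^2:", additive_r2(interaction_trait))
```

In this sketch the genotype fully determines both traits, yet the additive model explains all of the variance in the first and none of it in the second. It is an extreme case picked for illustration; where real traits fall between these two poles is exactly what is being debated in this thread.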