It is conventional wisdom that every sensible bracket includes one and only one #12 seed upset over the #5 seed. Is getting the #12 seed the basketball equivalent of a +8 sword with double damage against the undead? If we look at the historical frequency of upsets in the round of 64 compared to the difference in seed[1], we see that the probability of an upset decreases linearly as the difference in seed between the two teams increases (r² = 0.96).
Each year, we get to observe 4 games of each seed match-up. The #12 seed historically upsets the #5 with a probability of 32.1% per game. Therefore, you are likely to see at least one #12 upset a #5 (about 1.29 expected upsets per tournament, or roughly a 79% chance of at least one), and, if the four games are treated as independent, two or more in one year should happen about once every 2.6 tournaments. Curiously, the #12 seed enjoys more success against the #5 than the #11 seed does against the #6 seed (29.8%).
We also expect the #8 and #9 seeds to split their four games evenly. A #9 beating a #8 is hardly considered an upset. In fact, the #9 seed beats the #8 seed 54.8% of the time, which highlights how hard the tournament is to seed, especially at the margins.
This year, the tournament featured two #15 seeds (Lehigh and Norfolk State) beating two #2 seeds (Duke and Missouri, respectively). In a “typical” year, a #15 has a 4.8% chance of upsetting a #2 in any given game. Across the four #15-#2 games, that works out to about 0.19 expected upsets per tournament, or roughly an 18% chance of seeing at least one, so we should see a #15 upset a #2 about once every five or six years. This is the first time two such upsets have occurred in one tournament, and if the four games are treated as independent, they should happen only about once every 80 tournaments[2].
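For anyone who wants to check the arithmetic, here is a minimal sketch of the per-tournament calculation. It assumes the four games of a given match-up are independent and share the same per-game upset probability, so the number of upsets in a tournament follows a binomial distribution. The function name `prob_at_least` and the constant `GAMES_PER_TOURNAMENT` are just illustrative labels, and the per-game rates are the rounded ones quoted above.

```python
from math import comb

GAMES_PER_TOURNAMENT = 4  # each seed match-up is played once in each of the four regions


def prob_at_least(k: int, p: float, n: int = GAMES_PER_TOURNAMENT) -> float:
    """P(at least k upsets in n independent games, each with upset probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Per-game upset rates quoted in the text (rounded, so results are approximate).
matchups = {"#12 over #5": 0.321, "#15 over #2": 0.048}

for name, p in matchups.items():
    expected = GAMES_PER_TOURNAMENT * p   # expected upsets per tournament
    one_plus = prob_at_least(1, p)        # chance of at least one upset
    two_plus = prob_at_least(2, p)        # chance of two or more upsets
    print(
        f"{name}: expected {expected:.2f} per tournament, "
        f"P(>=1) = {one_plus:.1%}, P(>=2) = {two_plus:.1%}, "
        f"about one tournament in {1 / two_plus:.1f}"
    )
```

Note that the expected number of upsets (4 times the per-game rate) is not the same thing as the chance of seeing at least one. The two are close when upsets are rare, as with #15 over #2, and far apart when they are common, as with #12 over #5.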
The linear relationship between seed differential and upset probability suggests that the NCAA tournament committee does a good job of seeding the tournament. At the extremes, they do such a good job that our best fit line predicts that a #16 over #1 seed upset will never happen (frequency = -3.6%). I can’t wait to be wrong on that one, especially if the Tar Heels are the #1 seed[3].
I wrote the original version of this back in 2009. For this year, I’m breaking it into more digestible chunks, with a lot of rewriting and up-to-date statistics where possible.
NOTES
[1] Data are from “Seed-by-seed matchups in NCAA Tournament” by Peter Tiernan. The data are only through the 2005 tournament to avoid bias due to this year’s historic upsets.
[2] The field of 64 only dates back to 1985. This gives us 27 previous tournaments to work with, which may not be enough samples to accurately measure the probability of low frequency events like #15-#2 upsets.
[3] They will never live it down. Never.