## Thursday, November 8, 2012

### How to Pick a Tournament Bracket, Part 2

In the previous post, I looked at a strategy for picking a Tournament bracket.  The basic idea is that to win a sizable Tournament challenge, you can't just pick the most likely outcome of each game.  You're going to have to pick at least some of the inevitable upsets correctly.  A reasonable way to do that is to decide how many points from upsets you think you'll need, and then pick some combination of upsets to reach that number.  It turns out the best way to do that is to pick late-round upsets between closely-matched teams.

However, there are some concerns with that approach.  One is that if you pick "likely" upsets (such as a #2 over a #1), it's reasonable to assume that many of your competitors might pick the same upset.  So although the upset might be both likely and high-scoring, it might not do much to separate you from your competitors.  That's an interesting problem, but one we'll leave for another day.

Another concern is that the strategy is "all or nothing."  We are assuming that we'll need (say) $G = 16$ points to win the Tournament challenge and make picks accordingly.  But in truth our chance of winning the Tournament challenge, plotted against the points we score, is more of an S-curve: it rises slowly at first, climbs steeply in the middle, and then levels off.

We have some guess at $G$ that will give us a reasonable chance to win the challenge, but we might end up needing more or we might be able to win with less.  With $G = 16$, the strategy I've outlined so far leads us to pick a single 16-point upset in a semi-final game.  This is fine if the upset occurs.  But if it doesn't, losing those 16 points moves us a sizable distance to the left on the S-curve and greatly reduces our chances of winning.  Something of this sort happened to my entry, the Pain Machine (PM), in the last Machine Madness contest: the PM predicted a Kansas-Kentucky upset that would have given it 34 points from upsets.  But since Kentucky won, the PM ended up with only 2 and lost the contest to a predictor with 7.  We'd really like a strategy that maximizes our chance to win across all possible outcomes.

Mathematically, our overall chance of winning is the sum, over all possible outcomes, of the probability of that outcome times the likelihood of winning under it.  If we pick a single 16-point upset, there are two possible outcomes: the upset happens or it doesn't.  If $L(n)$ is the likelihood of winning the challenge with $n$ points and $p(u,v)$ is the probability that U upsets V, then the expected value of that strategy is:
$$EV = L(0)\,(1 - p(u,v)) + L(16)\,p(u,v)$$

But if instead we pick two 8-point upsets, there are four possible outcomes: neither upset occurs, the first occurs but the second doesn't, the first doesn't but the second does, or both occur.  The expected value of this strategy is more complicated:
$$
\begin{aligned}
EV ={} & L(0)\,(1 - p(u_1,v_1))\,(1 - p(u_2,v_2)) \\
  +{} & L(8)\,p(u_1,v_1)\,(1 - p(u_2,v_2)) \\
  +{} & L(8)\,(1 - p(u_1,v_1))\,p(u_2,v_2) \\
  +{} & L(16)\,p(u_1,v_1)\,p(u_2,v_2)
\end{aligned}
$$
Depending upon the probability of the various outcomes and the likelihood of winning, the expected value of this strategy might be higher than picking a single 16 point upset, even though the chances of scoring the full 16 points are reduced.
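The same sum generalizes to any number of upset picks: enumerate every hit/miss combination, weight each by its probability, and multiply by the chance of winning with the points it yields. Below is a minimal sketch under loudly stated assumptions: the upsets are independent (as the formulas above already assume), and $L(n)$ is a made-up logistic curve standing in for the S-curve. `win_likelihood`, `expected_win_chance`, `goal`, `steepness`, and all the probabilities are hypothetical, not taken from the Pain Machine or any real contest.

```python
import itertools
import math


def win_likelihood(points, goal=16.0, steepness=0.4):
    """Hypothetical S-curve L(n): a logistic curve centred on the goal G."""
    return 1.0 / (1.0 + math.exp(-steepness * (points - goal)))


def expected_win_chance(upsets, likelihood=win_likelihood):
    """Sum of P(outcome) * L(points) over every hit/miss outcome.

    upsets -- list of (p, points) pairs, one per upset pick, treated as
              independent events
    """
    total = 0.0
    for outcome in itertools.product((False, True), repeat=len(upsets)):
        prob, points = 1.0, 0
        for hit, (p, pts) in zip(outcome, upsets):
            prob *= p if hit else 1.0 - p
            points += pts if hit else 0
        total += prob * likelihood(points)
    return total


# Made-up inputs: one 16-point semi-final upset vs. two 8-point upsets.
one_big = expected_win_chance([(0.40, 16)])
two_small = expected_win_chance([(0.45, 8), (0.45, 8)])
print(f"one 16-point upset: {one_big:.3f}")
print(f"two 8-point upsets: {two_small:.3f}")
```

With these particular invented numbers the single 16-point pick comes out ahead, because the hypothetical curve gives almost no credit for 8 points. Passing `likelihood=lambda n: win_likelihood(n, goal=8.0)` reverses the ranking, which is the point of the paragraph above: the answer depends on the shape of the S-curve.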

Up until now, I've been implicitly assuming that the possible outcomes of an upset pick are either zero or n points.  But that's not really true.  The cost of an incorrect pick can be greater than just losing the points for that game.

For example, last year the Pain Machine correctly predicted that #15 Lehigh would beat #2 Duke.  Given the rarity of 15-2 upsets, that was an amazing prediction.  But even if it had been a very likely upset, it would have been a bad pick, because there was a potentially high cost if the upset didn't happen.  To see why, consider that part of the bracket: the Duke-Lehigh winner plays the winner of #7 Notre Dame vs. #10 Xavier in the second round.

If the prediction is correct, the Pain Machine picks up 1 point for the correct first-round prediction.  But if the prediction is incorrect, the Pain Machine is very likely to lose 2 points when Duke wins in the second round, 4 more points when they win in the third round, and so on.  (As a #2 seed, Duke is expected to keep winning until the round of eight.)

We can generalize this idea as a value formula for a win by U over V:

$$V(u,v) = p(u,v)\,\mathrm{round}_i - p(v,i+1)\,\mathrm{round}_{i+1} - p(v,i+1)\,p(v,i+2)\,\mathrm{round}_{i+2} - \cdots$$

Here, $p(u,v)$ represents the probability that U defeats V, $\mathrm{round}_i$ represents the scoring value for round $i$ of the tournament, and $p(v,i+1)$ represents the probability of V defeating their likely opponent in round $i+1$ if they had not been upset by U.  To return to the Lehigh-Duke example, the value is the probability of Lehigh beating Duke times 1 (the value of that round) minus the probability of Duke beating Notre Dame (their expected opponent) times 2 (the value of that round), and so on.
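A minimal Python sketch of the value formula, implementing it term by term and truncating the trailing "..." wherever the list of later-round probabilities ends. `upset_value` and `ROUND_POINTS` are hypothetical names; the 1-2-4-8-16-32 scoring assumes points keep doubling each round, as the post describes, and every probability below is invented for illustration.

```python
# Assumed scoring scheme: points double each round (1 for round 1 ... 32 for the final).
ROUND_POINTS = [1, 2, 4, 8, 16, 32]


def upset_value(p_upset, i, p_v_next, round_points=ROUND_POINTS):
    """Value formula V(u,v) for picking U to upset V in round i (1-indexed).

    p_upset  -- p(u,v), the probability that U beats V
    p_v_next -- [p(v,i+1), p(v,i+2), ...], the chance V would win each later
                game had it not been upset; entries past the final are ignored
    """
    value = p_upset * round_points[i - 1]  # gain: p(u,v) * round_i
    chain = 1.0
    for k, p in enumerate(p_v_next[: len(round_points) - i], start=1):
        chain *= p  # p(v,i+1) * ... * p(v,i+k)
        value -= chain * round_points[i + k - 1]  # cost: chain * round_{i+k}
    return value


# Lehigh (#15) over Duke (#2) in round 1, with made-up probabilities: Lehigh
# wins 10% of the time; Duke would beat Notre Dame 80% of the time, then win
# its third- and fourth-round games 65% and 45% of the time.
lehigh_duke = upset_value(0.10, 1, [0.80, 0.65, 0.45])
print(f"V(Lehigh, Duke) = {lehigh_duke:.3f}")
```

With these numbers the upset itself is worth only 0.1 expected points, while Duke's expected later-round points subtract more than 5, so almost all of the (strongly negative) value comes from the downside.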

To maximize $V(u,v)$ we must maximize $p(u,v)$ and minimize $p(v,i+1)$.  And since $\mathrm{round}_{i+1} = 2\,\mathrm{round}_i$, it is twice as important to minimize $p(v,i+1)$.  In plain English, we want to pick upsets where the team being upset has very little chance to win its next-round game.  That's why the Lehigh upset was a poor pick -- as a #2 seed, Duke had a very good chance to win its second-round game.
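Taking partial derivatives of the value formula, and treating each probability as an independent knob, makes the doubling argument explicit:

$$
\frac{\partial V}{\partial p(u,v)} = \mathrm{round}_i,
\qquad
\frac{\partial V}{\partial p(v,i+1)} = -\bigl(\mathrm{round}_{i+1} + p(v,i+2)\,\mathrm{round}_{i+2} + p(v,i+2)\,p(v,i+3)\,\mathrm{round}_{i+3} + \cdots\bigr)
$$

Since $\mathrm{round}_{i+1} = 2\,\mathrm{round}_i$, the leading term alone gives $p(v,i+1)$ twice the weight of $p(u,v)$, and the remaining terms, all non-negative, only add to it.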

Instead, this formula favors upset picks like #10 Xavier over #7 Notre Dame (which also happened!).  Xavier and Notre Dame met in the first round, with the winner expected to face Duke in the second round.
Whichever team wins the first-round game -- Notre Dame or Xavier -- is likely to lose the second-round game to the stronger Duke team.  Thus the downside of the upset pick is minimized: if Notre Dame wins and then loses to Duke as expected, you'll have lost only the one point for the incorrect upset pick.
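The comparison the formula is meant to support can be sketched by scoring both first-round candidates with the same hypothetical `upset_value` helper (redefined here so the block stands alone, again assuming 1-2-4-8-16-32 scoring). All probabilities are invented for illustration; Notre Dame's 20% chance against Duke is simply the complement of the 80% given to Duke's side.

```python
ROUND_POINTS = [1, 2, 4, 8, 16, 32]  # assumed: points double each round


def upset_value(p_upset, i, p_v_next, round_points=ROUND_POINTS):
    """V(u,v) as written in the post: the gain p(u,v) * round_i minus the
    chained later-round points V would otherwise have earned."""
    value = p_upset * round_points[i - 1]
    chain = 1.0
    for k, p in enumerate(p_v_next[: len(round_points) - i], start=1):
        chain *= p
        value -= chain * round_points[i + k - 1]
    return value


# Two hypothetical first-round candidates from the same part of the bracket.
candidates = {
    "#15 Lehigh over #2 Duke": upset_value(0.10, 1, [0.80, 0.65, 0.45]),
    # Notre Dame would probably lose to Duke next, so its chain collapses early.
    "#10 Xavier over #7 Notre Dame": upset_value(0.45, 1, [0.20, 0.35, 0.25]),
}
for name, v in sorted(candidates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: V = {v:.3f}")
```

With these made-up numbers both values are negative (roughly -0.37 for Xavier versus -5.45 for Lehigh), but the formula is being used to rank candidates, and the firewalled Xavier pick carries a small fraction of the Lehigh pick's downside.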

This insight is nothing new.  Canny pickers already look for upsets that are "firewalled" off in the next round by a strong opponent.  However, the value formula above gives us an objective measure for comparing possible upset picks.  I suspect that most people misjudge the tradeoff between $p(u,v)$ and $p(v,i+1)$.  Because scoring doubles each round, the firewall in the next round matters much more than the chance of the upset itself, which most people probably find counter-intuitive.

Unfortunately, the strategy of "firewalling" upset picks runs counter to the strategy of picking high-scoring late-round upsets, because (assuming most games are not upsets) the mismatches that make good firewalls occur primarily in the early rounds.  If the tournament runs mostly true to the seedings, the late-round games are usually between closely-matched teams and do not make good firewalls.  An interesting exception is the Championship game itself.  If you pick an upset in the Championship game incorrectly, you're guaranteed not to lose any additional points, since there are no later rounds.

To summarize these thoughts about picking a tournament bracket:
1. A bracket consisting of chalk picks and true mis-seedings is not likely to win a sizable Tournament challenge.
2. Picking late-round upsets between highly-seeded teams has the advantages of (1) scoring a lot of points, and (2) being relatively likely to occur.
3. To maximize the overall chance of winning the challenge, it may be better to spread your upset picks rather than bet "all or nothing."
4. Upset picks which are firewalled in the next round reduce the downside risk of an incorrect pick.