## Wednesday, April 13, 2011

### As Good As We Can Do

The 1-Bit Predictor gives us a useful lower bound for prediction performance.  Let's turn now to the other end: What's the theoretical "best performance" we can hope to achieve?  Comparing RPI, Massey, Sagarin and LMRC predictions over six seasons of tournament games, [Sokol 2006] found performances in the 70-75% range.  How much can we hope to improve that number?

We can think of a college basketball game as having both a deterministic and a random component.  If the random component were zero, there would be no variability in outcome -- every time two teams matched up (all other things being equal) the same result would occur.  If the deterministic component were zero, results would be completely random.  Reality obviously lies somewhere between those two extremes.

By definition, there's no way to predict the random component of the outcome.  If we assume that we can predict the deterministic component perfectly, our performance is then limited by the magnitude of the random component.  So what is the magnitude of the random component?

There are a couple of different ways to explore this question.  One thought experiment is to imagine a game in which there is no random component except in the last possession of the game, which is completely random.  Intuitively, that seems much less random than reality, so it provides a lower bound on the magnitude of the random component.  So how would that affect the final outcome?

On the last possession, the team with the ball can score 2 or 3 points (or even 4 points), or might turn the ball over leading to the other team scoring -- a potential swing of 6 or more points.  So in this case, if we could predict the deterministic component of the game perfectly, we'd still have an average error of 3+ points.

Of course, in reality the last possession isn't entirely random.  But more importantly, the first 120 possessions aren't entirely deterministic!  This suggests that the best performance we can hope for is going to be significantly worse than +/- 3 points.

Another way to gain insight into this question is to look at repeat matchups between the same teams.  Home-and-home conference matchups, along with conference tournament matchups, provide a lot of data that can be used to estimate the variability in college basketball games.  For example, in the course of a month in 2011, Duke and UNC played a home-and-home series and an ACC tournament game with the following results (margin from Duke's perspective):

| Site     | Result |
|----------|--------|
| @Duke    | +6     |
| @UNC     | -14    |
| @Neutral | +17    |

This shows an enormous amount of variability.  Of course, there's a systematic bias in these numbers -- the home court advantage.  Sagarin estimates that at about 4 points for 2011.  If we factor that out, the results are:

| Site     | Result |
|----------|--------|
| @Duke    | +2     |
| @UNC     | -10    |
| @Neutral | +17    |

which still suggests double-digit variability in game outcomes. Looking at all the home-and-home matchups for a season, [Sokol 2006] found that a team had to win by 21 points at home to have an even chance to win on the road.  Part of that margin is due to home court advantage, but since most estimates of HCA are in the 4 point range, the rest of the margin is probably required to "overcome" significant variability.
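The home court adjustment used on the Duke-UNC results can be sketched in a few lines of Python.  This is only an illustration: the `neutralize` helper and its names are made up for this example, the 4-point HCA is the Sagarin estimate quoted above, and three games are far too few to estimate variability with any confidence.

```python
import statistics

# Margins from Duke's perspective, keyed by game site (from the results above).
games = [("Duke", 6), ("UNC", -14), ("Neutral", 17)]
HCA = 4  # approximate 2011 home court advantage in points (Sagarin estimate)


def neutralize(site, margin, team="Duke", hca=HCA):
    """Return the margin with home court advantage removed (hypothetical helper).

    The margin is stated from `team`'s perspective, so a home game gave
    `team` an extra `hca` points and a road game cost it `hca` points.
    """
    if site == team:
        return margin - hca
    if site == "Neutral":
        return margin
    return margin + hca


adjusted = [neutralize(site, margin) for site, margin in games]
spread = max(adjusted) - min(adjusted)

print(adjusted)                     # [2, -10, 17]
print(spread)                       # 27
print(statistics.stdev(adjusted))   # sample standard deviation of the three margins
```

Even after the adjustment, the same two teams produced results 27 points apart within a month.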

These sorts of analyses suggest that the random component in game outcomes is at least +/- 8 points.  So what does that say about trying to predict the outcomes of college basketball games?

Looking at 185 tournament games from 2009 & 2010 (both NCAA and NIT), the average margin of victory was about 11 points.  40% of the games were decided by 8 points or less.  If we look at just our first performance metric (picking the correct winner), we need only get the outcome correct (not the final margin).  A predictor that accounts perfectly for everything except (say) 8 points of variability would get 60% of the predictions correct along with some portion (say 65%) of the remaining 40% -- for a final performance of ~85%.
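The arithmetic behind that estimate is easy to check and to vary.  The sketch below assumes, as the paragraph above does, that every game outside the "close" band is picked correctly; the function name and the alternative hit rates are illustrative choices, not measured values.

```python
def accuracy_ceiling(close_share, close_hit_rate):
    """Upper bound on win-prediction accuracy under the assumptions above.

    close_share:    fraction of games decided within the random band (0.40 here)
    close_hit_rate: fraction of those close games still picked correctly
    Games outside the band are assumed to be picked correctly every time.
    """
    return (1.0 - close_share) + close_share * close_hit_rate


ceiling = accuracy_ceiling(0.40, 0.65)
print(f"{ceiling:.2f}")  # 0.86

# How the ceiling moves with the (assumed) hit rate on close games.
for hit_rate in (0.50, 0.65, 0.80):
    print(hit_rate, f"{accuracy_ceiling(0.40, hit_rate):.2f}")
```

With the 65% guess the ceiling comes out at 0.86, in line with the ~85% figure; if close games were pure coin flips it would drop to 0.80.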

In reality, of course, our predictor won't be perfect on the deterministic component of games, either.  Taken all together, this suggests that a realistic upper limit for picking the correct winner of a game is in the 70-80% range.  Since [Sokol 2006] showed that RPI and other schemes are already predicting in the lower part of this range, our progress is likely to be very incremental.  Improvements of 1% will be significant progress!

On our other performance metric (predicting the MOV) the story may be better, but that's an analysis for another day.