Thursday, March 13, 2014

Flattening & the Kaggle Contest

Jeff Fogle at Stat Intelligence has another good post up, this time arguing that in the post-season (the NCAA Tournament, in the case of college basketball), differences between teams condense.  By his argument, once the best teams are playing each other, they're better able to neutralize each other's strengths.  One implication of this is that handicapping (prediction) that works for the regular season won't work as well for the post-season.

Putting aside for the moment whether I buy this, how would we test this idea?

One of the big problems with any sort of assertion about the post-season is that it's very difficult to test, simply because of the small sample size.  College basketball is actually the best candidate amongst the major sports, because you have 67 games a year in the post-season -- and arguably more if you're willing to include the NIT and the conference tournaments.  In contrast, the NFL has only 11 games a year in the post-season.

But even for college basketball, 67 games per year is just not that big a sample size.  With five years of data, you still have fewer than 350 games for testing purposes.  (And given how the rules change in college basketball, going back more than 5 years or so runs the risk of comparing apples to oranges.)  And if we're looking specifically at Jeff Fogle's hypothesis about the best teams playing each other, it isn't entirely clear which of these games should count -- is a #1 playing a #16 more like a regular season game, or more like a playoff game?

I'm not going to work out the math in detail, but with 350 games in our test set and the known high variance in college basketball, any difference you found between Tournament games and regular season games would have to be huge to be significant.
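To get a rough sense of "huge," here's a back-of-envelope sketch.  The 73% baseline accuracy is an assumption for illustration (not a figure from the post), but the arithmetic shows why 350 games is so limiting:

```python
import math

# Assumed baseline: a predictor that is right ~73% of the time on
# regular-season games.  (The exact figure is hypothetical; plug in
# your own predictor's accuracy.)
p = 0.73
n = 350  # roughly five years of Tournament games

# Standard error of the observed accuracy over n Tournament games,
# treating each game as an independent Bernoulli trial.
se = math.sqrt(p * (1 - p) / n)

# To reject "Tournament games behave like regular-season games" at
# roughly the 5% level (two-sided), the observed Tournament accuracy
# would have to differ from the baseline by about 1.96 standard errors.
detectable = 1.96 * se

print(f"standard error of accuracy: {se:.3f}")
print(f"detectable difference: +/- {detectable:.3f}")
```

Under these assumptions the detectable difference comes out to roughly 4 to 5 percentage points of accuracy, which would be an enormous gap between two otherwise-similar sets of games.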

Several other factors make it difficult to assess the difference between the Tournament and the regular season.

One is that Tournament games are played on neutral courts with mixed officiating crews.  That alone might well account for any difference we saw between regular season and Tournament games.

Another is that (by necessity) we have to try to predict Tournament games based upon regular season performance.  That will make it more difficult to discern any qualitative difference between regular season and Tournament games.

All that said, in my own experience I haven't identified a qualitative difference between regular season games and Tournament games.  (Or, for that matter, between conference and non-conference games.)  Specifically, if I build a predictor based upon regular season games and a predictor based upon only Tournament games, I find that the regular season version is still the better predictor of Tournament games.  But given the small sample size for building the predictor based on Tournament games, I don't place a lot of confidence in that result.

(I will caveat that preceding paragraph slightly:  Tournament games have different home court advantage numbers in my predictor, but I ascribe that difference to the fact that they're played on a neutral court.)

Interestingly, the Kaggle competition will provide something of an empirical test of this thesis.  Judging by the Phase 1 leaderboard, there are a number of competitors who are specializing their predictors for good performance on the past five Tournaments.  If these predictors generally out-perform the predictors that are optimized for all games (or for regular-season games) it could be taken as some level of evidence that there really are fundamental differences that a predictor can exploit.  (Or not; again, small sample size.)  But at any rate I'm quite interested in seeing the results.
