## Friday, September 16, 2011

### Statistical Prediction

With this post, I'm going to start taking a look at predicting game outcomes based upon team-level statistical measures other than won-loss record or margin of victory (MOV), such as "team scoring average" and "average number of offensive rebounds per game."

There are a number of ways to slice & dice these statistics, but the most straightforward approach is to use season-to-date averages.  So, when I'm trying to predict the Illinois-Purdue game on 2/15, I'll be looking at the statistics for those two teams averaged over all the games for that season before 2/15.  And I also want to include average statistics for a team's opponents.  So I want to know both Purdue's scoring average for all of its previous games, and also the scoring average of its opponents in those games.  For every game, I'll typically have four values for a statistic: the home team's average, the home team's opponents' average, the away team's average, and the away team's opponents' average.
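To make the bookkeeping concrete, here is a minimal sketch of how the four season-to-date values for a game might be computed. The record layout, the function names (`season_to_date_averages`, `game_features`), and the scores are all made up for illustration; this is not the code behind the results in this post.

```python
from datetime import date

# Hypothetical game records: (date, home, away, home_points, away_points).
# The scores are invented for illustration only.
GAMES = [
    (date(2011, 1, 5), "Illinois", "Purdue", 70, 60),
    (date(2011, 1, 12), "Purdue", "Ohio State", 80, 75),
    (date(2011, 2, 1), "Ohio State", "Illinois", 65, 55),
    (date(2011, 2, 15), "Illinois", "Purdue", 72, 68),
]


def season_to_date_averages(games, team, cutoff):
    """Average points scored by `team`, and by its opponents, in games
    played strictly before `cutoff`. Returns None if there are none."""
    scored, allowed = [], []
    for game_date, home, away, home_pts, away_pts in games:
        if game_date >= cutoff:
            continue  # only games before the one being predicted
        if team == home:
            scored.append(home_pts)
            allowed.append(away_pts)
        elif team == away:
            scored.append(away_pts)
            allowed.append(home_pts)
    if not scored:
        return None
    return sum(scored) / len(scored), sum(allowed) / len(allowed)


def game_features(games, home, away, game_date):
    """The four values for one statistic: home team's average, home team's
    opponents' average, away team's average, away team's opponents' average."""
    home_avgs = season_to_date_averages(games, home, game_date)
    away_avgs = season_to_date_averages(games, away, game_date)
    if home_avgs is None or away_avgs is None:
        return None  # a team with no earlier games has no averages yet
    return home_avgs + away_avgs


features = game_features(GAMES, "Illinois", "Purdue", date(2011, 2, 15))
```

The strict "before the cutoff" comparison matters: the game being predicted must not contribute to its own inputs. Early-season games, where a team has no prior results, simply produce no feature vector in this sketch.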

To begin with, let's look at how well we can predict games using the most obvious statistic: the scoring average.  Using just the (four) scoring average statistics, and the usual methodology, here's our performance:

| Predictor | % Correct | MOV Error |
|---|---|---|
| Govan + Averaging | 73.5% | 10.80 |
| Scoring averages | 72.1% | 11.18 |

That's pretty encouraging. Just using the scoring averages delivers performance comparable with some of our better W-L and MOV-based predictors. The bad news is that its predictions are still highly correlated (around 96%) with those of our other best predictors, meaning that it probably can't be used in an ensemble to improve our overall predictive performance.
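For readers who want to run this kind of check themselves, here is a rough sketch. It assumes the comparison is a Pearson correlation between the margins two predictors forecast for the same games, which is one common way to measure this and not necessarily how the figure above was computed. The `pearson` helper and the sample margins are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up predicted margins (home minus away) for the same five games.
predictor_a = [5.0, -3.0, 12.0, 1.5, -8.0]
predictor_b = [4.0, -2.5, 10.0, 3.0, -9.0]
r = pearson(predictor_a, predictor_b)
```

The closer `r` is to 1, the less new information the second predictor brings, which is why a highly correlated predictor adds little to an ensemble.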

If we look at adding other statistics, we find (as would be expected from the literature) that they offer little improvement. The best combination I could find (in order of importance) was (1) scoring, (2) 3-point percentage, and (3) opponents' average offensive rebounding:

| Predictor | % Correct | MOV Error |
|---|---|---|
| Govan + Averaging | 73.5% | 10.80 |
| Scoring averages | 72.1% | 11.18 |
| Scoring + 3 pt % + Opponent's off rebounding | 72.2% | 11.09 |

As you can see, the improvement was not huge.  The inclusion of "average number of offensive rebounds by opponents" is interesting because it is not scoring-related.  That statistic would seem to capture some aspect of a team's defensive performance -- a team that gives up a lot of offensive rebounds to its opponents is probably doing something wrong at the defensive end of the court.  That suggests that we might want to think about a better measure of defensive performance -- for example, we might want to look at offensive rebounding percentage rather than just the raw total.
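As a sketch of what that percentage might look like, the function below uses the common definition of offensive rebounding percentage: offensive rebounds divided by the rebounds available at that end of the floor (the offense's offensive rebounds plus the defense's defensive rebounds). The function name and the sample numbers are illustrative assumptions, not something used in the results above.

```python
def offensive_rebound_pct(off_rebounds, opp_def_rebounds):
    """Share of available rebounds at the offensive end that the offense got.

    Common definition: ORB / (ORB + opponent DRB).
    Returns None when no rebounds were available.
    """
    available = off_rebounds + opp_def_rebounds
    if available == 0:
        return None
    return off_rebounds / available


# Two made-up teams that each allow 12 offensive rebounds per game.
# Team A grabs 18 defensive rebounds, Team B grabs 36.
allowed_by_a = offensive_rebound_pct(12, 18)
allowed_by_b = offensive_rebound_pct(12, 36)
```

To measure a team's defense, pass in its opponents' offensive rebounds and its own defensive rebounds. In the made-up example, both teams give up the same raw total, but Team A concedes a much larger share of the available rebounds, which is the distinction the raw count hides.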