Tuesday, August 16, 2011

Possessions, Continued

A few more experiments and thoughts about predicting the number of possessions in a game.

Following up on my previous post, I took a look at some variants of the model used there to predict possessions.  To start with, I looked at a variant model where one team had more control over the pace of the game:
Possessions_predicted = Alpha * PreferredPoss_Home + (1 - Alpha) * PreferredPoss_Away
The idea is that perhaps the home team has more control over the pace of the game -- analogous to the home court scoring advantage. Or perhaps the away team has the advantage. However, the results didn't indicate an advantage for either team: prediction performance got worse for any value of Alpha significantly different from 0.50.
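Here's a minimal sketch of that kind of Alpha sweep. The game list is made-up toy data (built so that each total is exactly the midpoint of the two preferred paces), the helper names are hypothetical, and I'm assuming the error is a root-mean-square error, which this post doesn't actually state.

```python
import math

def predict_blend(pref_home, pref_away, alpha):
    """Alpha-weighted blend of the two teams' preferred possessions."""
    return alpha * pref_home + (1 - alpha) * pref_away

def rmse(predictions, actuals):
    """Root-mean-square error (an assumed metric, not confirmed by the post)."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

def sweep_alpha(games, alphas):
    """Return {alpha: error} for each candidate Alpha."""
    actuals = [observed for _, _, observed in games]
    results = {}
    for alpha in alphas:
        predictions = [predict_blend(h, a, alpha) for h, a, _ in games]
        results[alpha] = rmse(predictions, actuals)
    return results

# Toy data only: (home preferred pace, away preferred pace, observed possessions).
games = [
    (70.0, 64.0, 67.0),
    (62.0, 68.0, 65.0),
    (72.0, 66.0, 69.0),
    (60.0, 70.0, 65.0),
]

errors = sweep_alpha(games, [0.3, 0.4, 0.5, 0.6, 0.7])
best_alpha = min(errors, key=errors.get)
print(best_alpha, errors)
```

On this toy data the sweep bottoms out at 0.5 by construction, so it says nothing about real games; it only shows the mechanics of the experiment.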

The next experiment was to use a different model altogether for predicted possessions:
Possessions_predicted = F_Home * F_Away
There's no intuitive explanation for this model -- it just presumes that there's a multiplicative relationship between two underlying factors.  But it's quite different from our intuitive model (that the two teams essentially split the difference between their desired paces), so at a minimum, if this model worked poorly, it would be evidence that our intuitive model has some validity.

But interestingly enough, this model did just about as well as the split model:

Predictor                                Error
Possessions (Split Model)                5.20
Possessions (Multiplicative Model)       5.22

which suggests to me that the intuitive approach may not be particularly valid.
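One observation that may be relevant here (my own aside, not something the experiment tested): if each factor happened to be the square root of a team's preferred pace, the multiplicative model would predict the geometric mean of the two paces, and that sits very close to the 50/50 split whenever the two paces are similar. The sketch below compares the two for a single hypothetical matchup; the pace values and function names are assumptions for illustration, and the real fitted factors need not look like this.

```python
import math

def predict_split(pref_home, pref_away):
    """Split model: the teams meet halfway between their preferred paces."""
    return 0.5 * (pref_home + pref_away)

def predict_multiplicative(f_home, f_away):
    """Multiplicative model: product of two per-team factors."""
    return f_home * f_away

# Hypothetical preferred paces for one matchup.
pref_home, pref_away = 70.0, 64.0

# Assumed parametrization: each factor is the square root of the preferred pace,
# which turns the product into a geometric mean.
f_home, f_away = math.sqrt(pref_home), math.sqrt(pref_away)

split = predict_split(pref_home, pref_away)
mult = predict_multiplicative(f_home, f_away)
gap = split - mult  # arithmetic mean minus geometric mean, never negative
print(split, mult, gap)
```

If something like this is going on, near-identical errors for the two models wouldn't by themselves tell us much about which one is the better description.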

So I took a step back and tried to characterize the range of performance for predicting possessions.  To start with, I created a predictor that simply predicts the average for the test data (~67):

Predictor                                Error
Possessions (67)                         6.30
Possessions (Split Model)                5.20
Possessions (Multiplicative Model)       5.22
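For reference, a baseline like this takes only a few lines. This is a sketch with made-up game totals, and it again assumes a root-mean-square error since the metric isn't spelled out in this post.

```python
import math

def rmse(predictions, actuals):
    """Root-mean-square error (assumed metric)."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data only: observed possessions for five hypothetical test games.
observed = [67.0, 61.0, 73.0, 70.0, 64.0]

# The baseline predicts the test-set average for every game.
baseline = sum(observed) / len(observed)
baseline_error = rmse([baseline] * len(observed), observed)
print(baseline, baseline_error)
```

Under that metric, the error of the average-of-the-test-data predictor is just the standard deviation of the possession totals, which makes it a natural yardstick for the other models.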

This showed that the split model is only about 1 possession per game more accurate than just guessing the average.  As a second experiment, I ran a linear regression using all the teams as the attributes -- this generates a huge regression with 592 terms (essentially a Home term and an Away term for every team in Division I) with the following performance:

Predictor                                Error
Possessions (67)                         6.30
Possessions (Split Model)                5.20
Possessions (Multiplicative Model)       5.22
Possessions (Regression on All Teams)    4.72

I wouldn't expect much out of this model, but it does about half a possession better than the Split Model.  (It should be noted that this is not an apples-to-apples comparison to the other models; this simple regression uses the entire season for data, not just the season up to the predicted game.)  So I think there's clearly some work to be done in improving our model for predicting possessions.
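To make the structure of that regression concrete, here's a rough sketch with one Home term and one Away term per team plus an intercept. Everything in it is an assumption for illustration: the three teams and their game totals are toy data, the function name is hypothetical, and the fit uses plain gradient descent with a step size tuned to this tiny example rather than whatever solver a regression package would use.

```python
import math
from collections import defaultdict

def fit_team_regression(games, learning_rate=0.05, iterations=2000):
    """Fit possessions ~ intercept + home_term[home] + away_term[away].

    Uses batch gradient descent on squared error. The step size suits this
    toy data set only; real data would call for a proper least-squares solver.
    """
    intercept = sum(observed for _, _, observed in games) / len(games)
    home_term = defaultdict(float)
    away_term = defaultdict(float)
    for _ in range(iterations):
        grad_intercept = 0.0
        grad_home = defaultdict(float)
        grad_away = defaultdict(float)
        for home, away, observed in games:
            error = intercept + home_term[home] + away_term[away] - observed
            grad_intercept += error
            grad_home[home] += error
            grad_away[away] += error
        intercept -= learning_rate * grad_intercept
        for team, grad in grad_home.items():
            home_term[team] -= learning_rate * grad
        for team, grad in grad_away.items():
            away_term[team] -= learning_rate * grad
    return intercept, dict(home_term), dict(away_term)

# Toy data only: (home team, away team, observed possessions).
games = [
    ("A", "B", 70.0), ("A", "C", 68.0),
    ("B", "A", 69.0), ("B", "C", 65.0),
    ("C", "A", 66.0), ("C", "B", 64.0),
]

intercept, home_term, away_term = fit_team_regression(games)
predictions = [intercept + home_term[h] + away_term[a] for h, a, _ in games]
squared = [(p - obs) ** 2 for p, (_, _, obs) in zip(predictions, games)]
train_error = math.sqrt(sum(squared) / len(squared))
print(train_error)
```

Like the regression above, this sketch fits and scores on the same games, so its error is an in-sample number and carries the same caveat about not being comparable to the other models.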

2 comments:

  1. This is a tough problem. It seems like you would ideally want to have "time per possession" data for each offense - some are run and gun, while others are slow and deliberate. A good defense can probably influence its opponent's time per possession, but I would think that in most cases, the offense has more control.

    But even if you had that data, time per possession would get influenced by whether a team is playing from behind or with a lead.

    Perhaps you could try this regression: for each game, use the average possessions per game for the home and away teams as independent variables and the observed possessions for their meeting as the dependent variable. You could limit the number of games used to calc the avg possessions per game to the previous 5 games, or whatever works best. And you may want to adjust/normalize those observations through some iterative method as well prior to the regression.

  2. PP, I'll give your suggested regression a try and see what (if anything) it tells us.

    I don't think a team has much effect on its opponent's pace of play, but it can (to some extent) control its own pace of play. That's why my model splits the pace of play evenly between the two teams -- they each control their own pace of play, and the average is what ends up as the pace of play for the game.
