Friday, February 10, 2012

The Continued (Slow) Pursuit of Statistical Prediction (Part III)

As promised last time, we'll now look at a different type of derived statistic: the ratio of the same base statistic between the two teams, e.g.,

(Ave # of offensive rebounds for the home team / Ave # of offensive rebounds for the away team)

The idea here is that it may be more predictive to look at the relative strengths of the teams rather than the absolute strengths. 

The first statistics I want to try this on are the strength measures like TrueSkill and RPI. Suppose that Syracuse, with an RPI of 0.6823, plays Missouri, with an RPI of 0.6234, and the same night UCF, with an RPI of 0.5723, plays Oregon State, with an RPI of 0.516. Would we expect the same outcome in those games? In both cases, the better team is about 0.06 better in RPI. But Syracuse is about 9% better than Missouri, while UCF is about 11% better than OSU. If it's the relative strength that matters, we would expect UCF to win (on average) by more than Syracuse.
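To make the two views concrete, here is a small sketch that computes both the absolute gap and the ratio for the two example games. The RPI values are the ones quoted above; the function names are purely illustrative and not part of any real prediction code.

```python
def absolute_gap(better, worse):
    """Absolute strength difference between the two teams."""
    return better - worse


def relative_strength(better, worse):
    """Ratio of the two teams' ratings (the derived statistic discussed here)."""
    return better / worse


# RPI values taken from the example in the text.
games = {
    "Syracuse vs. Missouri": (0.6823, 0.6234),
    "UCF vs. Oregon State": (0.5723, 0.516),
}

for label, (better, worse) in games.items():
    gap = absolute_gap(better, worse)
    ratio = relative_strength(better, worse)
    print(f"{label}: gap = {gap:.4f}, ratio = {ratio:.4f}")
```

The gaps come out nearly identical, while the ratio is larger for the UCF game, which is exactly the distinction the relative statistic is meant to capture.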

To test this out, I generated the relative strengths for measures like TrueSkill and ran them through my testing setup.  In every case, the relative strengths had no predictive value above and beyond the value of the absolute strengths.  And when the relative strengths alone were used for prediction, they underperformed the absolutes used alone.
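For readers who want to try a similar comparison, the following is a minimal sketch and not the actual testing setup, which isn't described here. It assumes each game is a (home_rating, away_rating, home_margin) tuple, fits a one-variable least-squares line to predict the margin of victory, and reports the mean absolute error for each feature. Using the rating difference as a stand-in for "absolute strengths" is an assumption, as are all the names.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the feature is not constant across games
    return y_bar - slope * x_bar, slope


def compare_features(train, test):
    """Return the mean absolute error on `test` for each single-feature model.

    train, test: lists of (home_rating, away_rating, home_margin) tuples.
    """
    features = {
        "difference": lambda home, away: home - away,  # absolute gap
        "ratio": lambda home, away: home / away,       # relative strength
    }
    errors = {}
    for name, feature in features.items():
        xs = [feature(h, a) for h, a, _ in train]
        ys = [margin for _, _, margin in train]
        intercept, slope = fit_line(xs, ys)
        # Score the fitted line on games it has not seen.
        errors[name] = mean(
            abs(margin - (intercept + slope * feature(h, a)))
            for h, a, margin in test
        )
    return errors
```

Calling compare_features with one set of games for fitting and a separate set for scoring returns a dictionary of errors keyed by "difference" and "ratio"; the lower error indicates the better single predictor on that data.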

I then did the same thing for the statistical attributes like offensive rebounding and got the same result.  The relative strengths of the two teams provided no additional predictive accuracy.

I find this result fairly intriguing. My strong intuition was that at least a portion of the game outcome would be better explained by the relative strengths of the two teams. It's hard to believe that Syracuse should win its game against Missouri by more points than UCF wins by simply because Syracuse and Missouri are both stronger teams than UCF and OSU. But (as has often proven to be the case!) my intuition was simply wrong, and relative strength is much less important than I would have guessed.
