Tuesday, May 10, 2011

Wilson Rating

(Note: My original implementation of the Wilson rating had an error. See "Whoops!")

The next RPI alternative we'll look at comes from David Wilson.  Wilson's rating system was developed for use with college football, but applies equally well to college basketball.  Wilson describes this system this way:
"A team's rating is the average of its opponents' ratings plus 100 for a win or minus 100 for a loss.  Wins that lower the rating and losses that raise the rating count one twentieth as much as the other games.  Post-season games count double."
Comparing this system to the Iterative Strength Rating (ISR) shows one major difference: in the Wilson Rating, wins that lower a team's rating and losses that raise a team's rating are heavily discounted.  Other than this, the only differences between the systems are the initial values and the size of the win/loss bonus.  Both ratings are calculated with an iterative algorithm that homes in on the final values.
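Wilson's description can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation tested in this post: the function name `wilson_ratings`, the `(winner, loser, postseason)` tuple layout, the starting value of 500, the simultaneous update of all teams, and the convergence tolerance are all my own choices for the example.

```python
from collections import defaultdict

def wilson_ratings(games, bonus=100.0, discount=0.05, initial=500.0,
                   max_iters=1000, tol=1e-6):
    """Iteratively compute Wilson-style ratings.

    games: iterable of (winner, loser) or (winner, loser, postseason) tuples.
    The tuple layout, `initial`, `max_iters`, and `tol` are assumptions made
    for this sketch; only `bonus` and `discount` come from Wilson's description.
    """
    # Build each team's schedule as (opponent, won?, base weight).
    schedule = defaultdict(list)
    for game in games:
        winner, loser = game[0], game[1]
        base = 2.0 if (len(game) > 2 and game[2]) else 1.0  # post-season counts double
        schedule[winner].append((loser, True, base))
        schedule[loser].append((winner, False, base))

    ratings = {team: initial for team in schedule}
    for _ in range(max_iters):
        updated = {}
        for team, played in schedule.items():
            total = 0.0
            weight_sum = 0.0
            for opponent, won, base in played:
                value = ratings[opponent] + (bonus if won else -bonus)
                # A win worth less than the team's current rating would lower it;
                # a loss worth more than the current rating would raise it.
                counterintuitive = (value < ratings[team]) if won else (value > ratings[team])
                weight = base * (discount if counterintuitive else 1.0)
                total += weight * value
                weight_sum += weight
            # If every game was discounted to zero weight, keep the old rating.
            updated[team] = total / weight_sum if weight_sum > 0 else ratings[team]
        change = max(abs(updated[t] - ratings[t]) for t in ratings)
        ratings = updated
        if change < tol:
            break
    return ratings

if __name__ == "__main__":
    # Toy three-team round robin: A beats B and C, B beats C.
    toy = [("A", "B"), ("A", "C"), ("B", "C")]
    for team, rating in sorted(wilson_ratings(toy).items()):
        print(team, round(rating, 2))
```

Setting `discount=1.0` turns the discounting off, so every game counts equally, and `bonus` is the win/loss bonus.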

Implementing the standard Wilson algorithm and testing it with our usual methodology gives this result:

  Predictor    % Correct    MOV Error  

This shows a slight improvement in MOV Error over the ISR rating.

There are only a few tweaks we can easily apply to the Wilson Rating.  One is to use our MOV cutoff to exclude close games from the ratings.  This didn't improve ISR, so it likely won't improve Wilson either:

  Predictor              % Correct    MOV Error  
  Wilson (mov=1)         77.6%        10.34
  Wilson (mov=4)         76.9%        10.35

And indeed it doesn't.  Another possibility is to tweak how heavily those games are discounted, that is, the weight they are given.  Wilson himself initially set the weight to 0.01 and later raised it to 0.05.  If the weight is set to 1.0, the games are not discounted at all and Wilson is functionally equivalent to ISR; if it is set to 0.0, those games are ignored entirely.  We can try those two limit cases to see how performance is affected:

  Predictor              % Correct    MOV Error  
  Wilson (weight=1.0)    77.6%        10.35
  Wilson (weight=0.0)    77.6%        10.34

Interestingly, the game discounting doesn't seem to have much of an impact.  Performance peaks at a weight of around 0.15, but the improvement is not substantial.

Except for the game discounting, the primary difference between the Wilson rating and the ISR is the size of the bonus given for a win or a loss.  My implementation of ISR uses a bonus of 4, and the standard Wilson algorithm uses 100, so we can try a range of values to see where performance is maximized:

  Predictor              % Correct    MOV Error  
  ISR (bonus=4)          77.7%        10.45
  Wilson (bonus=100)     77.7%        10.33
  Wilson (bonus=25)      77.5%        10.42
  Wilson (bonus=250)     77.7%        10.33

Performance appears to maximize for bonus values of 100 or more.  This is somewhat counter-intuitive: one would imagine that with an iterative algorithm like ISR/Wilson, the size of the win bonus would only affect the time to reach a steady-state solution, but apparently it can affect (at least marginally) the accuracy of the solution as well.

So the Wilson Rating with the bonus set at 100 and game discounting at 0.15 is our new champion RPI-like rating system.
