The version of ELO that I have implemented for testing (based upon the explanation given here) is defined by this formula:

N(t) = R(t) + K*(S(t) - E(t))

where:

N(t) = the new rating for team t

R(t) = the old rating for team t

K = a maximum value for increase or decrease of rating

S(t) = the outcome of the game for team t

E(t) = the expected outcome of the game for team t

Note that (because it does not base a team's rating recursively on the ratings of any other team) ELO does not use an iterative solution like ISR or Wilson. And unlike RPI, changing the ELO rating for a team does not affect the rating of any other team. So it is very simple and fast to calculate.
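To make the update rule concrete, here is a minimal Python sketch of it. The function name `update_rating` and the sample numbers (a 1500-rated team with a 75% expected outcome, K=32) are my own illustrative assumptions, not values from the testing described in this post.

```python
def update_rating(old_rating: float, k: float, outcome: float, expected: float) -> float:
    """Return N(t) = R(t) + K * (S(t) - E(t)) for a single team."""
    return old_rating + k * (outcome - expected)


# Hypothetical example: a team rated 1500 that was expected to win 75% of the time.
# A win earns only a quarter of K; a loss costs three quarters of K.
after_win = update_rating(1500.0, 32.0, 1.0, 0.75)
after_loss = update_rating(1500.0, 32.0, 0.0, 0.75)
```

Each call touches only the one team's rating, which is why no iteration over the whole league is needed.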

Since there are no draws in basketball (and since we're ignoring MOV), the outcome of a game for team t [S(t)] is 1 if team t won the game, and 0 otherwise. The heart of the ELO system is determining the "expected outcome" for a game (the term E in the update equation above). This is a number between 0 and 1 indicating how likely team t was to win this game based upon the team's current ELO rating and the opponent's current ELO rating. ELO's original model assumes that performance is a normally distributed random variable, and that each player has the same standard deviation. The version used here takes the commonly used logistic form, so E(t) is defined as this:

E(t) = 1/[1+10^([R(o)-R(t)]/400)]

where team o is the opponent in the current game. (The "400" in this equation is an historical artifact.)
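As a quick illustration of the expected-outcome formula, the sketch below evaluates it for a pair of ratings. The function name `expected_score` and the ratings 1600 and 1500 are assumptions chosen for the example only.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Return E(t) = 1 / (1 + 10^((R(o) - R(t)) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


# Hypothetical ratings: a 100-point favorite and its opponent.
e_favorite = expected_score(1600.0, 1500.0)  # roughly 0.64
e_underdog = expected_score(1500.0, 1600.0)  # roughly 0.36

# Two evenly rated teams each get an expected outcome of exactly 0.5.
e_even = expected_score(1500.0, 1500.0)
```

The two expected outcomes for a game always sum to 1, so whatever rating one team gains from a result, its opponent loses (given the same K).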

The only tunable parameter in the ELO formula is "K," the maximum update to the rating allowed from one game. In chess this is typically 16 or 32 (depending upon the skill of the player). For our purposes, we can test a range of values and look for one that maximizes performance for college basketball.
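A sweep over K might look roughly like the sketch below. It is not the methodology used for the results in this post: the helper `evaluate_k`, the starting rating of 1500, the rule of skipping games between equally rated teams, and the `toy_games` list are all assumptions made up for illustration.

```python
from collections import defaultdict


def expected_score(rating, opponent_rating):
    """E(t) = 1 / (1 + 10^((R(o) - R(t)) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def evaluate_k(games, k, initial_rating=1500.0):
    """Replay (winner, loser) games in order and return the fraction of
    games in which the higher-rated team (before the game) won.
    Games between equally rated teams are not scored (an assumption)."""
    ratings = defaultdict(lambda: initial_rating)
    correct = 0
    scored = 0
    for winner, loser in games:
        r_w, r_l = ratings[winner], ratings[loser]
        if r_w != r_l:
            scored += 1
            if r_w > r_l:
                correct += 1
        # Update both teams from their pre-game ratings.
        e_w = expected_score(r_w, r_l)
        ratings[winner] = r_w + k * (1.0 - e_w)
        ratings[loser] = r_l + k * (0.0 - (1.0 - e_w))
    return correct / scored if scored else 0.0


# Made-up toy schedule of (winner, loser) pairs; not real game data.
toy_games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B"), ("B", "C")]
results = {k: evaluate_k(toy_games, k) for k in (16, 32, 64, 100, 200)}
```

With a real season of games in place of `toy_games`, comparing the values in `results` would show which K predicts best under these assumptions.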

Here are the results of testing ELO with our usual methodology:

Predictor | % Correct | MOV Error |
---|---|---|
Wilson | 77.7% | 10.33 |
ELO (K=16) | 71.6% | 11.77 |
ELO (K=32) | 71.6% | 11.67 |
ELO (K=64) | 71.8% | 11.59 |
ELO (K=100) | 71.4% | 11.60 |
ELO (K=200) | 70.7% | 11.76 |

Performance seems to peak around K=64, but even at its best (71.8% correct, MOV error of 11.59) ELO falls significantly short of the best performing rating so far (compare Wilson's 77.7% and 10.33 in the table). It is also significantly less accurate than the TrueSkill rating (which is also based on Bayesian reasoning).
