Wednesday, June 15, 2011

An Experiment with TrueSkill

At the moment I'm looking at the TrueSkill rating system and thinking about how to incorporate margin of victory (MOV). (In particular, I'm watching (and thinking about) Tom Minka's lectures from the Machine Learning Summer School 2009. These lectures are a great resource, by the way, and well worth browsing.) Since TrueSkill is the best of the ratings that don't use MOV, it seems reasonable that it might do even better if it incorporated MOV. I haven't yet formulated a mathematically sound way to add MOV to TrueSkill (I'm open to pointers), but that of course has not kept me from experimenting.

Recall that in TrueSkill we update the ratings for the two teams involved in a game by comparing their strengths. If you win a game over a strong opponent, then that's good evidence that your rating ought to rise and your opponent's ought to fall. And if you win a game over a weak opponent, then that's not much evidence for changing the ratings (because you were expected to win).
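For readers who haven't seen the update equations, here is a minimal Python sketch of the standard two-player TrueSkill update for a game with no draws (as in basketball). It assumes the default parameters from the original TrueSkill paper (mu = 25, sigma = 25/3, beta = sigma/2), leaves out the dynamics term that inflates sigma between games, and the function names are my own. It illustrates the point above: the multiplier v(t) is large when the underdog wins and small when the favorite wins, so upsets move the ratings much more.

```python
import math

# Assumed defaults from the original TrueSkill paper.
MU0 = 25.0
SIGMA0 = 25.0 / 3
BETA = SIGMA0 / 2  # per-game performance noise


def v(t):
    """Mean correction for a win with no draw margin: pdf(t) / cdf(t)."""
    pdf = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * math.erfc(-t / math.sqrt(2))  # erfc keeps precision for negative t
    return pdf / cdf


def w(t):
    """Variance correction factor, always between 0 and 1."""
    vt = v(t)
    return vt * (vt + t)


def trueskill_update(winner, loser, beta=BETA):
    """Return updated (mu, sigma) pairs for the winner and loser of one game."""
    mu_w, s_w = winner
    mu_l, s_l = loser
    c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c  # how strongly the winner was favored
    vt, wt = v(t), w(t)
    new_winner = (mu_w + s_w ** 2 / c * vt,
                  s_w * math.sqrt(1 - s_w ** 2 / c ** 2 * wt))
    new_loser = (mu_l - s_l ** 2 / c * vt,
                 s_l * math.sqrt(1 - s_l ** 2 / c ** 2 * wt))
    return new_winner, new_loser


if __name__ == "__main__":
    strong, weak = (30.0, 2.0), (20.0, 2.0)
    upset, _ = trueskill_update(weak, strong)      # weak team wins
    expected, _ = trueskill_update(strong, weak)   # strong team wins
    print("gain from upset:   ", upset[0] - weak[0])
    print("gain from expected:", expected[0] - strong[0])
```

Running it shows the weak team gaining far more from beating the strong team than the strong team gains from beating the weak one.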

So how should we interpret MOV?  One reasonable approach is to say that a win by a large MOV is better evidence that your rating should rise than a win by a small MOV.  (For the moment we ignore "Running Up the Score" and similar problems with MOV.)  Referring back to how TrueSkill works, winning by a large MOV is therefore similar to beating a stronger team.  So perhaps we can incorporate MOV into the TrueSkill algorithm by adjusting our opponent's rating up or down based upon MOV (creating an "effective" rating) and then updating our own rating accordingly.
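The "effective rating" idea can be sketched by reusing the same update but handing each team a shifted copy of its opponent's mean. This is only a guess at one way to do it: the bonus function isn't spelled out here, so the `scale` parameter (bonus = scale * MOV, added directly in rating points), the helper names, and the choice to lower the winner's mean when updating the loser are all hypothetical. Under that reading, scale=2 would play the role of the MOV*2 variant.

```python
import math


def _v(t):
    pdf = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * math.erfc(-t / math.sqrt(2))
    return pdf / cdf


def side_update(mu, sigma, opp_mu, opp_sigma, won, beta):
    """TrueSkill-style (no draws) update of ONE team against a given opponent rating."""
    c = math.sqrt(2 * beta ** 2 + sigma ** 2 + opp_sigma ** 2)
    sign = 1.0 if won else -1.0
    t = sign * (mu - opp_mu) / c  # winner's mean minus loser's mean, over c
    vt = _v(t)
    wt = vt * (vt + t)
    new_mu = mu + sign * sigma ** 2 / c * vt
    new_sigma = sigma * math.sqrt(1 - sigma ** 2 / c ** 2 * wt)
    return new_mu, new_sigma


def mov_update(winner, loser, mov, scale=1.0, beta=25.0 / 6):
    """Hypothetical MOV variant: shift the opponent's mean by scale * mov before updating."""
    bonus = scale * mov
    # A big win is treated like beating a stronger team: raise the loser's effective mean.
    new_winner = side_update(winner[0], winner[1], loser[0] + bonus, loser[1], True, beta)
    # A big loss is treated like losing to a weaker team: lower the winner's effective mean.
    new_loser = side_update(loser[0], loser[1], winner[0] - bonus, winner[1], False, beta)
    return new_winner, new_loser


if __name__ == "__main__":
    team_a, team_b = (25.0, 3.0), (25.0, 3.0)
    for margin in (1, 10, 20):
        (mu_a, _), (mu_b, _) = mov_update(team_a, team_b, margin)
        print(margin, round(mu_a, 3), round(mu_b, 3))
```

With mov=0 this reduces to the plain TrueSkill update. Points of margin and rating points are on different scales, so `scale` has to be tuned empirically; that is the kind of knob the bonusing experiments adjust.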

That turns out to be pretty straightforward to add to the algorithm, and gives these results:

  Predictor              % Correct    MOV Error
  TrueSkill + iRPI         72.9%        11.01
  Govan (best)             73.5%        10.80
  TrueSkill (w/ MOV)       73.3%        10.91

This turns out to work surprisingly well for a completely arbitrary hack.  Some playing around with the bonusing function shows that performance is slightly improved by using MOV*2 as the bonus:

  Predictor              % Correct    MOV Error
  TrueSkill + iRPI         72.9%        11.01
  Govan (best)             73.5%        10.80
  TrueSkill (w/ MOV)       73.3%        10.91
  TrueSkill (w/ MOV*2)     73.3%        10.88

Other bonusing variants and tweaks don't show any improvement.  This performance is not quite as good as the Govan rating, but certainly shows some promise.
