Monday, September 8, 2014

A Few More Papers

As usual, all these papers are available in the Papers archive.

[Trono 2007] Trono, John A., "An Effective Nonlinear Rewards-Based Ranking System," Journal of Quantitative Analysis in Sports, Volume 3, Issue 2, 2007.

Trono is very concerned about the NCAA football polls and with formulating a rating system that will closely match those polls.  I'm not exactly sure what utility that provides -- surely if I want to know what the polls say I can just look at them?  That issue aside, his description of his ranking system is vague and confusing -- I came away with no good understanding of how it worked or how to implement it. 

[Minton 1992] Minton, R. "A mathematical rating system." UMAP Journal 13.4 (1992): 313-334.
This is a teaching module for undergraduate mathematics that illustrates basic linear algebra through application to sports rating.  The rating systems developed are systems of linear equations based upon wins, MOV, etc.  They are very simple, but the module is a clear and detailed introduction to some basic concepts.

[Redmond 2003] Redmond, Charles. "A natural generalization of the win-loss rating system." Mathematics Magazine (2003): 119-126.
Redmond presents a rating system based upon MOV that includes a first-generation strength of schedule factor. It isn't extremely sophisticated, but makes a nice follow-on to [Minton 1992].

[Gleich 2014] Gleich, David. "PageRank Beyond the Web," http://arxiv.org/abs/1407.5107.

This is a thorough and well-written survey of the use of the PageRank algorithm.  Gleich provides clear, non-formal descriptions of the subject but also delves into the mathematical details at a level that will require some knowledge to understand.  There is a section on PageRank applied to sports rankings, and Gleich also shows that the Colley rating is equivalent to a PageRank.  Required reading for anyone interested in applying PageRank-type algorithms.

[Massey 1997] Massey, Kenneth. "Statistical models applied to the rating of sports teams." Bluefield College (1997).
Kenneth Massey's undergraduate thesis is required reading for anyone interested in sports rating systems.  He covers the least-squares and maximum-likelihood ratings that form the basis of the Massey rating system.

Thursday, September 4, 2014

Welcome Back & The Oracle Rating System

Welcome back!  I hope you had a great summer.  With Fall rapidly approaching, my attention has returned (somewhat) to NCAA basketball and sports prediction.  One trigger was happening across a paper from the June issue of JQAS:

[Balreira 2014] Eduardo Cabral Balreira, Brian K. Miceli and Thomas Tegtmeyer, "An Oracle method to predict NFL games", Journal of Quantitative Analysis in Sports. Volume 10, Issue 2, Pages 183–196, ISSN (Online) 1559-0410, ISSN (Print) 2194-6388, DOI: 10.1515/jqas-2013-0063, March 2014

The paper describes a variant of a random walker algorithm and uses it to predict NFL games. The work here was motivated by a quirky feature of random walkers.  Beating a very good team can raise a team's rating significantly, even if the rest of the team's performance is poor.  In some ways this makes sense, but it can lead to a situation where a mediocre team is ranked inordinately high based upon a lucky win over a very good team.  To address this, the Oracle algorithm introduces an artificial additional team (called the Oracle) and, by varying how many times each real team has "won" or "lost" against this Oracle team, biases the resulting rankings.  The authors test the predictive performance of the Oracle rating on NFL games from 1966-2013, and it outperforms rating systems like Massey and Colley, although only by small margins (1-2% in most cases).  The paper is well-written and comprehensive, with a clear explanation of the approach, illustrative examples, and thorough testing.
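
To make the idea of the extra team concrete, here is a rough sketch in Python. It is not the authors' exact construction: the vote direction (the loser "votes" for the winner), the names `oracle_ratings`, `up` and `down`, and the final renormalization over the real teams are all assumptions made for illustration.

```python
import numpy as np

def oracle_ratings(games, n_teams, up, down):
    """Hypothetical sketch of a rating with one artificial 'Oracle' team.

    games: list of (winner, loser) index pairs.
    up[i]:   assumed number of times team i has "beaten" the Oracle.
    down[i]: assumed number of times team i has "lost" to the Oracle.
    Both are assumed to be strictly positive.
    """
    n = n_teams  # index n is the Oracle
    A = np.zeros((n + 1, n + 1))
    for w, l in games:
        A[w, l] += 1.0          # a walker on the loser can move to the winner
    A[:n, n] = up               # the Oracle "lost" to team i: Oracle -> team i
    A[n, :n] = down             # team i "lost" to the Oracle: team i -> Oracle

    T = A / A.sum(axis=0)       # make every column sum to one

    # Steady state = eigenvector of T for the eigenvalue closest to 1.
    vals, vecs = np.linalg.eig(T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    v = v / v.sum()

    # Drop the Oracle and rescale so the real teams sum to one (an assumption).
    return v[:n] / v[:n].sum()

# Three teams: 0 beat 1, 1 beat 2, 0 beat 2; one Oracle win and loss each.
r = oracle_ratings([(0, 1), (1, 2), (0, 2)], 3, np.ones(3), np.ones(3))
```

Because every team is linked to the Oracle in both directions, no team can trap or lose all of the walkers, and changing `up` and `down` per team is the knob that biases the final ranking.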

Since I have previously implemented various random walker algorithms, it wasn't difficult to implement this approach and test its performance on NCAA basketball games.  There were a couple of interesting results from this experiment.

First of all, I found the best performance was based upon the won-loss records of teams, and not margin of victory (MOV).  This is pretty unusual -- I don't think I've found any other rating system that performed better using won-loss than MOV.  The performance was also competitive with very good MOV-based rating systems.

Second, I found that for NCAA basketball games, the algorithm performed much better without converting the results matrix to a column-stochastic form before creating the ratings.  A brief digression is in order to explain that remark.

Random walker algorithms model a system with a large number of random walkers:
Consider independent random walkers who each cast a single vote for the team they believe is the best. Each walker occasionally considers changing its vote by examining the outcome of a single game selected randomly from those played by their favorite team, recasting its vote for the winner of that game with probability p (and for the loser with probability 1-p).
If you let this process go long enough, it reaches a steady state, and the percentage of total walkers on each team becomes that team's rating.  That means that the sum of all the ratings is 1, and each rating represents the probability that a walker will end on that team.  When you formulate this as a matrix mathematics problem, you must normalize each column in the raw results matrix to sum to one (making the matrix "column stochastic") to ensure that the final ratings will represent the probabilities.
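
The walker process above can be sketched in a few lines of Python. This is a minimal illustration, not my production code: the function name `random_walker_ratings`, the default `p=0.75`, and the handling of teams with no games are assumptions.

```python
import numpy as np

def random_walker_ratings(games, n_teams, p=0.75, iters=1000, tol=1e-12):
    """games: list of (winner, loser) index pairs; returns ratings summing to 1."""
    # T[i, j] accumulates the weight for a walker on team j moving to team i.
    T = np.zeros((n_teams, n_teams))
    for w, l in games:
        # Walker currently on the loser: votes for the winner with prob. p.
        T[w, l] += p
        T[l, l] += 1.0 - p
        # Walker currently on the winner: stays with prob. p.
        T[w, w] += p
        T[l, w] += 1.0 - p

    # Teams with no games keep their walkers (assumption, avoids a zero column).
    for j in range(n_teams):
        if T[:, j].sum() == 0:
            T[j, j] = 1.0

    # Column-stochastic form: each column sums to one, which corresponds to
    # picking one of the team's games uniformly at random.
    T = T / T.sum(axis=0)

    # Power iteration from a uniform distribution of walkers.
    r = np.full(n_teams, 1.0 / n_teams)
    for _ in range(iters):
        nxt = T @ r
        converged = np.abs(nxt - r).sum() < tol
        r = nxt
        if converged:
            break
    return r

# Three teams: 0 beat 1, 1 beat 2, 0 beat 2.
ratings = random_walker_ratings([(0, 1), (1, 2), (0, 2)], 3)
```

For this tiny schedule the ratings work out to 0.6 for the unbeaten team, about 0.257 for the 1-1 team and about 0.143 for the winless team, and they sum to 1 because every column of `T` does.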

It isn't clear what the ratings "mean" if you don't convert to column stochastic form, but I found that the ratings had much better performance for NCAA basketball games without the conversion.  When I reported this result back to Eduardo Balreira, he tested it for his corpus of NFL games and found that it performed worse.  It's altogether a rather curious result and I'm not certain what to make of it.

In my experimentation so far, I haven't found any customization of the Oracle system that produces results better than my current best predictors.  However, it is close and has a few interesting properties that bear some more thought, so I may continue to play with it to see if I can discover a way to further improve its performance for NCAA basketball games.