## Friday, September 16, 2011

### New Papers

(All of the following papers have been added to the papers archive.)

[Gill 2008] "Assessing Methods for College Football Rankings," JQAS 2008

Summary: This paper purports to "...consider several mathematical methods for ranking college football teams based on point differential... [and] assess the predictive performance of these models using leave-one-out cross validation."  The models considered are variants of least-squares fitting of rating values to point differential.  Variants include different fitting methods (e.g., weighted least squares) and methods for limiting the impact of blowouts (e.g., cutting off the point differential at 14 or 28 points).  Predictive performance is used to assess cutoff values for blowouts.

Comment:  A disappointing paper for me; from the title and abstract I had hoped that this paper would analyze some set of football ranking approaches for their predictive value.  Instead, the main conclusion of the paper seems to be that one can construct a rating system that emphasizes nearly any aspect of competition by selecting the right approach and tuning constants.
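
The core scheme the paper builds on is easy to sketch.  Below is a minimal illustration, with a tiny invented set of games and a 14-point cap on margins -- this is not the authors' code, just the basic idea of fitting one rating per team to capped point differentials.

```python
import numpy as np

# Hypothetical games: (winner, loser, point differential)
games = [("A", "B", 35), ("B", "C", 7), ("A", "C", 10), ("C", "B", 3)]
teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}

CAP = 14  # cut off the point differential to limit the impact of blowouts

# One equation per game: rating(winner) - rating(loser) = capped margin
X = np.zeros((len(games), len(teams)))
y = np.zeros(len(games))
for row, (w, l, diff) in enumerate(games):
    X[row, idx[w]] = 1.0
    X[row, idx[l]] = -1.0
    y[row] = min(diff, CAP)

# Ratings are only determined up to an additive constant; lstsq's
# minimum-norm solution resolves that by making them sum to zero.
ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(teams, ratings.round(2))))
```

Weighted least squares and different cap values (14 vs. 28) are then just variations on this fit.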

[Wigness 2010]  "A New Iterative Method for Ranking College Football Teams," JQAS 2010
And see: WWR Rankings

Summary: This paper describes a method for ranking college football teams.  The method uses (potentially) score, location (home or away), and time of season to create an initial value for each game, and then iteratively re-rates games until equilibrium is achieved.  The method has a number of parameters and options, and the paper evaluates the performance of several combinations.  Performance is measured by the percentage of correct predictions for bowl games.  Over 9 seasons, the best combination predicts about 59% of all bowl games correctly, and about 63% of the BCS bowl games.  In contrast, over the same span the BCS computer rankings have predicted about 57% of the BCS games correctly.

Comment:  A fairly interesting paper, and the work was apparently mostly done by an undergraduate.  The approach is at least somewhat novel -- it involves creating a graph where the nodes are teams and the links are games between teams, and then summing all the simple paths originating from a team and going out "K" links (where K is a parameter; K=4 performed best).  There's no intuitive (to me, at least) meaning to doing this, but to some extent it captures strength of opposition, the same way RPI uses OWP, OOWP, etc.  I'd like to implement and test this system, but the naive implementation for calculating all the simple paths is likely to be very slow, and if there's a clever matrix formulation, it doesn't occur to me.  I've put a question in to the authors asking about their implementation.
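
Here's my reading of the path-summing step as a sketch -- the game graph is invented and the `simple_paths` helper is my own naive depth-first enumeration, not the authors' implementation.  The exponential cost of this enumeration is exactly why I suspect the naive approach will be slow.

```python
from collections import defaultdict

# Hypothetical game graph: nodes are teams, links are games played
games = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
graph = defaultdict(set)
for t1, t2 in games:
    graph[t1].add(t2)
    graph[t2].add(t1)

def simple_paths(team, k, graph):
    """Enumerate all simple paths of exactly k links starting at team."""
    paths = []
    def dfs(path):
        if len(path) - 1 == k:
            paths.append(tuple(path))
            return
        for nxt in graph[path[-1]]:
            if nxt not in path:  # "simple": no team visited twice
                dfs(path + [nxt])
    dfs([team])
    return paths

# All 2-link simple paths out of team A
print(simple_paths("A", 2, graph))
```

Summing over all such paths from a team (with the per-game values as link weights) is what folds opponents-of-opponents strength into the rating.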

[Loeffelholz 2009] "Predicting NBA Games Using Neural Networks," JQAS 2009

Summary:  An ensemble of several different neural networks fed with team statistics was used to predict NBA games.  Performance was assessed using "% Correct" and compared to consensus picks from five experts published in USA Today.  The ensemble methods did not improve upon the best individual predictor.  The best predictor (a feed-forward NN) predicted 74% of the test games correctly (compared to 69% for the human experts).

Comment: There are a number of interesting results in this paper.  First, the authors looked at both (1) splitting team statistics based on home/away and (2) using only the most recent 5 games, and in both cases found no value.  This agrees with my own experiments with similar approaches.  Second, the authors experimented with various combinations of statistics and got the best performance using only FG% and FT% for each team.
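
A rough sketch of that winning feature set, with logistic regression standing in for the paper's feed-forward network -- the game data below is synthetic; only the four-feature input (FG% and FT% for each team) comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic games: [home FG%, home FT%, away FG%, away FT%] per game
X = np.column_stack([
    rng.uniform(0.42, 0.50, 200),  # home FG%
    rng.uniform(0.70, 0.80, 200),  # home FT%
    rng.uniform(0.42, 0.50, 200),  # away FG%
    rng.uniform(0.70, 0.80, 200),  # away FT%
])
# Invented label: home team wins when its combined shooting is higher
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(float)

# Logistic regression trained by gradient descent (a simpler model
# than the paper's NN, but the same four inputs)
w = np.zeros(4)
b = 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 2.0 * (X.T @ (p - y)) / len(y)
    b -= 2.0 * (p - y).mean()

accuracy = ((X @ w + b > 0) == (y > 0.5)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Even this stand-in picks up the obvious structure -- positive weight on the home team's shooting, negative on the opponent's.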

[Beckler 2009] "NBA Oracle," CMU Classwork

Summary: This paper describes an effort to use various machine learning techniques to predict NBA game outcomes (as well as some related tasks).  Inputs to the learning process were 62 features for each game -- most were averages over the current and previous seasons of team statistics such as rebounding, shooting percentage, etc.  The most effective technique was linear regression, which predicted about 70% of games correctly -- comparable to human experts.  The most important statistic was team winning percentage in the previous season, followed (in decreasing importance) by defensive rebounds, points scored by the opposing team, and the number of blocks and assists made by the opposing team.

Comment:  A fairly straightforward attempt to predict NBA games based upon team statistics.  Prediction accuracy is in line with similar work (although below Loeffelholz) -- around 70% seems to be fairly easy to achieve for NBA games.  There's no attempt to predict margin of victory.

[Orendorff 2007] "First-Order Probabilistic Models for Predicting the Winners of Professional Basketball Games," JQAS 2007

Summary: This paper describes an effort to apply Bayesian Logic (BLOG) and Markov Logic Networks (MLN) to predicting NBA games.  Inputs to the models are won-loss records.  The MLN model performs best, predicting 76% of games correctly.

Comment: The methodology here is similar to my own -- the research uses cross-validation over the entire NBA season.  However, there is one very important distinction: this research uses the entire season's data to predict the held-out games -- not just the portion of the season up to the time of the predicted game.  This makes a huge difference in prediction performance, so take the authors' result of 76% accuracy with a grain of salt.  The accuracy using only season-to-date data would likely be 15-20% lower.
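
The difference between the two protocols is easy to see on a hypothetical chronological list of games (the games and helper functions below are illustrative, not from the paper):

```python
# Hypothetical season: games in chronological order (stand-in values)
games = list(range(10))

def leave_one_out_train_set(i, games):
    """Training data under the paper's protocol: every game but game i,
    including games played *after* it."""
    return games[:i] + games[i + 1:]

def season_to_date_train_set(i, games):
    """Training data without lookahead: only games played before game i."""
    return games[:i]

# To predict game 3, the paper's protocol also trains on games 4-9,
# none of which would actually be available on game day.
print(leave_one_out_train_set(3, games))   # [0, 1, 2, 4, 5, 6, 7, 8, 9]
print(season_to_date_train_set(3, games))  # [0, 1, 2]
```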

[Trono 2010] "Rating/Rankings Systems, Post-Season Bowl Games, and 'The Spread'", JQAS 2010

Summary: This paper compares a number of simple systems for predicting college football bowl games.

Comment: This is a difficult paper to analyze.  It is written in a very colloquial, unorganized manner and lacks a clear purpose.  The systems analyzed are described in terms too vague to understand their computational implementation, or even to attribute authorship.  All that said, at least one system described has outperformed the Las Vegas line (by one game) over a 7-year period.

[West 2008] "A New Application of Linear Modeling in the Prediction of College Football Bowl Outcomes and the Development of Team Ratings," JQAS 2008

Summary: This paper uses linear regression to build a predictive model for college football bowl games.  The inputs to the model are average statistical measures (e.g., "Offensive yardage accumulated per game").  The model predicted 19 of 32 bowl games correctly (59.4%).

Comment: This paper is of particular interest to me at the moment because I've also turned to looking at prediction using statistical team measures.  This work seems to agree with my result that only a few measures (mostly related to scoring) have significance in the final model.  Also of interest here is that West pre-conditions his statistical measures by expressing all of them in units of "standard deviations from the mean."
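
That preconditioning is just z-scoring each measure.  A quick sketch, with invented team statistics (not West's actual data):

```python
import numpy as np

# Invented team measures (rows = teams; columns = e.g. offensive
# yardage per game, points per game)
stats = np.array([
    [430.0, 31.0],
    [385.0, 24.0],
    [460.0, 35.0],
    [355.0, 20.0],
])

# Express every measure in units of "standard deviations from the
# mean" so regression coefficients are comparable across measures
z = (stats - stats.mean(axis=0)) / stats.std(axis=0)
print(z.round(2))
```

After this step every column has mean 0 and standard deviation 1, so a coefficient of 0.5 means the same thing whether the underlying measure is yardage or scoring.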