As a lifelong college basketball fan and a student of Artificial Intelligence (AI), I was intrigued in 2010 when I saw Danny Tarlow's call for participants in a tournament prediction contest. I put together a program and managed to tie Danny in the Sweet Sixteen bracket.
The program I wrote in 2010 used a genetic algorithm to evolve a scoring equation based upon features such as RPI, strength of schedule, wins and losses, etc., and selected the equation that did the best job of reproducing the outcomes of the games in the training set. I felt the key to winning a tournament picking contest was guessing the upsets, so I added some features to the prediction model intended to identify and pick likely upsets.
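As a rough illustration of the idea (not the actual 2010 program), here is a minimal Python sketch. It makes several simplifying assumptions of its own: the scoring equation is fixed to a weighted sum and only the weights evolve, the feature names `rpi`, `sos`, and `win_pct` are hypothetical stand-ins, the upset-related features are left out, and the training games are synthetic data generated by a made-up rule.

```python
import random

# Hypothetical feature names; the real program's feature set is not specified here.
FEATURES = ("rpi", "sos", "win_pct")


def score(weights, team):
    """Simplified scoring equation: a weighted sum of a team's features."""
    return sum(w * team[f] for w, f in zip(weights, FEATURES))


def fitness(weights, games):
    """Fraction of games in which the higher-scoring team actually won."""
    correct = sum(
        (score(weights, a) > score(weights, b)) == a_won
        for a, b, a_won in games
    )
    return correct / len(games)


def evolve(games, pop_size=30, generations=40, mutation_scale=0.2, rng=None):
    """Evolve weight vectors; return the best one in the final population."""
    rng = rng or random.Random(0)
    population = [[rng.uniform(-1, 1) for _ in FEATURES] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, games), reverse=True)
        parents = ranked[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mom, dad)]  # uniform crossover
            child = [w + rng.gauss(0, mutation_scale) for w in child]  # mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, games))


def make_synthetic_games(n, rng):
    """Made-up training data: the winner follows an arbitrary hidden rule."""
    games = []
    for _ in range(n):
        a = {f: rng.random() for f in FEATURES}
        b = {f: rng.random() for f in FEATURES}
        a_won = (2 * a["rpi"] + a["win_pct"]) > (2 * b["rpi"] + b["win_pct"])
        games.append((a, b, a_won))
    return games


games = make_synthetic_games(200, random.Random(1))
best = evolve(games)
print("best weights:", best)
print("training accuracy:", fitness(best, games))
```

Because the fitness is measured on the same games used to evolve the weights, a high number here says nothing about how well the equation would predict games it has not seen.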
After the tournament ended, I remained intrigued by the problem of predicting basketball games, and continued to work on my program. In 2011 I did a little better, winning both brackets in Danny's contest. Toward the end of the college basketball regular season, I also tested my program against the Bodog lines, and it showed a significant profit over about six weeks of betting.
In the final days before this year's tournament, I discovered some glaring problems in my picking program (and in my testing methodology). I wasn't able to address those problems before the deadline for Danny's contest, so I am now rewriting my picking program and examining the various issues and possible approaches.
Shortly we'll begin an in-depth look at the "Rating Percentage Index" (RPI), but first it will be helpful to discuss a few broader issues, such as the limits on prediction.