So we want to predict the outcome of a college basketball game. Presumably we are going to base this prediction primarily on the previous performance of the two teams. (Methods based on astrological signs and comparison of mascots have typically performed poorly.) So what information on past performance can we use to drive our prediction?
The simplest and most fundamental information is the team's won-loss record. This has been the basis of a number of different rating systems that purport to determine whether one team is better than another (and in some cases, how much better). The best known of these rating systems is the Ratings Percentage Index (RPI), which plays a key role in the selection of teams to the annual NCAA tournament.
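To make the idea of a won-loss rating concrete, here is a minimal sketch of the commonly cited basic RPI formula: 25% the team's own winning percentage, 50% its opponents' winning percentage (leaving out games against the team itself), and 25% its opponents' opponents' winning percentage. The function name `basic_rpi` and the `(winner, loser)` input format are assumptions made for illustration, and the sketch ignores any home/away weighting, so treat it as a rough approximation rather than the official calculation.

```python
from collections import defaultdict


def basic_rpi(games):
    """Compute a basic RPI for every team.

    games: list of (winner, loser) tuples (hypothetical input format).
    Returns a dict mapping team name to its RPI.
    """
    # results[team] is a list of (opponent, won) pairs, one per game played
    results = defaultdict(list)
    for winner, loser in games:
        results[winner].append((loser, True))
        results[loser].append((winner, False))

    def win_pct(team, exclude=None):
        # Winning percentage, optionally ignoring games against `exclude`
        played = [won for opp, won in results[team] if opp != exclude]
        return sum(played) / len(played) if played else 0.0

    def opp_win_pct(team):
        # Average winning percentage of opponents, not counting
        # their games against `team`
        opps = [opp for opp, _ in results[team]]
        return sum(win_pct(o, exclude=team) for o in opps) / len(opps)

    def opp_opp_win_pct(team):
        # Average of each opponent's own opponents' winning percentage
        opps = [opp for opp, _ in results[team]]
        return sum(opp_win_pct(o) for o in opps) / len(opps)

    return {
        t: 0.25 * win_pct(t) + 0.50 * opp_win_pct(t) + 0.25 * opp_opp_win_pct(t)
        for t in list(results)
    }
```

Note that half of the rating comes from who a team played rather than whether it won, which is why strength of schedule matters so much in this kind of system.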
At a slightly more complex level, we can look at margin of victory. Instead of using only the fact that Team A beat Team B, we can add in the magnitude of the victory. We presume that margin of victory is a reasonable proxy for the relative strengths of the two teams. If Team A beats Team B by 24 points, while Team C beats Team B by only 2 points, we believe that Team A is stronger than Team C.
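The common-opponent reasoning above can be sketched in a few lines. This is only an illustration under simple assumptions: the function name `common_opponent_edge` and the `(winner, loser, margin)` input format are made up for this example, and it naively treats margins as directly comparable across games.

```python
from collections import defaultdict


def common_opponent_edge(games, team_x, team_y):
    """Estimate how many points better team_x is than team_y.

    games: list of (winner, loser, margin) tuples (hypothetical format).
    Returns the average difference in margins against common opponents,
    or None if the two teams share no opponents.
    """
    # margins[team][opponent] is a list of signed margins from team's view
    margins = defaultdict(lambda: defaultdict(list))
    for winner, loser, margin in games:
        margins[winner][loser].append(margin)
        margins[loser][winner].append(-margin)

    def avg(values):
        return sum(values) / len(values)

    common = (set(margins[team_x]) & set(margins[team_y])) - {team_x, team_y}
    if not common:
        return None

    # For each shared opponent, compare the two teams' average margins
    diffs = [avg(margins[team_x][o]) - avg(margins[team_y][o]) for o in common]
    return avg(diffs)
```

With the example from the text, Team A's 24-point win and Team C's 2-point win over Team B give Team A an implied 22-point edge over Team C.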
Finally, we can delve even deeper into the statistics of past games, looking at things such as offensive efficiency, rebounding, steals, etc. We can look at the statistics for individual players. We can also look at situational factors -- where was the game played? How long has it been since each team's last game? And so on.
Of course, there are problems and challenges with this information. Looking only at won-loss records obscures a host of obviously important factors (e.g., who was the home team). Much of the information available may have no predictive value (e.g., knowing which team is the better offensive rebounding team may not help us predict the outcome of the game). So part of the challenge of building an effective predictor will be winnowing through the available information and gleaning the valuable bits.
We'll get started on that in the next posting.