Saturday, April 16, 2011

Averaging RPI

Previously, we looked at improving the Ratings Percentage Index (RPI) by fixing its treatment of the Home Court Advantage (HCA).  We found that the best results came from eliminating the HCA adjustments altogether.  There are some other approaches we could explore to improve RPI's treatment of home court advantage, but we'll turn now to another area of possible improvement.

The RPI consists of three terms: a team's winning percentage (WP), the winning percentage of the team's opponents (OWP), and the winning percentage of the team's opponents' opponents (OOWP).  The latter two terms are defined in a curious way.  They are not straight averages, but rather an "average of averages": OWP is computed by averaging the individual winning percentages of all the opponents.  Suppose, for example, that UCLA plays three opponents: USC (4-1), Arizona (6-0) and Oregon (0-1).  (USC and Arizona played in the preseason NIT.)  OWP is calculated by averaging the WPs of these teams: (0.80+1.0+0)/3 = 0.60.  In contrast, the opponents' combined record is 10-2, so the straight average winning percentage is 10/12 = 0.83.

About this, Paul Kislanko says:

This would be equivalent to defining a batting average in baseball by the average of the BA for each game played.  A 0 for 5 day followed by a 3 for 4 day would give (.000 + .750)/2 = .375 instead of 3 for 9 = .333.  In basketball, a player in a 3-game tournament who hits 2 of 10 shots, then 3 of 6, then 4 of 10 would have a shooting percentage of (.200 + .500 + .400)/3 = .367, when in fact for the tournament she was 9 for 26 = .346.
There's no other formula in all of sports statistics that makes this mistake.
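
To make the distinction concrete, here is a minimal Python sketch of the two calculations, applied to the UCLA example and to Kislanko's batting-average example.  The function names are my own, and the sketch ignores details of the real RPI (such as leaving out games against the team being rated when computing each opponent's winning percentage):

```python
def average_of_averages(records):
    """Mean of each opponent's own winning percentage (the RPI's approach)."""
    pcts = [wins / (wins + losses) for wins, losses in records]
    return sum(pcts) / len(pcts)


def pooled_average(records):
    """Winning percentage of the opponents' combined record."""
    total_wins = sum(wins for wins, _ in records)
    total_games = sum(wins + losses for wins, losses in records)
    return total_wins / total_games


# UCLA's opponents as (wins, losses): USC 4-1, Arizona 6-0, Oregon 0-1
ucla_opponents = [(4, 1), (6, 0), (0, 1)]
print(round(average_of_averages(ucla_opponents), 2))  # 0.6
print(round(pooled_average(ucla_opponents), 2))       # 0.83

# Kislanko's batting example as (hits, outs): 0 for 5, then 3 for 4
batting = [(0, 5), (3, 1)]
print(round(average_of_averages(batting), 3))  # 0.375
print(round(pooled_average(batting), 3))       # 0.333
```

The two methods disagree whenever the opponents have played different numbers of games, because the average of averages gives Oregon's single game the same weight as Arizona's six.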

As far as I know, the NCAA had no particular reason for choosing an average of averages when calculating the RPI.  And I'm not aware of any reason why one method should be preferred over the other, although Kislanko's argument is compelling on its face.  It's worth investigating which method provides the best predictions.

If we substitute a straight average (the opponents' combined winning percentage) for the average of averages in our RPI algorithm, we get this performance:

Predictor                 % Correct    MOV Error
1-Bit                     62.6%        14.17
RPI (unweighted)          74.6%        11.53
RPI (unweighted+ave)      74.2%        11.63

This is slightly worse on both measures, so the "average of averages" is the better choice (Kislanko's outrage over the poor math notwithstanding :-).

Another feature of how RPI calculates the OWP and OOWP is that if a team plays an opponent twice, that opponent's winning percentage is counted twice.  This makes some sense -- certainly if we played two different teams with identical WPs we'd want to count them both when figuring the average strength of our opponents.  But perhaps it could be argued that playing a team a second (or third) time shouldn't affect the overall strength of your opponents.  Again, it is easier to test than to worry too much about a rationale.  If we eliminate duplicate opponents when calculating RPI (both with the average of averages and with straight averages), we get this performance:

Predictor                 % Correct    MOV Error
1-Bit                     62.6%        14.17
RPI (unweighted)          74.6%        11.53
RPI (ave)                 74.2%        11.63
RPI (nodupes)             74.2%        11.59
RPI (ave+nodupes)         74.2%        11.63

Neither variant improves on the straight unweighted RPI.
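
For reference, removing duplicate opponents amounts to something like the following sketch.  The team names, winning percentages, and the `owp` helper are hypothetical illustrations, not the code or data used in the experiment:

```python
def owp(schedule, win_pct, dedupe=False):
    """Average-of-averages OWP over a schedule (one opponent name per game).

    With dedupe=True each opponent is counted once, however many
    times it appears on the schedule.
    """
    # dict.fromkeys keeps the first occurrence of each name, in order
    opponents = list(dict.fromkeys(schedule)) if dedupe else schedule
    return sum(win_pct[name] for name in opponents) / len(opponents)


# Hypothetical data: team "A" is played twice
win_pct = {"A": 0.80, "B": 0.50, "C": 0.20}
schedule = ["A", "B", "A", "C"]

print(round(owp(schedule, win_pct), 3))               # 0.575
print(round(owp(schedule, win_pct, dedupe=True), 3))  # 0.5
```

In this made-up case, counting the strong opponent twice raises the OWP from 0.500 to 0.575.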

Before we leave this topic, let's perform another experiment.  Instead of averaging, we could try using the median value for the OWP and OOWP.  Imagine a team whose opponents have records of 3-0, 0-1 and 0-1.  The average of averages of these is 0.33; the straight average (a combined 3-2 record) is 0.60; and the median of the individual winning percentages is 0.00.  We could certainly construct a rationale for why using the median might be a good idea, but again there's really no a priori reason to prefer one over the other.  Still, it seems worth a quick experiment:

Predictor                 % Correct    MOV Error
1-Bit                     62.6%        14.17
RPI (unweighted)          74.6%        11.53
RPI (medians)             72.8%        12.14
RPI (medians+nodupes)     71.4%        12.57

Both medians and medians without duplicate opponents underperform the unweighted RPI.  So it appears that averages are better than medians, and averages of averages are the best of all.
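
The three summary statistics from the 3-0, 0-1, 0-1 example can be checked with a few lines of Python.  This is only an illustrative sketch using the standard `statistics` module, with variable names of my own choosing:

```python
from statistics import mean, median

# Opponent records as (wins, losses): 3-0, 0-1 and 0-1
records = [(3, 0), (0, 1), (0, 1)]
pcts = [wins / (wins + losses) for wins, losses in records]

avg_of_avgs = mean(pcts)  # average of the individual winning percentages
pooled = sum(w for w, _ in records) / sum(w + l for w, l in records)  # combined record
med = median(pcts)        # median of the individual winning percentages

print(round(avg_of_avgs, 2), round(pooled, 2), round(med, 2))  # 0.33 0.6 0.0
```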

The literature of sports ranking systems is full of long pages of carefully derived formulas meant to ensure the best and most accurate math.  But the most sophisticated math in the world is of little use if it does not contribute to improving performance.  Taking the "average of averages" might not make any mathematical sense, but since it performs better than the (mathematically) superior alternatives, we're happy to use it!