Wednesday, January 28, 2015

Strength of Schedule & Adjusted Statistics (Part 2)

In a previous post, I talked about why Strength of Schedule (SoS) is important to interpreting a team's statistical performance, and I briefly described the standard SoS approach taken by Ken Pomeroy, SevenOvertimes, and others.

To briefly review, the standard approach for calculating SoS for a statistic like winning percentage (WP) for a team (T) is to average the winning percentage of all of a team's opponents:

SoS(T) = (1/n) sum_(i="opponents"(T))^n WP(i)

To use this SoS measure to interpret the original statistic, we could create an adjusted statistic:

WP_"adj"(T) = WP(T) * SoS(T)
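
As a rough sketch of the two formulas above, here is how the symmetric SoS and the adjusted winning percentage could be computed in Python. The team names, winning percentages, and schedules are made up purely for illustration, and the function names `sos` and `adjusted_wp` are hypothetical.

```python
from statistics import mean

# Hypothetical data: each team's winning percentage and the opponents it has played.
# An opponent played twice would simply appear twice in the list.
win_pct = {"A": 0.75, "B": 0.50, "C": 0.25}
opponents = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

def sos(team):
    """Average winning percentage of the team's opponents."""
    return mean(win_pct[opp] for opp in opponents[team])

def adjusted_wp(team):
    """Winning percentage scaled by strength of schedule."""
    return win_pct[team] * sos(team)

for team in win_pct:
    print(team, round(sos(team), 3), round(adjusted_wp(team), 3))
```

In this toy data, team C gets the highest SoS because it played the two best teams, but its adjusted winning percentage is still the lowest.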

This works fine for symmetric statistics like winning percentage, where a win for you means a loss for your opponent. Unfortunately, winning percentage and other won-loss stats are about the only statistics with this property. Most statistics are like three point percentage, where a team's performance is mostly unrelated to how well its opponent does the same thing. Instead, there's an offense-defense aspect to the statistic, and to interpret it you want to know how well the opponent defends against it. However, there's not usually a corresponding defensive statistic (e.g., "3 PT defense"), so we have to derive the opponent's defensive strength by looking at how well the opponent has done in stopping other teams. So in the case of three point percentage, we want to know how well a team's opponents have done at stopping the three pointer.

We calculate the Strength of Schedule by averaging the performance of the team's opponents' opponents:

SoS(T) = (1/(n*m)) sum_(i="opponents"(T))^n sum_(j="opponents"(i))^m 3PT%(j)

There's actually one more little wrinkle: we want to exclude the original team T from the opponents' opponents, so that T's own shooting doesn't feed into its own SoS.

SoS(T) = (1/(n*m)) sum_(i="opponents"(T))^n sum_(j="opponents"(i), j ne T)^m 3PT%(j)

For example, suppose that Louisville is shooting 32% from the arc.  If the teams Louisville has played have held all their opponents to an average three point percentage of only 24%, then Louisville's 32% is more impressive.  Conversely, if Louisville's opponents have allowed the teams they played to average 48%, then Louisville's 32% looks less impressive.
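
The opponents' opponents average, with the original team excluded, might be sketched like this in Python. The schedule and the shooting percentages are invented for illustration (the teams other than Louisville are placeholders), and `sos_opponents_opponents` is a hypothetical name. One assumption worth flagging: the sketch divides by the number of values actually averaged rather than by n*m, since excluding the original team removes some terms from the sum.

```python
from statistics import mean

# Hypothetical three point percentages and schedules, for illustration only.
three_pt_pct = {"Louisville": 0.32, "X": 0.38, "Y": 0.30, "Z": 0.28}
opponents = {
    "Louisville": ["X", "Y"],
    "X": ["Louisville", "Y", "Z"],
    "Y": ["Louisville", "X", "Z"],
    "Z": ["X", "Y"],
}

def sos_opponents_opponents(team, stat):
    """Average the stat over the team's opponents' opponents, skipping the team itself."""
    values = [
        stat[j]
        for i in opponents[team]   # each opponent i of the team
        for j in opponents[i]      # each opponent j of that opponent
        if j != team               # exclude the original team
    ]
    return mean(values)

print(round(sos_opponents_opponents("Louisville", three_pt_pct), 3))
```

Here Louisville's SoS averages Y and Z (the teams X has faced) with X and Z (the teams Y has faced), so Louisville's own 32% never enters the calculation.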

Note that this SoS measure is backwards from the typical one used for symmetric statistics. In this case, a smaller SoS indicates tougher competition. (This all assumes that "bigger is better" for our statistic. If we have a statistic where you want a low number, such as turnovers, everything flips.)

We can capture this analytically as an adjusted statistic (using S for a generic statistic, and assuming that for S bigger is better):

S_"adj"(T) = (S(T))/(SoS(T,S))

To return to the Louisville example, if the strength of schedule is 24%, then Louisville's adjusted 3PT% is 32/24 = 1.33. But if the strength of schedule is 48%, then Louisville's adjusted 3PT% is only 32/48 = 0.67.
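
The arithmetic in the Louisville example can be checked with a couple of lines of Python. This is only a sketch of the ratio defined above; `adjusted_stat` is a hypothetical helper name, and it assumes a "bigger is better" statistic.

```python
def adjusted_stat(stat_value, sos_value):
    """Divide a bigger-is-better statistic by its strength of schedule."""
    return stat_value / sos_value

# Louisville shooting 32% against a tough (24%) and an easy (48%) schedule.
print(f"{adjusted_stat(0.32, 0.24):.3f}")  # 1.333
print(f"{adjusted_stat(0.32, 0.48):.3f}")  # 0.667
```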

As should be obvious from that example, these adjusted statistics don't have any intuitive meaning on their own. They're no longer percentages, just numbers where bigger is better. But they can be used to compare teams, and may be more useful than the original statistic for prediction because they provide a common measure even when teams haven't faced the same opponents.

One problem that we haven't yet addressed is that this SoS measure only goes one level deep.  Maybe Louisville's opponents held teams to 24% three point shooting because they played a bunch of teams that were terrible three point shooters.  I'll address that in Part 3.