Wednesday, November 21, 2012

Another Approach to Early Season Performance

Continuing on with my efforts to better model early season performance, it occurred to me that it might be good to model a team as the average of several previous years' teams.  So we'd predict that Duke 2012-2013 would perform like an average of the 2009-2010, 2010-2011, and 2011-2012 teams.

This is a fairly straightforward experiment in my setup -- I just read in all three previous seasons as if they were one long preseason, and then predict the early season games.  Of course, with a twelve thousand game "preseason" this takes a while -- particularly when you keep making mistakes at the end of the processing chain and have to start over again :-).

At any rate, the conclusion is that this approach doesn't work very well.  The MOV error over the first thousand games was 12.60 -- worse than just priming with the previous season's data.

Tuesday, November 13, 2012

More on Early Season Performance

Prior to my recent detour, I was looking at predicting early season performance.  To recap, experiments showed that predicting early season games using the previous season's data works fairly well for the first 800 or so games of the season.  However, "fairly well" in this case means an MOV error of around 12, which is better than predicting with no data, but not close to the error of around 11 we get with our best model for the rest of the season.  The issue I want to look at now is whether we can improve that performance.

A reasonable hypothesis is that teams might "regress to the mean" from season to season.  That is, the good teams probably won't be as good the next season, and the bad teams probably won't be as bad.  This will be wrong for some teams -- there will be above-average teams that get even better, and below-average teams that get even worse -- but overall it might be a reasonable approach.

It isn't immediately clear, though, how to regress the prediction data for teams back to the mean.  For something like the RPI, we could calculate the average RPI for the previous season and push team RPIs back towards that number.  But for more complicated measures that may not be easy.  And even for the RPI, it isn't clear that this simplistic approach would be correct.  Because RPI depends upon the strength of your opponents, it might be that a team with an above-average RPI that played a lot of below-average RPI teams would actually increase its RPI, because we would be pushing the RPIs of its opponents up towards the mean.

A more promising (perhaps) approach is to regress the underlying game data rather than trying to regress the derived values like RPI.  So we can use the previous season's data, but in each game we'll first reduce the score of the winning team and raise the score of the losing team.  This will reduce the value of wins and reduce the cost of losses, which should have the effect of pulling all teams back to the mean.
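
As a rough illustration of that idea, here is a minimal sketch in Python.  It assumes the tweak is applied as a fraction of each team's own score; the function name, the `fraction` parameter, and the sample scores are all made up for the example and are not taken from my actual setup.

```python
def regress_scores(winner_score, loser_score, fraction=0.01):
    """Pull a game result toward the mean (hypothetical helper).

    The winner's score is reduced and the loser's score is raised,
    each by `fraction` of its own value.
    """
    return winner_score * (1 - fraction), loser_score * (1 + fraction)


# Made-up (winner, loser) scores standing in for last season's games.
games = [(82, 42), (70, 68)]
modified = [regress_scores(w, l, fraction=0.01) for w, l in games]

for (w, l), (mw, ml) in zip(games, modified):
    print(f"{w}-{l} (margin {w - l}) -> {mw:.2f}-{ml:.2f} (margin {mw - ml:.2f})")
```

Every margin of victory shrinks a little, which is the intended "regression" effect on anything computed from the scores.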

The table below shows the performance when scores were modified by 1%:

  Predictor                           % Correct    MOV Error
  Early Season w/ History             75.5%        12.18
  Early Season w/ Modified History    71.7%        13.49

Clearly not an improvement, and also a much bigger effect than I had expected.  After all, 1% changes most scores by less than 1 point.  (Yes, my predictor is perfectly happy with an 81.7 to 42.3 game score :-)  So why does the predicted score change by enough to add 1+ points of error?

Looking at the model produced by the linear regression, this outsized response seems to be caused by a few inputs with large coefficients.  For example, the home team's average MOV has a coefficient of about 3000 in the model.  So changes like this scoring tweak that affect MOV can have an outsized impact on the model's outputs.

With that understood, we can try dialing the tweak back by an order of magnitude and modify scores by 0.1%:

  Predictor                                  % Correct    MOV Error
  Early Season w/ History                    75.5%        12.18
  Early Season w/ Modified History (0.1%)    74.8%        12.15

This does slightly improve our MOV error (12.15 vs. 12.18), although the percentage of games picked correctly drops a bit.  Some experimenting suggests that 0.1% is about the best we can do with this approach.  The gains over just using the straight previous season history are minimal.

Some other possibilities suggest themselves, and I intend to look at them as time permits.




Thursday, November 8, 2012

How to Pick a Tournament Bracket, Part 2

In the previous post, I looked at a strategy for picking a Tournament bracket.  The basic idea is that to win a sizable Tournament challenge, you can't just pick the most likely outcome of each game.  You're going to have to pick at least some of the inevitable upsets correctly.  A reasonable way to do that is to decide how many points from upsets you think you'll need, and then pick some combination of upsets to reach that number.  It turns out the best way to do that is to pick late-round upsets between closely-matched teams.

However, there are some concerns with that approach.  One is that if you pick "likely" upsets (such as a #2 over a #1), it's reasonable to assume that many of your competitors might pick the same upset.  So although the upset might be both likely and high-scoring, it might not do much to separate you from your competitors.  That's an interesting problem, but one we'll leave for another day.

Another concern is that the strategy is "all or nothing."  We are assuming that we'll need (say) G = 16 points to win the Tournament challenge and make picks accordingly.  But in truth our chance of winning the Tournament challenge is more of an S-curve:

We have some guess at G that will give us a reasonable chance to win the challenge, but we might end up needing more or we might be able to win with less.  With G = 16 the strategy I've outlined so far leads us to pick a single 16 point upset in a semi-final game.  This is fine if the upset occurs.  But if it doesn't, losing 16 points moves us a sizable distance to the left on the S-curve and greatly reduces our chances of winning.  Something of this sort happened to my entry (the Pain Machine) in the last Machine Madness contest -- the PM predicted a Kansas-Kentucky upset that would have left it at G = 34.  But since Kentucky won, the PM ended up at G = 2 and lost the contest to a predictor at G = 7.  We'd really like a strategy that optimizes our chance to win under all possible scenarios.

Mathematically, this is the sum over all possible outcomes of the likelihood of the outcome times the likelihood of winning under that outcome.  If we pick a single 16 point upset, then there are two possible outcomes: the upset happens or it doesn't.  If L(n) is the likelihood of winning the challenge with n points, then the expected value of that strategy is:
EV = L(0) * (1 - p(u,v)) + L(16) * p(u,v)


But if instead we picked two 8 point upsets, then there are four possible outcomes: neither of the upsets occur, the first upset occurs but the second doesn't, the first doesn't but the second does, or they both occur.  The expected value of this strategy is more complicated: 
EV = L(0) * (1 - p(u1,v1)) * (1 - p(u2,v2)) +
         L(8) * p(u1,v1) * (1 - p(u2,v2)) +
         L(8) * (1 - p(u1,v1)) * p(u2,v2) +
         L(16) * p(u1,v1) * p(u2,v2)
Depending upon the probability of the various outcomes and the likelihood of winning, the expected value of this strategy might be higher than picking a single 16 point upset, even though the chances of scoring the full 16 points are reduced.
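
To make the comparison concrete, here is a small Python sketch of the two expected value calculations.  The S-curve is a stand-in: I don't have a real L(n), so `win_likelihood` is a logistic curve with made-up `midpoint` and `steepness` values, and the upset probabilities in the usage lines are likewise invented.

```python
import math


def win_likelihood(points, midpoint=12.0, steepness=0.4):
    """Hypothetical S-curve L(n): chance of winning with n upset points."""
    return 1.0 / (1.0 + math.exp(-steepness * (points - midpoint)))


def ev_single(p, points=16):
    """Expected value of picking one upset worth `points`."""
    return win_likelihood(0) * (1 - p) + win_likelihood(points) * p


def ev_double(p1, p2, points=8):
    """Expected value of picking two upsets worth `points` each."""
    return (win_likelihood(0) * (1 - p1) * (1 - p2)
            + win_likelihood(points) * p1 * (1 - p2)
            + win_likelihood(points) * (1 - p1) * p2
            + win_likelihood(2 * points) * p1 * p2)


# Invented probabilities: one long-shot 16 point upset vs. two likelier 8 point upsets.
print("single 16 point pick:", ev_single(0.25))
print("two 8 point picks:   ", ev_double(0.45, 0.45))
```

Which strategy comes out ahead depends entirely on the shape of the curve and the probabilities plugged in, so the numbers printed here shouldn't be read as a general result.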

Up until now, I've been implicitly assuming that the possible outcomes of an upset pick are either zero or n points.  But that's not really true.  The cost of an incorrect pick can be greater than just losing the points for that game. 

For example, last year the Pain Machine correctly predicted that #15 Lehigh would beat #2 Duke.  Given the rarity of 15-2 upsets, that was an amazing prediction.  But even if it was a very likely upset, it would have been a bad pick, because there was a potentially high cost if the upset didn't happen.  To see why, here is the bracket:

If the prediction is correct, the Pain Machine picks up 1 point for the correct first round prediction.  But if the prediction is incorrect, the Pain Machine is very likely to lose 2 points when Duke wins in the second round, 4 more points when they win in the third round, and so on.  (As a #2 seed, Duke would be expected to keep winning until the round of eight.)

We can generalize this idea as a value formula for a win by U over V:

        V(u,v) = (p(u,v) * round_i) - (p(v,i+1) * round_(i+1)) - (p(v,i+1) * p(v,i+2) * round_(i+2)) ...

Here, p(u,v) represents the probability that U defeats V, round_i represents the scoring value for the i-th round of the tournament, and p(v,i+1) represents the probability of V defeating their likely opponent in round i+1 if they had not been upset by U.  To return to the Lehigh-Duke example, the value is the probability of Lehigh beating Duke times 1 (the value of that round) minus the probability of Duke beating Notre Dame (their expected opponent) times 2 (the value of that round), and so on.

To maximize V(u,v) we must maximize p(u,v) and minimize p(v,i+1).  And since round_(i+1) = 2*round_i, it is twice as important to minimize p(v,i+1).  To translate this into plain English, we want to pick upsets where the team being upset has very little chance to win its next round game.  That's why the Lehigh upset was a poor pick -- because as a #2 seed Duke had a very good chance to win its second round game.
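
The value formula is easy to sketch in Python.  This is only an illustration: `upset_value` and its parameter names are hypothetical, the round values are assumed to double each round as described above, and the probabilities in the examples are invented rather than taken from my predictor.

```python
def upset_value(p_upset, p_favorite_advances, round_value=1):
    """Value of picking U over V in a round worth `round_value` points.

    p_upset is p(u,v).  p_favorite_advances lists p(v,i+1), p(v,i+2), ...:
    the chance V would have won each later round had it not been upset.
    """
    value = p_upset * round_value
    reach = 1.0               # chance V wins every later round so far
    points = round_value
    for p in p_favorite_advances:
        reach *= p
        points *= 2           # scoring doubles each round
        value -= reach * points
    return value


# Invented numbers: a long-shot upset of a strong favorite ...
lehigh_over_duke = upset_value(0.05, [0.8, 0.6])
# ... versus an upset of a team that probably loses next round anyway.
xavier_over_notre_dame = upset_value(0.4, [0.2])

print(lehigh_over_duke, xavier_over_notre_dame)
```

With these made-up inputs the first pick has a strongly negative value while the second roughly breaks even, which matches the intuition in the text.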

Instead, this formula will value upset picks like #10 Xavier over #7 Notre Dame.  (Which also happened!) To see why, look again at the bracket:
Whichever team wins the first round game -- Notre Dame or Xavier -- they are likely to lose the second round game to the stronger Duke team.  Thus the downside of the upset pick is minimized -- if Notre Dame wins and then loses to Duke as expected, you'll only have lost one point for the incorrect upset pick.

This insight is nothing new.  Canny pickers already look for upsets that are "firewalled" off in the next round by a strong opponent.  However, the value formula above gives us an objective measure for comparing possible upset picks.  I suspect that most people incorrectly assess the p(u,v) vs. p(v,?) tradeoff.  Because scoring doubles each round, it's much more important to consider the cutoff in the next round than the upset chance -- which most people probably find counter-intuitive.

Unfortunately, the strategy of "firewalling" upset picks runs counter to the strategy of picking high-scoring late round upsets, because (assuming most games are not upsets) the mismatches which make good firewalls primarily occur in the early rounds.  If the tournament runs mostly true to the seedings, the late round games are usually between closely-matched teams and do not make good firewalls.  An interesting exception is the Championship game itself.  If you pick an upset in the Championship game incorrectly, you're guaranteed not to lose any additional points. 

To summarize these thoughts about picking a tournament bracket:
  1. A bracket consisting of chalk picks and true mis-seedings is not likely to win a sizable Tournament challenge.
  2. Picking late-round upsets between highly-seeded teams has the advantages of (1) scoring a lot of points, and (2) being relatively likely to occur.
  3. To maximize the overall chance of winning the challenge, it may be better to spread your upset picks rather than bet "all or nothing."
  4. Upset picks which are firewalled in the next round reduce the downside risk of an incorrect pick.

Wednesday, November 7, 2012

How To Pick A Tournament Bracket, Part 1

Pre-season is probably not the best time to be pondering the Tournament, but I've been recently thinking a bit more about the challenge of predicting the Tournament with the goal of winning something like the ESPN Tournament Challenge or the Machine Madness contest.  These sorts of contests are a dilemma to a machine predictor, because most predictors try to determine who is most likely to win a particular matchup.  But of course, that's exactly how the Tournament is seeded.  So the machine predictors end up predicting almost entirely "chalk" outcomes.

The only time the machines don't predict a win for the higher seed is when they believe the teams have been mis-seeded -- that is, when the Committee has made a mistake in their assessment of the relative strengths of the teams.  In last year's Machine Madness contest, Texas over Cincy and Purdue over St. Mary's were consensus upset picks by five of the six predictors -- strong evidence (to my mind, anyway) that those teams were mis-seeded.  But, for all the grumbling by fans, the Committee does a pretty good job at seeding the Tournament, and you can't expect to find many true mis-seedings. 

Neither chalk picks nor mis-seedings are likely to win a Tournament challenge against a sizable field.  That's because (1) a lot of your competitors will have made the same picks, (2) there will be a significant number of true upsets where a weaker team beats a stronger team (historically, 22% in the first round, and 15% for the Tournament overall), and (3) someone out there will have picked those upsets.  So to win a Tournament challenge, the machine is going to have to pick some actual upsets -- and then hope that it gets lucky and those upsets are the ones that happen.

Knowing the historical frequency of upsets, my strategy last year was to force my predictor to pick 6 upsets in the first round and 5 more in the rest of the tournament.  But is that the right way to pick upsets?  How can we pick upsets to maximize (in some sense) our chance to win the Tournament challenge?

The first problem in answering this question is knowing how many points will be sufficient to win the challenge, because that will drive the selection of upsets.  Obviously, it's impossible to know this number a priori.  However, we could look at previous Tournament challenges and see how many points the competitors in the top (say) 1% had scored off correctly predicting upsets.  That would provide a reasonable goal G for our upset calculations.

Sadly, ESPN, Yahoo, etc., seem to remove the Tournament challenge information from the Internets fairly quickly, so I can't actually research this.  (If someone has some info on this, please let me know!)  However, we do have the results of the last two Machine Madness contests.  Last year, the winning entry scored 127 points and the "chalk" (baseline) entry scored 120 points, for G = 7.  The year before, the winning entry scored 69 points and the chalk entry scored 57 points, for G = 12.  (There's undoubtedly a correlation between the size of the field and G.   G = 12 might be sufficient most years to win the Machine Madness contest, but probably wouldn't be enough to win the ESPN Tournament Challenge.)

If we adopt the notation that V(u,v) is the value of a victory of Team U over Team V, then we will want to pick upsets such that:
G < V(u1,v1) + V(u2,v2) + V(u3,v3) ...
Because of the way the tournament is structured, the value of V(u,v) is determined by the seeding of the two teams.  The following table has seedings down both axes and shows how many points an upset is worth:



For example, a #8 seed beating a #1 seed is worth 2 points, because that matchup will necessarily occur in the second round.

If we adopt the notation that p(u1,v1) is the probability of u1 defeating v1, then the probability of scoring G is:
p(u1,v1) * p(u2,v2) * p(u3,v3)  ...
(because we must get all of our upset picks correct to score G points).

Now imagine that we are predicting the tournament and we know that most games have a 0% chance of an upset.  However, four of the third round games are very likely upsets -- 49%.  And one of the semi-final games has a slight chance of an upset -- about 6%.  If G = 16, which upsets should we pick?

The (possibly surprising) answer is that we should pick the very unlikely semi-final upset!  To get 16 points we have to pick either the semi-final game, or all four of the third round games, and:

       .06 > .49*.49*.49*.49
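
The arithmetic is quick to check in Python.  The helper name `joint_probability` is just for illustration; the probabilities are the ones from the example above.

```python
import math


def joint_probability(upset_probs):
    """Chance that every upset in the list happens (assumes independence)."""
    return math.prod(upset_probs)


semi_final = joint_probability([0.06])
four_third_round = joint_probability([0.49] * 4)

print(f"semi-final upset:        {semi_final:.4f}")
print(f"four third round upsets: {four_third_round:.4f}")
```

Four near-tossups multiplied together come out to a bit under 6%, so the single long shot is (barely) the better way to reach 16 points.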

The joint probability equation combined with the typical Tournament challenge scoring means that it will almost always be better to pick unlikely late round upsets that score highly than multiple likely early round upsets that score poorly.  Knowing this, it's easy to see that my strategy in previous years to force a certain number of upsets into my bracket was very non-optimal.

So, given a goal G and upset probabilities p(u,v) we have an approach for selecting upsets from our bracket.  We've seen how to calculate V(u,v) and how to estimate G.  How can we estimate the upset probabilities?

Many predictors will produce something that can be used to estimate upset probabilities.  For example, in past years my predictor has used the predicted MOV to estimate upset probabilities -- the slimmer the predicted margin of victory, the more likely an upset.  But lacking any information of that sort, we could estimate the upset probabilities based upon historical performance of seeds within the tournament:


This table shows the upset percentage for each seeding matchup for the last ten years.  (I have left out matchups that have occurred four times or fewer.)  Each upset percentage is shaded to indicate the value of the matchup in typical Tournament challenge scoring.  For example, matchups between #1 seeds and #2 seeds have been won by the #2 seeds 52% of the time, and they are worth 8 points.  With a few oddball exceptions (such as the 2-10 matchups), this table shows that you should prefer to pick upsets of the #1 or #2 seeds by #2 or #3 seeds.  These matchups are worth the most points and -- because the teams are closely seeded -- are nearly tossups.

So if G = 16, filling out your bracket with all chalk picks and two #2 over #1 upsets would give you the best chance to win the Tournament challenge.
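
Here is a brute-force sketch of that selection in Python.  It is a simplification under stated assumptions: `best_upset_set` is a hypothetical helper, the candidate list is illustrative (only the 52% figure for #2 over #1 comes from the table; the others reuse the made-up numbers from the earlier example), upsets are treated as independent, reaching exactly G counts as enough, and nothing checks that the chosen upsets can actually coexist in one bracket.

```python
from itertools import combinations
import math


def best_upset_set(candidates, goal):
    """Return (probability, names) for the most likely set of upsets
    whose points add up to at least `goal`.

    candidates is a list of (name, points, probability) tuples.
    """
    best = (0.0, ())
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if sum(points for _, points, _ in combo) < goal:
                continue
            prob = math.prod(p for _, _, p in combo)
            if prob > best[0]:
                best = (prob, tuple(name for name, _, _ in combo))
    return best


candidates = [
    ("#2 over #1 (A)", 8, 0.52),
    ("#2 over #1 (B)", 8, 0.52),
    ("third round (A)", 4, 0.49),
    ("third round (B)", 4, 0.49),
    ("semi-final long shot", 16, 0.06),
]

prob, picks = best_upset_set(candidates, goal=16)
print(picks, round(prob, 4))
```

Exhaustive search like this is only workable for a handful of candidate upsets, since the number of combinations doubles with each one added.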

More thoughts to follow in Part 2 at some later date.


Friday, November 2, 2012

Papers Archive Updated

The archive of academic papers on rating sports teams or predicting game outcomes has been updated to include the papers reviewed here as well as about a half-dozen other new papers.  A listing of the papers and a link to the archive can be found on the Papers link on the right side of this website.

I'm always interested in the latest research in this area, so please let me know if you're publishing a relevant paper or if you have a pointer to a paper I've missed.  Thanks!