Thursday, March 3, 2016

To Gamble or Not To Gamble, That is the Question

Or at least that's "a" question -- one that comes up yearly in the Kaggle competition.  Here's a version of it that popped up this year.

The Kaggle competition (for those who aren't aware) uses log-loss scoring.  Competitors predict which team will win as a confidence level (e.g., 95% certain of a win by Kentucky) and then are rewarded/punished accordingly.  And since the scoring is logarithmic, you are punished a lot if you make a very confident wrong decision.
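To get a feel for how lopsided that punishment is, here is a minimal sketch of per-game log loss. The function name `log_loss` and the clipping constant `eps` are my own choices for illustration, not the contest's actual scoring code; the clipping is only there so that a prediction of exactly 0 or 1 doesn't blow up.

```python
import math


def log_loss(p_win: float, won: bool, eps: float = 1e-15) -> float:
    """Log loss for one game, given the predicted probability that the team wins.

    eps is an assumed clipping value (hypothetical, not the contest's) that
    keeps the logarithm finite for predictions of exactly 0.0 or 1.0.
    """
    p = min(max(p_win, eps), 1.0 - eps)
    # The penalty is the negative log of the probability assigned to what actually happened.
    return -math.log(p if won else 1.0 - p)


if __name__ == "__main__":
    # Compare the penalty for being right versus wrong at several confidence levels.
    for p in (0.55, 0.75, 0.95, 0.99):
        right = log_loss(p, True)
        wrong = log_loss(p, False)
        print(f"p={p:.2f}  right: {right:.3f}  wrong: {wrong:.3f}")
```

At 95% confidence, being right costs about 0.05 while being wrong costs about 3.0, so one confident miss wipes out the gains from dozens of confident hits.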

The question that plagues competitors is whether forcing their predictions to be more conservative or less conservative will improve their chances of winning the contest.  (Or at least finishing in the top five and receiving a cash prize.)  Note that this is only concerned with winning the contest, not with improving the predictions.  Presumably your predictions are already as accurate as you can make them, and artificially changing them would make them worse -- in the long run.  But the Kaggle contest isn't concerned with the long run -- it's only concerned with how you perform during this particular March Madness.

As a thought experiment, let's assume that you could change your entry right before the final game.  You can see the current standings, but not any of the other entries.  Would you change your entry?  And if so, how?

Well, if you see that you're in first place with a big lead, you might not change it at all.  Or maybe you'd make your pick more conservative so that you could be sure you wouldn't lose much if your pick was wrong.  But if you didn't have a big lead (and in general the farther away from first place you were) you'd probably want to gamble on getting that last game correct.  At that point "average" performance cannot be expected to move you ahead of the competitors ahead of you, and even "good" performance might be passed by someone behind you who was willing to gamble more than you.

Since it's much more likely that you will be losing the contest going into the final game than in first place with a big lead, I think this argues that (if your goal is to maximize your expected profit) you should "gamble" on at least the last game.  It's left to the reader to apply this reasoning recursively to games before the final game :-).
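The last-game argument can be sketched numerically. The toy model below is an illustration under assumptions that don't hold in the real contest: there is a single leader, you somehow know the leader's prediction for the final game, and your own belief `true_p` is the true probability. All names (`overtake_probability`, `deficit`, and so on) are hypothetical, and `deficit` is measured in summed log loss rather than the contest's averaged score.

```python
import math


def log_loss(p_win: float, won: bool, eps: float = 1e-15) -> float:
    """Log loss for one game; eps is an assumed clipping value to avoid log(0)."""
    p = min(max(p_win, eps), 1.0 - eps)
    return -math.log(p if won else 1.0 - p)


def overtake_probability(true_p: float, submitted: float,
                         leader_pred: float, deficit: float) -> float:
    """Chance of passing the leader on the final game (toy model).

    true_p      -- your honest belief that the team wins, assumed correct
    submitted   -- the probability you actually enter for that team
    leader_pred -- the leader's entry for the same team (assumed known here)
    deficit     -- how much summed log loss you trail the leader by
    """
    prob = 0.0
    for won, weight in ((True, true_p), (False, 1.0 - true_p)):
        # How much ground you make up on the leader if this outcome happens.
        gain = log_loss(leader_pred, won) - log_loss(submitted, won)
        if gain > deficit:
            prob += weight
    return prob


if __name__ == "__main__":
    # You and the leader both honestly believe 60%; you trail by 0.3.
    for entry in (0.60, 0.99, 0.01):
        chance = overtake_probability(0.60, entry, 0.60, 0.3)
        print(f"submit {entry:.2f} -> chance of passing the leader: {chance:.2f}")
```

In this setup the honest 60% entry can never catch the leader, since both of you score identically on the final game. Turning the favorite up to 99% gives you a 60% chance of passing, and even betting against the favorite gives you 40%. Either gamble beats the honest entry if all you care about is finishing first.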

As a concrete example of this, last year Juho Kokkala submitted entries based upon "Steal This Entry" but with Kentucky's probabilities turned up to 1.0.  The non-gambling "Steal This Entry" finished in 42nd place, but if Kentucky had won out, Juho would probably have placed in the top two and collected some prize money.


  1. This is a very interesting one to me. Leaving your best/true predictions is obviously best if the reward was proportional to your score, but since the reward is very skewed (though slightly less skewed this year), your true predictions are almost certainly NOT the best ones to submit for the contest (as far as maximizing your prize goes). I would love to try and find the optimal predictions to submit, given that my true predictions are truth and that I know everyone else's predictions. From there, it can be modified to guessing at what other predictions will be, as obviously you can't know ahead of time. Alas, there's almost no chance I have time to do this!

  2. I do some of this (ad hoc) for the Machine Madness Contest, e.g., last year I picked Arizona to beat Kentucky in the final game even though the true prediction favored Kentucky, because I figured if Arizona won I would win the contest, whereas if Kentucky won I might well still lose to someone who also had Kentucky but had done better in the earlier rounds. Much of the reasoning about the correct strategy depends upon your assumptions about how the other competitors are predicting, and it gets very complicated if you think they're doing the same sort of meta-reasoning about the contest!

  3. Yes, exactly! Agreed that changing your champion pick is 90% of what optimal would be in a traditional bracket contest, and that's usually what I do. But yeah, same principles apply, though would be much harder to both optimize your own picks and guess at others' picks in the Kaggle contest simply due to the sheer number of options.