Friday, February 19, 2016

More Kaggle News, ESPN Irritates Me

As a follow-up to this previous post, the Kaggle competition is officially back.  A good deal of data is available, and the forums have been moderately active.  The new Kaggle Notebooks feature is getting some exercise, too:  there are 116 scripts for this competition at the moment, although I'm unclear on what they all do.  There are at least a couple of scripts to calculate Elo ratings and similar things.  They might be worth a look if you're just getting started in this area.
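For anyone new to the idea, here is roughly what an Elo calculation looks like.  This is a minimal sketch and not one of the Kaggle scripts:  the K-factor of 20, the starting rating of 1500, and the team names are all assumptions picked for illustration, and a real basketball version would probably also want home-court and margin-of-victory adjustments.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=20.0, initial=1500.0):
    """Update the ratings dict in place after one game.

    k and initial are illustrative assumptions, not tuned values.
    Returns the new (winner, loser) ratings.
    """
    r_w = ratings.get(winner, initial)
    r_l = ratings.get(loser, initial)
    # The winner gains more when the win was less expected.
    delta = k * (1.0 - expected_score(r_w, r_l))
    ratings[winner] = r_w + delta
    ratings[loser] = r_l - delta
    return ratings[winner], ratings[loser]


# Made-up results as (winner, loser) pairs, in chronological order.
games = [("Team A", "Team B"), ("Team A", "Team C"), ("Team C", "Team B")]
ratings = {}
for winner, loser in games:
    update_elo(ratings, winner, loser)

for team, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(team, round(rating, 1))
```

Once the ratings are built, expected_score gives a win probability for any matchup, which is the form a Kaggle submission needs.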

Prizes this year are considerable -- $20K split 10/6/4/3/2.  I suggested awarding prizes for the best performance on each round of the Tournament, but that might have been too hard to implement quickly.  At any rate, spreading the prizes down to 5th place is a good improvement.  The contest is basically random amongst the top 100 or so contestants, so weighting all the money at the top makes it even more of a "random number lottery."

On a completely unrelated note, the NetProphet predictor broke on me last night.  It turned out that ESPN has changed the format of its box scores.  You can see the new format here.  The change seems to have also broken all the past seasons.  If you go to (say) November 2014, the scoreboard and schedule pages will claim that no games were played.

ESPN has been modifying their page formats for a while now, and I was expecting a change at some point.  The scoreboard page had earlier been modified to run from JSON data embedded in the page, and I was expecting to see something similar happen with the box scores and other game pages.  But interestingly enough, although the page formats have changed, they haven't gone to using embedded JSON data on these pages.  That's too bad, because pulling the JSON data out of the page, parsing it and then using it is more straightforward -- and probably a lot more robust -- than pulling data out of the HTML.
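To show why the embedded-JSON route is easier, here is a minimal sketch of pulling a JSON object out of a page's script block.  The variable name window.exampleData and the sample page are made up for illustration and are not ESPN's actual format;  you would need to look at the real page source to find the right name and structure.

```python
import json
import re


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to var_name in the page, or None."""
    # Find "var_name = " and consume the whitespace around the equals sign.
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (the trailing ";", "</script>", etc.).
        obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return obj


# Made-up sample page; the field names are hypothetical.
sample_html = """<html><body>
<script>window.exampleData = {"events": [{"home": "Team A", "away": "Team B",
"homeScore": 70, "awayScore": 65}]};</script>
</body></html>"""

data = extract_embedded_json(sample_html, "window.exampleData")
for event in data["events"]:
    print(event["home"], event["homeScore"], event["away"], event["awayScore"])
```

Once the data is a plain dictionary, a layout change to the surrounding HTML doesn't matter, which is where the extra robustness comes from.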

Saturday, February 6, 2016

Kaggle Competition is Back for 2016

I've been remiss about posting to the blog, but I thought I'd share that a little birdie hinted to me that the Kaggle Competition will be back again this year, with perhaps some new twists.  So keep your predictors warmed up.

I'm undecided whether I'm going to provide "Steal My Entry" again this year, but I might be interested in a private collaborative effort.  In particular, my thought is to merge an entry from my predictor -- which mostly focuses on regular-season games -- with a predictor that has specifically been trained on tournament games.  I'll provide my model's game predictions for all the tournament games back to 2009, and then you train a tournament-specific model using my predictions along with any other information you think is valuable (e.g., team seedings, locations, etc.).  Contact me if that sounds interesting -- and this isn't an exclusive offer; I'm happy to collaborate with multiple folks either individually or as part of a larger group.
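To make the idea concrete, here is one way the second-stage model could be sketched.  Everything in it is an assumption for illustration:  the choice of logistic regression, the two features (the base prediction on the logit scale and the seed difference), and the handful of toy rows, which are invented and not real tournament results.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """weights[0] is the bias; the rest line up with the features in x."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Plain batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Invented toy rows: (base model's P(team 1 wins), seed1 - seed2, team 1 won).
toy_games = [
    (0.85, -8, 1), (0.70, -4, 1), (0.55, -1, 0), (0.60, -2, 1),
    (0.40, 3, 0), (0.30, 5, 0), (0.45, 1, 1), (0.20, 9, 0),
]
rows = [[logit(p), seed_diff / 10.0] for p, seed_diff, _ in toy_games]
labels = [won for _, _, won in toy_games]

weights = fit_logistic(rows, labels)
# Blended probability for a hypothetical new game.
blended = predict(weights, [logit(0.75), -5 / 10.0])
print(round(blended, 3))
```

With real data you would fit on past tournaments only and check the result on held-out seasons, since there are only a few dozen tournament games per year to learn from.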