Friday, October 10, 2014

Day of the Week Effect && Polynominal Variables in Linear Regression

Motivated partly by recent discussion of Thursday Night Football, I began to wonder if the day of the week has any impact upon college basketball games.  This is a little bit of a tricky topic, because conferences play games on different nights (e.g., the Ivy League plays on Friday nights) so there's some conference bias mixed into any discussion of the impact of day of the week.  But I decided to ignore that for the moment and just look at the straightforward question.

This is a little trickier than you might expect, because my prediction model uses linear regression.  Linear regression works fine when we're looking for the relationship between two numerical variables (e.g., how do rebounds per game affect score) but it doesn't work so well with polynominal (not polynomial!) variables.  A polynominal variable is one that takes on one of a number of discrete, non-numeric values.  In this case, day of the week can be Monday, Tuesday, Wednesday and so on.

To use a polynominal variable in linear regression, we turn it into a number of binominal variables.  In this case, we create a new variable called "DOW = Monday" and give it a 1 or 0 value depending upon whether or not the day of the game is Monday.  We do this for each possible value of the polynominal variable, so in this case we end up with seven new variables.  We can then use these as input to our linear regression.
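As a rough illustration of that conversion, here is a minimal Python sketch.  The function name `encode_day_of_week` and the way the variables are stored are my own assumptions for the example, not how my model actually does it; only the "DOW = Monday" naming comes from the description above.

```python
from datetime import date

# Python's date.weekday() numbers the days Monday = 0 ... Sunday = 6.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def encode_day_of_week(game_date):
    """Turn a game date into seven 0/1 variables, one per day of the week.

    Hypothetical helper for illustration only.
    """
    dow = game_date.weekday()
    return {f"DOW = {name}": int(i == dow) for i, name in enumerate(DAY_NAMES)}

# Example: a game played on Friday, October 10, 2014.
features = encode_day_of_week(date(2014, 10, 10))
for name, value in features.items():
    print(name, value)
```

Exactly one of the seven variables is 1 for any given game; the rest are 0.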

When I do so, I find that only one of the new variables has any importance in the regression:

      0.6636 * DOW = 4=false

Translating, this says the home team is at a small disadvantage in Friday games.  I leave it up to the reader to explain why that might be true.  (Ivy League effect?)
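To spell out the arithmetic behind that reading, here is a small sketch.  It assumes the days are numbered Monday = 0 through Sunday = 6 (so that 4 is Friday) and that the predicted quantity is one where bigger is better for the home team; the helper name `dow_term` is made up for the example.

```python
FRIDAY = 4            # assumption: days numbered Monday = 0 ... Sunday = 6
COEFFICIENT = 0.6636  # the coefficient on "DOW = 4=false" from the regression

def dow_term(day_index):
    """Contribution of the "DOW = 4=false" term to a prediction.

    Hypothetical helper: the term is switched on (1) when the game is NOT
    on Friday and switched off (0) when it is.
    """
    is_not_friday = 1 if day_index != FRIDAY else 0
    return COEFFICIENT * is_not_friday

friday_contribution = dow_term(4)    # term is off on Fridays
saturday_contribution = dow_term(5)  # term is on every other day
print(friday_contribution, saturday_contribution)
```

In other words, every non-Friday game gets the 0.6636 added and Friday games don't, which is where the small Friday disadvantage comes from.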


We can also look at whether predictions are more or less accurate on some days.  When I do that for my model, I find that the predictions are most accurate for Saturday games and least accurate for Sunday games.  The difference in RMSE is about 6/10 of a point, so it's not an entirely trivial difference.
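For anyone who wants to try the same per-day comparison, here is a minimal sketch of computing RMSE separately for each day of the week.  The function name `rmse_by_day`, the tuple layout, and the sample numbers are all invented for illustration; they are not my model's code or results.

```python
import math
from collections import defaultdict

def rmse_by_day(games):
    """Compute RMSE of predictions grouped by day of the week.

    `games` is an iterable of (day_name, predicted, actual) tuples.
    Hypothetical helper for illustration only.
    """
    squared_errors = defaultdict(list)
    for day, predicted, actual in games:
        squared_errors[day].append((predicted - actual) ** 2)
    return {day: math.sqrt(sum(errs) / len(errs))
            for day, errs in squared_errors.items()}

# Made-up numbers purely to exercise the function; not real results.
sample_games = [
    ("Saturday", 5.0, 8.0),
    ("Saturday", -2.0, 2.0),
    ("Sunday", 3.0, -3.0),
    ("Sunday", 1.0, 9.0),
]
result = rmse_by_day(sample_games)
for day, rmse in result.items():
    print(day, round(rmse, 2))
```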