## Monday, January 13, 2014

Over on Stats Intelligence (a blog you should read if you don't already), there is a complaint today about the silliness of Monte Carlo simulations.  I have something to say about that, but before I do so, let me give a quick overview of Monte Carlo methods for those who might not be familiar with the approach.

The basic idea behind the Monte Carlo method is that you have a complex simulation of some process.  The outcome of the simulation depends upon some number of factors that you don't know with certainty, although you might be able to guess a likely range for them.  Also, these factors interact in complicated and non-obvious ways, so that a slight tweak to one factor might lead to an entirely different outcome.  Using a single run of this sort of simulation for prediction is pretty hopeless, because it's too sensitive to your guesses about these factors.  If you change your guess a little bit, the outcome changes.

For example, you might have a computer program that simulates an entire football game down by down.  On each down you select an offensive play and a defensive set, choose which receiver to pass the ball to, and so on.  You have random factors to cover things like cornerbacks falling down.  This might make a fun game (think John Madden Football), but running Seattle vs. San Francisco one time and using that to predict next weekend's outcome is obviously foolish.

The idea with the Monte Carlo method is to "wash out" this sensitivity by running the simulation many, many times with a sampling of random values for the factors within their likely ranges.  Then you can tally the outcomes to get a percentage estimate for each one.  For example, you run John Madden Football ten thousand times and Seattle wins 64% of the time.
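To make that concrete, here is a minimal sketch in Python.  Everything in it is made up for illustration: `simulate_game` is a toy stand-in for a real down-by-down simulation, and the factor names (`pass_success`, `turnover_rate`) and their ranges are hypothetical guesses, not numbers from any real model.

```python
import random

def simulate_game(rng, pass_success, turnover_rate):
    """Toy stand-in for a detailed game simulation (entirely hypothetical).

    Each team gets ten drives.  A drive scores 7 points if it avoids a
    turnover and then succeeds.  Returns True if Seattle outscores
    San Francisco (a tie counts as not-a-win for Seattle).
    """
    seattle = 0
    san_francisco = 0
    for _ in range(10):
        # Seattle's drive depends on the two uncertain factors.
        if rng.random() > turnover_rate and rng.random() < pass_success:
            seattle += 7
        # San Francisco's drive uses a fixed, made-up success rate.
        if rng.random() > turnover_rate and rng.random() < 0.45:
            san_francisco += 7
    return seattle > san_francisco

def monte_carlo(n_runs, seed=0):
    """Run the toy game n_runs times, re-drawing the uncertain factors each time."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        # Sample each uncertain factor from its guessed likely range.
        pass_success = rng.uniform(0.40, 0.60)
        turnover_rate = rng.uniform(0.05, 0.20)
        if simulate_game(rng, pass_success, turnover_rate):
            wins += 1
    return wins / n_runs

if __name__ == "__main__":
    print(f"Seattle win fraction: {monte_carlo(10_000):.3f}")
```

A single call to `simulate_game` tells you almost nothing; the fraction returned by `monte_carlo` is the number you would actually report.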

Over on Stats Intelligence, Jeff complains today about the arbitrary number of iterations people claim for their Monte Carlo simulations to give them a veneer of accuracy and in-depth analysis.  And I agree with him completely on this issue -- it's ridiculous to see "50,000" runs when you know that's 100x more than necessary.

But my complaint is different.

Over at the Harvard Sports Analysis blog (another blog you should read if you don't already), Julian Ryan has a posting which uses a Monte Carlo approach to estimate Harvard's chances of winning the Ivy League championship in basketball.  (By the way, he did 50,000 simulations :-)  In his simulations, he estimated Harvard's chance to win each game based upon Ken Pomeroy's ratings.

This sounds like a sophisticated approach that will give new insights into the Ivy League competition.

But here's the thing.  The outcome of each simulated game is based upon exactly one factor -- a percentage derived from the ratings of the two teams.  So when Harvard plays Yale, you plug the ratings into a formula and out comes the likelihood of a Harvard victory -- 74%, say.  Now let's imagine we "simulate" that game 50,000 times by rolling a 100-sided die and giving Harvard a victory if the number is 74 or below.  At the end of this excruciating exercise, guess what percentage of the simulated games Harvard has won?

Yes, 74%.
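If you want to see this for yourself, here is a small sketch of the die-rolling exercise in Python.  The function name `simulate_single_game` and the fixed seed are my own choices for illustration; the 74% and the 50,000 rolls come from the example above.

```python
import random

def simulate_single_game(threshold, n_runs, seed=0):
    """Roll a 100-sided die n_runs times; Harvard wins when the roll is <= threshold.

    Returns the fraction of simulated games that Harvard won.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        roll = rng.randint(1, 100)  # inclusive on both ends
        if roll <= threshold:
            wins += 1
    return wins / n_runs

if __name__ == "__main__":
    fraction = simulate_single_game(threshold=74, n_runs=50_000)
    print(f"Harvard won {fraction:.1%} of the simulated games")
```

The printed fraction will wobble a little from run to run, but it can only ever converge on the number you typed in.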

IMPORTANT

The Monte Carlo method doesn't provide any value if your simulation is based upon a few fixed, known factors.

If you just have a few fixed factors, you can calculate the likelihood of an outcome directly.  You don't need to use a Monte Carlo approach.  If you look at Julian Ryan's results, you'll see that Harvard is the most likely winner of the Ivy League, followed by Princeton and then Columbia.  Is that a big insight from the Monte Carlo approach?  No.  That's simply the order of the teams in the Pomeroy ratings.  If you look at the distribution of Harvard's number of wins, you'll see it looks like a normal distribution.  Well, it should, because that's roughly what you get when you add up a bunch of independent random outcomes.

Now to be fair to Mr. Ryan, he's not the only one to do this sort of thing.  In fact, Ken Pomeroy uses Monte Carlo simulations to predict conference results.  (With, not surprisingly, the same results for the Ivy League.)  The only defense I can offer is that a Monte Carlo simulation is a fairly straightforward way to estimate a number that can be hard to calculate.  If I tell you that Harvard has a 94% chance of winning when it hosts Yale, and a 74% chance when playing at Yale, then the chance that Harvard goes 2-0 is .94*.74, the chance that they go 1-1 is .94*.26 + .06*.74, and the chance that they go 0-2 is .06*.26.  It gets increasingly hard to calculate the likelihoods as the number of teams and games goes up.
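For one team's win total, though, the direct calculation is not that bad.  Here is a sketch in Python that builds the exact distribution one game at a time, assuming the games are independent and each has a fixed win probability.  The function name `win_distribution` is mine, and the only inputs are the 94% and 74% from the example (so the chance of losing at Yale is 26%).

```python
def win_distribution(win_probs):
    """Exact distribution of total wins over independent games.

    win_probs: list of per-game win probabilities.
    Returns a list where entry k is the probability of exactly k wins.
    """
    dist = [1.0]  # before any games: 0 wins with certainty
    for p in win_probs:
        new_dist = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            new_dist[wins] += prob * (1 - p)   # lose this game
            new_dist[wins + 1] += prob * p     # win this game
        dist = new_dist
    return dist

if __name__ == "__main__":
    # Harvard hosting Yale (94%) and Harvard at Yale (74%).
    for wins, prob in enumerate(win_distribution([0.94, 0.74])):
        print(f"{wins} wins: {prob:.4f}")
```

This handles a full season of games instantly with no simulation at all.  What it does not handle is the harder question of who wins the league, since that depends on every team's results at once, and that is where the simulation earns its keep.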

(Although in a league that plays home-and-home between all teams, this is all unnecessary.  The chances of winning the league are directly proportional to the strength ratings!)

In summary:  For a simple simulation based upon a few fixed factors, the Monte Carlo method may be useful for estimating hard-to-calculate numbers but doesn't offer any additional insight beyond the known factors.