Wednesday, November 7, 2012

How well did Nate Silver do?

The news is saying that Nate Silver (who does election predictions at FiveThirtyEight) called all fifty states correctly. It's being reported as a victory of math nerds over pundits.

In my humble opinion, getting 50 out of 50 is somewhat meaningless. A lot of those states weren't exactly swing states! And if he had gotten some of them wrong, that wouldn't mean his probabilistic predictions were wrong. Likewise, getting them all right doesn't by itself mean his probabilities were right.

I thought it would be more informative to look at Nate Silver's state-by-state predictions of Obama's vote share. That way, Nate could be wrong even in states like California. So here's what I did for each state: I took the vote share reported by Google this morning and subtracted the final prediction of FiveThirtyEight. Then I divided this difference by Nate's margin of error for that state. See the results in a histogram below.
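Here's a minimal sketch of that calculation in Python. The state names and all the numbers are placeholders, not the real forecasts or results, and the function name `standardized_errors` is just for illustration.

```python
from statistics import mean, pstdev


def standardized_errors(predicted, actual, margin):
    """Return (actual - predicted) / margin of error for each state."""
    return {
        state: (actual[state] - predicted[state]) / margin[state]
        for state in predicted
    }


# Placeholder data (hypothetical): vote shares in percent.
predicted = {"State A": 50.0, "State B": 60.0, "State C": 40.0}
margin = {"State A": 3.0, "State B": 2.0, "State C": 4.0}
actual = {"State A": 51.5, "State B": 59.0, "State C": 40.0}

z = standardized_errors(predicted, actual, margin)

# Summary statistics of the standardized errors, as quoted for the histogram.
z_mean = mean(z.values())
z_spread = pstdev(list(z.values()))

print(z)
print("mean:", z_mean, "spread:", z_spread)
```

A negative value means Obama did worse in that state than predicted. If the stated margins of error were exactly right and the states were independent, the values should have a spread of about 1.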


What the figure shows is that Nate's predictions were more accurate than Nate himself claimed!

The mean of the actual distribution is -0.14 (in units of Nate's margin of error), which means that Obama did slightly worse than Nate predicted, but by an amount that can be explained by random error. The standard deviation of the distribution is 0.5, whereas it would be 1 if Nate's margins of error were exactly right. In other words, Nate predicted an error that was twice the actual error.

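Of course, Nate's reported error likely includes an expected systematic error, one that shifts every state in the same direction. For example, if all states came out slightly more in favor of Obama than predicted, that would be a systematic error. Assuming that Nate Silver's model expected a state-to-state spread of 0.5, the rest of his stated error of 1 must be systematic. If the two add in quadrature, he must have expected a systematic error of about 0.85 in one direction or the other (0.5 squared plus 0.85 squared is roughly 1).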
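To check that this split is self-consistent, here's a rough simulation sketch. It assumes (my assumption, not anything taken from FiveThirtyEight's actual model) that each state's standardized error is a shared national shift plus independent state-level noise, both normal. The exact quadrature value is sqrt(1 - 0.5^2), about 0.87, close to the 0.85 quoted above. The constant and function names are made up for illustration.

```python
import math
import random
from statistics import mean, pstdev

STATE_SPREAD = 0.5    # state-to-state spread seen in the histogram
N_STATES = 50
N_ELECTIONS = 2000    # number of simulated "realities"

# Systematic error needed so that the total error per state is 1.
systematic = math.sqrt(1.0 - STATE_SPREAD ** 2)


def simulate_election(rng):
    """One simulated election: a shared shift plus independent state noise."""
    shift = rng.gauss(0.0, systematic)
    return [shift + rng.gauss(0.0, STATE_SPREAD) for _ in range(N_STATES)]


rng = random.Random(0)  # fixed seed so the run is repeatable
elections = [simulate_election(rng) for _ in range(N_ELECTIONS)]

# Typical spread of the histogram within a single election.
within_spread = mean([pstdev(e) for e in elections])
# How far the histogram's mean wanders from one election to the next.
mean_spread = pstdev([mean(e) for e in elections])
# Spread of a single state's error across all simulated elections.
single_state_spread = pstdev([e[0] for e in elections])

print("systematic error:", round(systematic, 3))
print("spread within one election:", round(within_spread, 2))
print("spread of the election means:", round(mean_spread, 2))
print("spread for a single state:", round(single_state_spread, 2))
```

Under these assumptions, any one election shows a narrow histogram whose center is offset by the shared shift, while a single state followed across many simulated elections recovers the full spread of about 1.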

2 comments:

drransom said...

What's the theoretical curve on your graph? It looks like a normal distribution with a standard deviation of 1 but I'm not sure. (And what's the theoretical basis for using that curve?)

miller said...

It is a normal distribution with standard deviation 1.

Take Colorado. FiveThirtyEight projected that Obama's vote share would be 50.8 plus or minus 3. So if there were many Colorados, then the distribution of vote shares would be a normal distribution with standard deviation of 3. But I divided the result by 3, so it should be a normal distribution with standard deviation 1.

There is only one Colorado, but I can produce the same distribution by aggregating all the states.

The problem is that the different states are correlated (which Nate Silver's model presumably takes into account). So in any given reality, you'd expect the spread of the distribution to be about 0.5, and the mean of the distribution to be off by about 0.85 in one direction or the other.