Thanks to data compiled by Kevin Collins at Princeton, we can examine the accuracy of some of the state-level forecasting models. Nate Silver’s 538 model performs marginally better than the pack. But the best predictor: an average of the forecasts.
To assess accuracy, we calculate the Root Mean Squared Error (RMSE). To do so, we take the actual result in state $i$, $y_i$, and a forecaster's prediction in state $i$, $\hat{y}_i$, and calculate:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
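For concreteness, here is the calculation as a short sketch (the state results and predictions below are made-up illustrative numbers, not the actual 2012 data):

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: the square root of the mean squared miss."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Hypothetical two-party vote shares in three states (illustrative only).
actual = [52.0, 50.6, 48.0]
predicted = [50.5, 51.0, 49.5]
print(round(rmse(actual, predicted), 3))
```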
As you can see, higher values indicate that the forecaster made bigger errors. Put another way, the number shows us how badly each forecaster missed on average.
Alex Jakulin, a statistician at Columbia, helpfully pointed out that a more useful metric may be the RMSE weighted by the importance of each state. We would expect misses to be larger in small states, which are polled far less often, and should correct for that. Accordingly, we present both the unweighted RMSE for each forecaster and the RMSE weighted by each state's share of electoral votes.
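The weighted variant can be sketched the same way, with each state's squared error scaled by its electoral votes (again, the vote shares and electoral-vote counts below are hypothetical, chosen only to show that big-state misses count for more):

```python
import math

def weighted_rmse(actual, predicted, ev):
    """RMSE with each state's squared error weighted by its electoral votes."""
    total = sum(ev)
    return math.sqrt(
        sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, ev)) / total
    )

# Illustrative numbers: three states with hypothetical electoral-vote counts.
actual = [52.0, 50.6, 48.0]
predicted = [50.5, 51.0, 49.5]
ev = [29, 18, 6]
print(round(weighted_rmse(actual, predicted, ev), 3))
```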
| Forecaster | RMSE | RMSE (EV-weighted) |
| --- | --- | --- |
| DeSart / Holbrook | 2.40 | 1.79 |
| Margin of Σrror | 2.15 | 1.98 |
All told, the forecasts did quite well. But look at what worked better: averaging over the forecasts. This makes good statistical sense: as Alex points out with a fun Netflix example, it pays to keep as much information as possible. In a Bayesian framework, why pick just one "most probable" parameter estimate, instead of averaging over all possible parameter settings, with each weighted by its predictive ability?
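A toy illustration of why the average can beat every individual forecaster: when forecasters' errors are not perfectly correlated, averaging lets the errors partly cancel. The numbers here are synthetic, constructed so that the two hypothetical forecasters err in roughly offsetting directions:

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error across states."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Synthetic "actual" results and two forecasters with offsetting biases.
actual = [52.0, 50.6, 48.0, 51.2]
f1 = [53.5, 49.5, 49.0, 50.0]
f2 = [50.8, 52.0, 47.2, 52.5]
avg = [(a + b) / 2 for a, b in zip(f1, f2)]

for name, pred in [("forecaster 1", f1), ("forecaster 2", f2), ("average", avg)]:
    print(name, round(rmse(actual, pred), 3))
```

In this contrived case the average's RMSE is far below either individual's, because the misses cancel; with real forecasts the errors are correlated and the gain is smaller, but the direction of the effect is the same.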
During the Republican primaries, Harry Enten published a series of stories on this blog doing precisely that; and then, as now, the “aggregate of the aggregates” performed better than any individual prediction on its own.
As the figure above shows, the forecasting models did quite well on the whole, and critics of these election forecasts ended up looking pretty foolish.
I now wonder whether 2016 will see a profusion of aggregate aggregators, and, if so, how much grief Jennifer Rubin will give them.