Aggregating the Aggregates

Thanks to compiled by Kevin Collins at Princeton, we can examine the accuracy of some of the state-level forecasting models. Nate Silver’s 538 model performs marginally better than the pack. But the best predictor: an average of the forecasts.

To assess accuracy, we calculate the Root Mean Squared Error. To do so, we take the actual result in state i, $R_{i}$ , and a forecaster’s prediction in state i, $P_{i}$ , and calculate:

$\sqrt{ \frac{1}{n} \sum_1^n (R_i - P_i)^2 }$

As you can see, higher values indicate that the forecaster made bigger errors. Put another way, the number shows us how badly each forecaster missed on average.

Alex Jakulin, a statistician at Columbia, helpfully pointed out that a more useful metric may be the RMSE weighted by the importance of each state. We would expect misses to be larger in small states and should correct for that. Accordingly, we present the RMSE for each forecaster, and the RMSE weighted by the proportion of electoral votes controlled by each state.

Forecast	RMSE	Wtd RMSE
Silver 538	1.93	1.60
Linzer Votamatic	2.21	1.63
Jackman Pollster	2.15	1.71
DeSart / Holbrook	2.40	1.79
Margin of Σrror	2.15	1.98
Average Forecast	1.67	1.37

All told, the forecasts did quite well. But look at what worked better: averaging over the forecasts. This makes good statistical sense: as Alex points out with a fun Netflix example, it makes more sense to keep as much information as possible. In a Bayesian framework, why pick just one “most probable” parameter estimate, instead of averaging over all possible parameter settings, with each weighted by its predictive ability?

During the Republican primaries, Harry Enten published a series of stories on this blog doing precisely that; and then, as now, the “aggregate of the aggregates” performed better than any individual prediction on its own.

On the whole, all the forecasting models did quite well. As the figure above shows, critics of these election forecasts ended up looking pretty foolish.

I only now wonder if the 2016 will see a profusion of aggregate aggregators; and if so, how much grief Jennifer Rubin will give them.

Category: Election 2012

Tom says:

Surely cell K43 on that spreadsheet must be wrong.

November 8, 2012 at 5:21 pm

Reply

Brice D. L. Acree says:

Tom,

Yes, some of the data showed Romney’s vote instead of Obama’s. I went through and fixed those prior to computing the RMSE. My earlier numbers had included some probs with data, as well as some old data. I’ll post the data I used.

November 8, 2012 at 6:55 pm

Reply

Chris says:

Any reason that your RMSE analysis did not include Princeton’s Sam Wang? Could you add that to the chart?

November 8, 2012 at 6:50 pm

Reply

Brice D. L. Acree says:

Chris,

I checked Prof. Wang’s forecasts, but couldn’t find the state-by-state predictions. I think Kevin Collins got his directly. I can e-mail Prof. Wang to get his numbers.

November 8, 2012 at 6:56 pm

Reply

gwern says:

I actually pinged Wang on Twitter last night as part of collecting a larger dataset of predictions, but he hasn’t gotten back to me AFAIK, so don’t necessarily expect a reply soon.

(I was collecting all the overall presidential stats, per-state margins, per-state victory %, Senate margins, and Senate victory %, for Silver, Linzer, DeSart, Intrade, Margin of Error, Wang, Putnam, Jackman, and Unskewed Polls, not that I got all of them but the Brier scores and RMSE are still interesting to look at.)

November 8, 2012 at 8:53 pm

Reply

gwern says:

My results so far if anyone is interesting in those Brier/RMSEs:

(It was for this blog post: http://appliedrationality.org/2012/11/09/was-nate-silver-the-most-accurate-2012-election-pundit/ )

November 9, 2012 at 12:41 am

Reply

Brian says:

Wang’s state by state forecasts are found in the right-hand column of his site under “The Power of Your Vote”. He doesn’t list every state but he does show every swing state.

November 8, 2012 at 10:34 pm

Reply
Luke Muehlhauser says:

Additional Brier score and RMSE score comparisons here:
http://appliedrationality.org/2012/11/09/was-nate-silver-the-most-accurate-2012-election-pundit/

November 9, 2012 at 12:50 am

Reply
Some Body says:

Isn’t it a bit too early, though? We don’t yet have the final results, and won’t have for a couple of weeks (Ohio counting provisional ballots on the 19th and all).

November 9, 2012 at 7:12 am

Reply

Brice D. L. Acree says:

Brody,

It’s never too early! But more to your point… yeah, we will update with final results. My über-sophisticated forecast: these RMSEs won’t change much.

November 9, 2012 at 10:13 am

Reply

Brash Equilibrium says:

I describe some possible methods for calculating model averaging weights from Brier scores and RMSEs at my blog. Check it out. Maybe Margin of Error will see if a weight average model does -even better- than a simple average?

http://www.malarkometer.org/1/post/2012/11/some-modest-proposals-for-weighting-election-prediction-models-when-model-averaging.html#.UJ2kBeQ0WSo

November 9, 2012 at 7:47 pm

Reply

Paul C says:

Yes its too early. http://www.huffingtonpost.com/2012/11/08/best-pollster-2012_n_2095211.html?utm_hp_ref=@pollster

November 10, 2012 at 11:32 am

Reply

Pingback/Trackback

Final Result: Obama 332, Romney 206 | VOTAMATIC

Paul C says:

Sam Wang computes his own Brier score here with others: http://election.princeton.edu/2012/11/09/karl-rove-eats-a-bug/#more-8830

November 10, 2012 at 11:21 am

Reply
fLoreign says:

Yeah dude, do make aggregators of poll aggregators four years from now. I’ve seen the collapse of mortgage derivatives; will the election forecast business be the next bubble? Like, fifty statisticians for each survey interview operator, so that the world of forecasts could create its own noise?
I’m betting on popcorn.

November 11, 2012 at 4:43 am

Reply

Brash Equilibrium says:

Here’s a funny short story I just wrote about an election prediction bubble. Enjoy.

“Aggregatio ad absurdum”

November 12, 2012 at 11:10 am

Reply

Arnold Evans says:

What happens if you calculate Brier scores and RMSE’s only of swing states and close (somehow defined) races?

In some sense, being right about VA, FL and OH should matter more than CA, NY and TX. I don’t care who’s the most accurate in a lot of races so I would like to know whose scores are best in the races I and most people actually cared about.

November 14, 2012 at 2:25 am

Reply

Pingback/Trackback

More accurate than Nate Silver or Markos—and simple, too | The Penn Ave Post

Pingback/Trackback

More accurate than Nate Silver or Markos—and simple, too | FavStocks

Pingback/Trackback

Evaluating Election Forecasts | Robust Analysis

Pingback/Trackback

Forecasting Round-Up No. 2 « Dart-Throwing Chimp

Pingback/Trackback

Homepage

Margin of Σrror

Aggregating the Aggregates

22 Responses

Tom says:

Brice D. L. Acree says:

Chris says:

Brice D. L. Acree says:

gwern says:

gwern says:

Brian says:

Luke Muehlhauser says:

Some Body says:

Brice D. L. Acree says:

Brash Equilibrium says:

Paul C says:

Pingback/Trackback

Paul C says:

fLoreign says:

Brash Equilibrium says:

Arnold Evans says:

Pingback/Trackback

Pingback/Trackback

Pingback/Trackback

Pingback/Trackback

Pingback/Trackback

Leave a Comment or Cancel reply