Polls and Bad Sausage: Another Look at the Business of Polling
I have always found polls fascinating. The idea of taking a carefully measured sample of opinion to project the outcome of elections, especially at the national scale, has always impressed me. But as the years passed and I learned that polls were sometimes wrong, that impression lost some of its shine, particularly as it became obvious that polls seldom address problems in their procedures, at least when speaking to the public. That is especially obvious when we look at the 2016 Presidential election. Simply put, the polls performed poorly, and they all know it. Yet after a couple of weeks of admitting they needed to look closely at how they did their polling, the industry closed ranks and simply denied anything was wrong.
That’s not to say there were no valuable insights in the polls, or that the polls failed on every metric. But the blanket statement that “they got it right” is so incorrect that the only valid way to describe the behavior is that the polls are lying to save their butts. This article addresses the situation from 2016, the mistakes the polls made, why it’s getting worse, and what can be done to save the industry from its own bad actors.
So to start, why do I say that the polls were wrong in 2016? The pollsters, basically all of them, point to the spread in polls compared to the actual spread in the aggregate national popular vote between Hillary Clinton and Donald Trump, throw in their published margins of error and say ‘voila, we were right’.
The problems start with the fact that the spread between candidates is the result of other calculations, and how you get there is as important as the result itself. I struggled with Calculus because getting the right answer was not enough – you had to show a valid process for how you got there. The same thing applies to polls.
To see what I mean, consider the Real Clear Politics final polls for the 2016 Presidential Election. RCP included ten national polls in its final aggregate, although twenty-two different polls were listed in its aggregation during the campaign. Those ten were blended by RCP into a 46.8% to 43.6% Clinton-to-Trump prediction, a 3.2-point spread. Since Clinton ended up with 48.2% actual support and Trump with 46.1% according to RCP (the actual numbers round Trump up to 46.2%, but whatever), a spread of 2.1 points, RCP crows that the polls did nicely.
But in actual fact, those ten polls published support for Clinton ranging from 43% to 50%, and 3 of them did not get Clinton’s support within their published margin of error (MoE). They also published support for Trump ranging from 39% to 47%, and 5 of them – fully half – failed to get Trump’s support within their published MoE. So even though 5 of the 10 polls RCP selected for its aggregate got the spread within their MoE, only 4 of the 10 got the support for Clinton, the support for Trump, and the spread all within their MoE. 40% gets you an F in most schools.
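To make the scoring concrete, here is a minimal sketch of the three-part test I am applying – with made-up poll numbers, not the actual ten RCP entries: a poll passes only if Clinton’s support, Trump’s support, and the spread all land within its published MoE.

```python
# Hypothetical illustration of the three-part MoE test described above.
# Poll values below are invented for the example, not real 2016 polls.

ACTUAL_CLINTON, ACTUAL_TRUMP = 48.2, 46.1  # final figures cited in the text
ACTUAL_SPREAD = ACTUAL_CLINTON - ACTUAL_TRUMP

polls = [
    # (clinton %, trump %, margin of error) -- made-up numbers
    (45.0, 42.0, 3.0),
    (44.0, 41.0, 3.5),
    (48.0, 44.0, 3.0),
]

def within_moe(poll):
    """True only if both candidates' support AND the spread hit within the MoE."""
    clinton, trump, moe = poll
    return (abs(clinton - ACTUAL_CLINTON) <= moe
            and abs(trump - ACTUAL_TRUMP) <= moe
            and abs((clinton - trump) - ACTUAL_SPREAD) <= moe)

passed = sum(within_moe(p) for p in polls)
print(f"{passed} of {len(polls)} polls hit all three numbers within their MoE")
```

Note that the second poll above misses both candidates’ support levels yet still nails the spread within its MoE – which is exactly the loophole the “we got it right” defense leans on.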
That does not touch the 12 polls listed by RCP during the campaign which did not publish a poll during the final week, and it really does not touch the state polls, which by any sane standard reeked of failure. Fully 26 states had a final polling average whose projected spread fell outside the MoE. It’s kind of funny how pollsters just ignore the state polls, since the 2016 election was a strong reminder that the Electoral College decides the election outcome, not the aggregate popular vote. The pollsters just hope no one pays much attention to the smoking wreckage of the state polls. I will pay more attention to the state polls in a later article, since they are so important, but for now I move on to the autopsy of the 2016 election polling, starting with a well-done if absurdly biased article from Pew Research by Courtney Kennedy.
Ms. Kennedy starts off with a topic statement worthy of a high school essay: “A robust public polling industry is a marker of a free society.”
Sorry Ms. Kennedy, but that is not so. Getting a lot of polls published in the media is just noise, unless they are done properly. A poll is not valid just because it shows up, and even the biggest, most financially-prominent polls (looking at you, YouGov) can be garbage if the people running the polls manipulate the data to tell a false story, which does happen. The integrity of the polling process is where the free society is heard, or not. When polls are in conflict with actual outcomes, the possibility of error must be addressed, and pollsters have a serious duty to guard against manipulation. Interestingly, Ms. Kennedy admits that very point, even as she claims – wrongly – that the 2016 polls were correct in the same article.
The Pew Research article, titled “Key things to know about election polling in the United States”, is impressive in length – I copied and printed out the text, linked articles and notes, and it comes to two hundred twenty pages and 69,400 words. There is a lot of information there, some of it quite cogent and salient, but sadly some of it is self-serving and misleading.
Before I go on, I want to make clear that I do not mean to attack Ms. Kennedy. She worked hard to explain how polling works, as she sees it. She presented sources to support her arguments, and she organized her argument along the lines her employer would be glad to see.
She just happens to have allowed her bias to color her thinking, and makes some assumptions that reflect an industry-wide problem of hubris, in particular that protecting market share is more important than getting the call right.
My aim here is to review Ms. Kennedy’s article from a different perspective, and to note the effect of the problems she identifies on the 2020 polls. Before I begin, I start with a summary statement from Ms. Kennedy with which I completely agree: “Errors in 2016 laid bare some real limitations of polling, even as clear-eyed reviews of national polls in both 2016 and 2018 found that polls still perform well when done carefully.”
I just wish the major polls cared enough to do their jobs “carefully”, relative to what they are doing now. With that said, here are Ms. Kennedy’s major points from her article:
1. Different polling organizations conduct their surveys in quite different ways.
Well, I certainly agree with that statement, although I disagree with Ms. Kennedy’s opinion that “survey methodology is undergoing a period of creative ferment”. What’s happening is that polling is becoming a lot more difficult and expensive, especially by the traditional random telephone methods. So a lot of polling groups are using other methods, like online polling, opt-in panels (bye bye random participation) and recruited respondents. Some even mix methods. Not so long ago, such methods were rejected by polling professionals because they destroyed the random character of polling, but to save money most polling groups have diluted their standards in order to get the product out on time.
2. The barriers to entry in the polling field have disappeared.
Agreed somewhat, but only somewhat. Ms. Kennedy correctly observes that “technology has disrupted polling in ways similar to its impact on journalism: by making it possible for anyone with a few thousand dollars to enter the field and conduct a national poll.” The funny thing is that there is a class system to polls. We all know polls like Gallup or Pew, and of course every media outlet has polling groups it uses and trusts enough to put its name on as a brand. The new guys can get into the game, but unless they go the way of private internal polling, they have to fight big brands like YouGov or Rasmussen to get any attention. Consider the Democracy Institute polls that have Trump leading by 3 points. I have some issues with that think tank’s polling, but it’s interesting to see how RCP and FiveThirtyEight just ignore it on no better basis than that they don’t want to count it in their pool of polls to aggregate. Ms. Kennedy observes that “there has also been a proliferation of polls from firms with little to no survey credentials or track record.” My complaint is that the blunders I saw in 2016 came from the major polls just as often as from the new guys. It’s an undeserved cheap shot to blame the poor performance in 2016 – and possibly this year – on the new guys.
3. A poll may label itself “nationally representative,” but that’s not a guarantee that its methodology is solid.
This is a good point, and one I wish Ms. Kennedy had pursued to greater depth. While Ms. Kennedy dwells on Pew’s standards and various academic groups, as I just said in the last point the big guys laid some stinkers that need to be addressed. Essentially, the integrity of a poll is something which should be inspected and tested by the polling group every single week, to make sure standards are maintained and any outside observer can see how the results are produced. Instead, what I see is more and more polls hiding internal data from review, and most polls refuse to even be upfront about their demographic weighting. When you won’t show your work, you should not be surprised if and when someone doubts your honesty.
4. The real margin of error is often about double the one reported.
This is something I have said for a long time. As Ms. Kennedy notes, “the margin of error addresses only one source of potential error: the fact that random samples are likely to differ a little from the population just by chance. But there are three other, equally important sources of error in polling: nonresponse, coverage error (where not all the target population has a chance of being sampled) and mismeasurement. Not only does the margin of error fail to account for those other sources of potential error, it implies to the public that they do not exist, which is not true.”
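For context, the MoE that polls publish is the sampling-only figure, typically computed at 95% confidence. A quick sketch of that textbook formula shows why a 1,000-person poll reports roughly ±3 points – and why Ms. Kennedy’s point about the real error being about double would put the honest figure closer to ±6:

```python
import math

def sampling_moe(p, n, z=1.96):
    """95% margin of error from sampling alone, in percentage points.

    p is the proportion supporting a candidate (0.5 is the worst case),
    n is the sample size, z is the 95% confidence multiplier.
    """
    return z * math.sqrt(p * (1 - p) / n) * 100

# A 1,000-person sample at 50/50 support:
print(round(sampling_moe(0.5, 1000), 1))  # roughly 3.1 points
```

Quadrupling the sample only halves this number, and none of it accounts for nonresponse, coverage error, or mismeasurement – the three sources Ms. Kennedy lists that the published figure silently omits.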
Where Ms. Kennedy and I disagree is on the point that polls want readers to believe in their accuracy. It’s the whole reason sponsors pay for polls – they want to be able to say who is going to win, and by how much.
The plain fact is that even the best polls provide a snapshot look at an election, and should be considered over a period of time, and with attention to demographics which drive support for the candidates. This is not Magic 8-ball stuff.
5. Huge sample sizes sound impressive, but sometimes they don’t mean much.
Cue the old story about George Gallup making his bones in 1936 by calling the election with a sample size literally a thousand times smaller than that of the largest poll of the day. Ms. Kennedy remarks that “the reality of modern polling is different. As Nate Cohn of The New York Times has explained, ‘Often, the polls with huge samples are actually just using cheap and problematic sampling methods.’”
Actually, it’s that polls track human behavior, not empirical lab results. If you know what to watch, you can get the sense early on, and if you miss the relevant keys, the size of the respondent pool won’t make your results better.
6. There is evidence that when the public is told that a candidate is extremely likely to win, some people may be less likely to vote.
Yes and no. Certainly when the media tells the nation that the race is over, there is a real risk of overconfidence and voters who have only mild interest in a candidate see no reason to go out and vote. But it’s important to understand the phenomenon of voter expectations. That is, over history it turns out that who voters expect to win an election is more accurate than who voters say they will support. That means that some people choose to vote because they believe their candidate is going to win, contradicting Ms. Kennedy’s claim.
7. Estimates of the public’s views of candidates and major policies are generally trustworthy, but estimates of who will win the “horse race” are less so.
To borrow Mr. Trump’s phrase from the 2016 debates: “wrong”.
Ms. Kennedy bases her claim on the poor favorability ratings for Trump and Clinton in 2016. Ms. Kennedy claims that was “a signal that many Americans were struggling to decide whom to support and whether to vote at all”. That assumption does not support claiming that voter expectations are false. The historical record is clear and Ms. Kennedy is clearly wrong.
Ms. Kennedy compounds the error by falling back on the narrative that “there will be added uncertainty in horse race estimates stemming from possible pandemic-related barriers to voting. Far more people will vote by mail – or try to do so – than in the past, and if fewer polling places than usual are available, lines may be very long.” Again, these are emotional assumptions, not valid observations based on evidence.
8. All good polling relies on statistical adjustment called “weighting” to make sure that samples align with the broader population on key characteristics.
The problem here is that I see a growing number of polls manipulating weighting in unreasonable ways, especially political party affiliation. Where Ms. Kennedy loftily assures us that polling groups use weighting to “correct imbalances between the survey sample and the population”, a close look at polls reveals that sometimes that is just an excuse to fiddle with internal numbers, and leads to a non-representative result. In extreme cases, the weighting is used to artificially induce a false result.
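To illustrate how much leverage weighting gives a pollster, here is a minimal sketch with made-up party shares and support numbers – not any real poll’s targets. Shifting the assumed party mix by a few points moves the topline by several points, which is exactly the knob I am saying some polls abuse.

```python
# Illustrative weighting sketch. All shares below are invented for the
# example; real polls weight on several characteristics at once.

population = {"Dem": 0.33, "Rep": 0.29, "Ind": 0.38}   # assumed targets
sample     = {"Dem": 0.40, "Rep": 0.25, "Ind": 0.35}   # raw respondent mix

# Each respondent is weighted by population_share / sample_share for
# their group, so the weighted sample matches the assumed population.
weights = {g: population[g] / sample[g] for g in population}

# Candidate support within each party group (made-up numbers):
support = {"Dem": 0.90, "Rep": 0.07, "Ind": 0.45}

raw      = sum(sample[g] * support[g] for g in sample)
weighted = sum(population[g] * support[g] for g in population)
print(f"raw {raw:.1%}  vs weighted {weighted:.1%}")
```

In this toy example, correcting a Democratic over-sample moves the candidate’s topline down by nearly five points – which is why the choice of party targets matters so much, and why hiding them from readers is such a problem.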
9. Failing to adjust for survey respondents’ education level is a disqualifying shortfall in present-day battleground and national polls.
Yes and no. Education does reflect different demographic groups, but polls consistently fail to track all relevant types of education. Polling groups fail, for example, to consider vocational training or training which leads to industry certification. These kinds of education are actually more relevant to the demographic character of a person than assuming a Bachelor of Science in Engineering is the same as a Bachelor of Arts in Gender Studies. Ms. Kennedy also makes the blunder of saying that over-participation in polls by certain demographics can “easily be corrected through adjustment, or weighting, so the sample matches the population.” That’s like saying you can fix putting in too much salt by adding more flour or sugar – simply adding more of something else does not necessarily fix the problem.
11. Transparency in how a poll was conducted is associated with better accuracy.
Again, yes and no. Transparency is indeed a healthy trait in an industry. And some polls do a good job of covering the basics in that regard. But most polls refuse to disclose their demographics, relying on a vague statement that they use weightings consistent with the general public. Many refuse to reveal their weighting in detail at all, and that is just plain unethical. As for revealing the poll’s sponsor, many do report the name of the sponsor, but nothing about the sponsor’s political leanings or – in a growing number of cases – relationships with Democrats and even specific candidates. The polls simply do not police themselves, yet they want to pretend they do.
12. The problems with state polls in 2016 do not mean that polling overall is broken.
Oh yes they do. Ms. Kennedy is just plain wrong here. All polls should have a basic integrity to them, in that national numbers must be driven by state-level results. Ms. Kennedy falls back on the lie I disproved earlier in this article when she reports “national polls in 2016 were quite accurate by historical standards.”
Cherry-picking one metric – which many polls failed to meet, anyway – while ignoring everything which does not support your claim is the excuse of a failing student. It’s not what a professional firm should embrace as their statement. As long as polling firms refuse to address flaws in their procedures and standards, they invite future failure.
13. Evidence for “shy Trump” voters who don’t tell pollsters their true intentions is much thinner than some people think.
Again, this is a clearly false statement. Exit polls from both CBS and the New York Times in 2016 show that fully four voters in ten made their decision after Labor Day. The problem for the polls is that polls taken in September and October 2016 regularly showed only around 7 or 8 percent undecided. That means that of the voters who had not yet decided whether to vote for Trump or Clinton, more than three-quarters either did not talk to pollsters or lied about their vote. And since Trump won the majority of those late deciders, by definition a large portion of 2016 voters were “shy Trump” voters. Pretending otherwise is just silly. Ms. Kennedy relies on a “committee of polling experts” for her claim, but she not only fails to consider the direct evidence of exit polls versus pre-election polls, she also fails to consider that polling “experts” would have a strong bias in favor of finding that no such voters exist. This is a blunder that is likely to come back and bite them.
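The arithmetic behind the “more than three-quarters” figure above is easy to check, using the rounded numbers as cited: about 40% of voters deciding after Labor Day, against about 8% showing as undecided in the fall polls.

```python
# Back-of-envelope check of the late-decider arithmetic, using the
# rounded figures cited in the text above.

late_deciders = 0.40      # share deciding after Labor Day (exit polls)
polled_undecided = 0.08   # typical undecided share in Sept/Oct 2016 polls

unaccounted = (late_deciders - polled_undecided) / late_deciders
print(f"{unaccounted:.0%} of late deciders were invisible to the polls")
```

At 40% and 8%, the share of late deciders the polls never saw works out to 80% – comfortably over the three-quarters threshold.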
14. A systematic miss in election polls is more likely than people think.
This point is interesting to me, because just a little while after wrongly saying the state poll problems do not mean problems for polling overall, Ms. Kennedy says that “state-level outcomes are highly correlated with one another, so polling errors in one state are likely to repeat in other, similar states.”
Again, this is an area where Ms. Kennedy – and others – should carry the idea to its logical conclusion. State polling errors carry over to other states, and so they necessarily also carry into national numbers, which are built off state tallies.
15. National polls are better at giving Americans equal voice than predicting the Electoral College.
There is no proper way to be polite here – this statement is a crock of Biden. “Equal voice” simply means reporting the public support in context and with the best accuracy possible. Ms. Kennedy doubles down with this false claim: “2000 and 2016 presidential elections demonstrated a difficult truth: National polls can be accurate in identifying Americans’ preferred candidate and yet fail to identify the winner.”
We are not, and have never been, a nation where the biggest mob wins. The United States is a Republic where every state matters in the decision of our federal leaders. In 2000 and 2016, the winner of the election won thirty states and a clear majority of electoral votes. That’s what matters, not what big-population states want. It’s frankly concerning that pollsters still fail to grasp how Presidential elections work.
My summary is that Ms. Kennedy tries hard to look at the way polling works, but she fails to honestly criticize the bias and slanted history of her industry. If this is what happens when someone is willing to discuss the problems with 2016, imagine the bias and hubris of polling firms which still pretend they got it right, and what that means for this year. No one likes to see how sausage is made, but we need to know why some polls are worse than bad sausage.
Published at Sun, 20 Sep 2020 03:10:17 +0000