Some Philosophy on Politics With a Statistical Analysis of Polls

Listen, if you don't want to read about my political musings or the assumptions I put into this analysis, you can skip to the first graph I made and the results section.

The Truth of What Polls Actually Mean

I think a good example of bias, regardless of political slant, is attempting to use polls to paint a political narrative.

One thing I hear often in some form or another is,

"The polls show that Joe Biden is ahead and that means he will win!"

and the response to this is often

"Well actually, Hillary Clinton was ahead in the polls and lost and that means the polls are inaccurate and meaningless."

Both of these statements contain truth. Joe Biden is currently ahead in the polls. This does not mean that the polls are saying he will absolutely win.  It's also true that Hillary Clinton did indeed lose four years ago even though she was ahead in the polls. That doesn't mean that polls are meaningless.

It is true that polls, even when conducted well scientifically, have some inaccuracy, but that is also true of any Gaussian-distributed scientific measurement.

In fact, if the polls perfectly predicted the election, I would say that would be incredibly worrying, as it likely means someone is cheating in the election. Polls can never predict precisely what will happen during an election; the only thing they can actually do is predict the odds of one candidate winning the election over the other.

Since I actually have a lot of statistical experience due to my training in physics and astronomy, and since I have trust issues when it comes to the media, I actually decided to calculate those odds myself.

First, Some Assumptions on Polls up to July 3, 2020

I am going to present my findings as transparently as I possibly can. Data is never perfect and because of this, in order to actually make any reasonable calculation, some ground assumptions must be made. 
  • I am assuming that the polling data from FiveThirtyEight accurately reflects what polls are reporting
  • I am assuming that these polls include all polls being taken, or that they are not excluding certain polls for no good reason
  • I am throwing out any poll that has a poll grade below a "C" as determined by fivethirtyeight. I am also assuming that this grading system is scientifically consistent from poll to poll.
  • I am not directly correcting for poll bias and I am not weighting the polls in any manner.
  • I am assuming past election results can be used to inform how I calculate who would win right now
  • I am assuming the election results are Gaussian distributed
  • I am assuming my Python code is correctly implemented

So, without getting into too much technical detail, I first must calculate the error of the polls. I could just calculate a standard deviation of the polls, but I would argue that would be inaccurate, as a standard deviation assumes that the mean value is not changing over time. However, polling averages absolutely change over time, so a plain standard deviation cannot be used.

Something called an RMS, or root-mean-square, error is probably more appropriate. This type of error calculation measures how far each measurement scatters from the current mean, meaning (excuse the pun) that as long as I model how the polls change over time, I can calculate the overall error for the polls.
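
For reference, the RMS error described above can be written out as follows. The notation is only illustrative and is not taken from the original analysis: p_i is the i-th poll result, t_i is the date it was taken, m(t) is the model of the mean at time t, and N is the number of polls.

```latex
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(p_i - m(t_i)\bigr)^{2}}
```

If m(t) is held constant at the ordinary average of the polls, this expression is just the (population) standard deviation, so the two only differ when the mean is moving.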

The way I decided to model the polling data is by doing a polynomial regression of the polls as a function of time. I did this because a polynomial of high enough order can approximate any smooth function over a finite range. I ended up using a polynomial of order 4, if you really want to know. I found that going to higher orders didn't make much of a difference after a certain point, so I have stuck with 4.
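
For anyone who wants to see the idea in code, here is a minimal sketch of an order-4 polynomial fit followed by an RMS about the fitted curve. It is not the code behind this post: the function name fit_trend_and_rms is made up, the data is synthetic rather than real polling, and rescaling the time axis is just one reasonable way to keep the fit numerically stable.

```python
import numpy as np

def fit_trend_and_rms(days, pct, order=4):
    """Fit a polynomial trend to poll numbers; return (trend, rms about trend)."""
    days = np.asarray(days, dtype=float)
    pct = np.asarray(pct, dtype=float)
    # Rescale time to [-1, 1] so the least-squares fit stays well conditioned.
    x = 2.0 * (days - days.min()) / (days.max() - days.min()) - 1.0
    coeffs = np.polyfit(x, pct, order)
    trend = np.polyval(coeffs, x)
    # RMS of the residuals about the moving mean (the fitted curve).
    rms = np.sqrt(np.mean((pct - trend) ** 2))
    return trend, rms

# Synthetic, made-up data purely for illustration (NOT real polls):
# a slowly drifting "true" mean plus Gaussian scatter of 2 points.
rng = np.random.default_rng(0)
days = np.arange(0, 365, 3)
true_mean = 48.0 + 4.0 * np.sin(days / 120.0)
polls = true_mean + rng.normal(0.0, 2.0, size=days.size)

trend, rms = fit_trend_and_rms(days, polls)
plain_std = np.std(polls)  # scatter about a single constant mean

print(f"RMS about the fitted trend: {rms:.2f}")
print(f"Plain standard deviation:   {plain_std:.2f}")
```

Because the plain standard deviation also counts the drift of the mean as "error", it should come out at least as large as the RMS about the trend.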

If all of this doesn't make any sense because you don't work in statistics, I made a graph.

The blue "X's" are the raw polling numbers reported for Joe Biden. The red "+'s" are the raw polling numbers for Donald Trump. The smooth blue line is the best model I came up with for Joe Biden. The smooth red line is the best model I came up with for Donald Trump. To get the overall RMS error, I added the polling errors for Donald Trump and Joe Biden in quadrature.
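
Adding in quadrature just means taking the square root of the sum of the squares. A tiny sketch, with made-up error values and a hypothetical helper name, looks like this. Note that this rule is only strictly right when the two errors are independent of each other, which is an assumption.

```python
import math

def add_in_quadrature(err_a, err_b):
    """Combine two independent errors as sqrt(err_a^2 + err_b^2)."""
    return math.hypot(err_a, err_b)

# Hypothetical RMS errors in percentage points (not values from this analysis)
biden_rms = 3.0
trump_rms = 4.0

margin_rms = add_in_quadrature(biden_rms, trump_rms)
print(margin_rms)  # 5.0
```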

The Results Section and What This all Means

So all I have done so far is create a smooth line which approximates the mean poll numbers for Joe Biden and Donald Trump over a full year.

This is pretty neat. What is even better, however, is that I can now calculate an RMS error for both candidates.

Once I have the RMS error, I can run simulated elections. This is exactly what I did.

Here are my assumptions for who wins in my simulated elections:
  • In 2016, Hillary Clinton was polled to beat Donald Trump by 3.2% and she ended up actually beating Donald Trump by 2.1%, according to RealClearPolitics.
  • Because of this, and this is probably the most precarious assumption, I am assuming that in order for Joe Biden to win, in any simulated election, he has to win by 3.5% when using poll data to simulate said election.
  • I assume that Joe Biden has to win by 3.5% as that is better than what Hillary was predicted to win by in 2016. This also assumes that the 2016 election was extremely close (which it was).
  • I then run 100 million simulated elections. I stopped at 100 million because my laptop begins to metaphorically vomit at the computational load such high numbers require.
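
A simulated election of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not the actual code: the function names are invented, the inputs (a 9-point mean lead and a 6.5-point RMS error) are hypothetical placeholders rather than the fitted values, and it uses 1 million draws instead of 100 million to stay light on memory. Each simulated election draws a Biden-minus-Trump margin from a Gaussian and counts it as a win if the margin clears the threshold.

```python
import math
import numpy as np

def simulate_win_fraction(mean_margin, margin_rms, threshold,
                          n_sims=1_000_000, seed=0):
    """Fraction of simulated elections where the margin exceeds the threshold."""
    rng = np.random.default_rng(seed)
    margins = rng.normal(mean_margin, margin_rms, size=n_sims)
    return float(np.mean(margins > threshold))

def gaussian_win_probability(mean_margin, margin_rms, threshold):
    """Closed-form P(margin > threshold) for a Gaussian, as a cross-check."""
    z = (threshold - mean_margin) / (margin_rms * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Hypothetical inputs in percentage points (NOT the values from this analysis)
mean_margin = 9.0   # modeled lead on the day of interest
margin_rms = 6.5    # combined RMS error of the margin
threshold = 3.5     # lead required to count as a win

sim = simulate_win_fraction(mean_margin, margin_rms, threshold)
exact = gaussian_win_probability(mean_margin, margin_rms, threshold)
print(f"simulated: {sim:.4f}   closed form: {exact:.4f}")
```

Since the draws are purely Gaussian, the simulation should agree with the closed-form value to within sampling noise; the simulation only becomes necessary once the model gets more complicated than a single Gaussian.
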
So what is the result of all these simulated elections? What does any of this mean?

Here is a Normal distribution of the election results. The shaded area is analogous to the percentage of elections each candidate wins. It shows Joe Biden wins most of the time.

This is a Normal (Gaussian) distribution which represents the simulated elections. The X-axis represents the percent by which Joe Biden beats Donald Trump in the vote. A negative percent means more people voted for Donald Trump than for Joe Biden. The red portion represents the percentage of elections Donald Trump wins. The purple portion represents the percentage of elections that could truly go either way. The blue portion represents the percentage of elections Joe Biden is nearly guaranteed to win.

The odds Joe Biden wins the popular vote by 3.5% if the election were held on July 3, 2020:

4:1 in favor of Joe Biden

The odds Joe Biden wins the popular vote by 4% if the election were held on July 3, 2020:

3:1 in favor of Joe Biden

The odds Joe Biden wins the popular vote by 6% if the election were held on July 3, 2020:

2:1 in favor of Joe Biden

This means that Joe Biden has around a 66% to 80% chance of winning, and Donald Trump has a 20% to 33% chance of winning, if the election were held on July 3, 2020.

This is of course assuming that Joe Biden wins if he is predicted to beat Donald Trump by 3.5 to 6 percent in the general election popular vote polls, which is better than what Hillary Clinton was predicted to do.

These odds can obviously change, as they are apt to do, by election day.
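
For reference, odds of a:b in favor correspond to a probability of a / (a + b), which is where the 66% to 80% range comes from. A short sketch of the conversion (the helper name is made up for illustration):

```python
from fractions import Fraction

def odds_to_probability(in_favor, against):
    """Convert odds of in_favor:against into a probability of winning."""
    return Fraction(in_favor, in_favor + against)

for in_favor, against in [(4, 1), (3, 1), (2, 1)]:
    p = odds_to_probability(in_favor, against)
    print(f"{in_favor}:{against} -> {float(p):.1%}")
# 4:1 -> 80.0%
# 3:1 -> 75.0%
# 2:1 -> 66.7%
```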

You Must be Biased!!!

Listen, keep your pants on. This is just what my code spat out with the numbers I fed into it. I personally interpret these results as:
  • Joe Biden is likely to win assuming polling remains the same
  • Donald Trump is less likely to win, but can still win assuming polling remains the same
Which is a nebulous result at best.

These results are based upon the data that I got from FiveThirtyEight. Could FiveThirtyEight be rigging the data? I don't see why that would benefit them in any way. Besides, it would be pretty easy to tell if the poll numbers were different from what actual pollsters are reporting.

Did I rig the analysis? Honestly, not on purpose. I've made an effort to be fair and objective in said analysis. If you want to believe I'm lying anyway, I guess I can't stop you. There is also the possibility that I made a mistake; I'll be sure to correct it if I find one. There are also some issues with my assumptions, mainly due to how simple my simulated elections are.

But let's be reasonable! I'm not getting paid to run this analysis. Elections in the USA are far more complicated than just a Gaussian popular vote.

Doing a full, in-depth analysis using the electoral college, state polling, and voter by voter error is something that I'm not getting paid to do. It would also take me much longer to do that analysis. Therefore, I don't want to do it!

However, I do think that what I have done isn't entirely unfair. It's like assuming air friction is negligible in a physics problem to make things simpler to calculate. It can get you close to the true answer, but not necessarily the exact answer. It is why I put a purple section and a range of percentages for my answer.

You also have to realize that Joe Biden's advantage has only come recently. Three months ago, it's possible this analysis would have given Donald Trump the advantage.

So Now What?

Honestly, the biggest unknown is how the future polls will turn out.

If I were to guess, one of the bigger things that will change between now and November, which could have a dramatic effect on polling, is who Joe Biden chooses for his Vice President.

Conservatives generally have had a difficult time disliking Joe Biden the way they disliked Hillary Clinton. A Vice President could change that for better or for worse. To be honest though, I don't have any data to say who would be the best pick.

I don't believe people will see Donald Trump much differently from now to November. People seem to have pretty much made up their minds at this point.

Because things will change, I plan on doing some updates in the future. If you are interested, be sure to stay tuned.

Some Philosophy and How I Think

So I am an astrophysicist and will be studying at Cornell University for a PhD. Generally, I try to keep politics out of any work I do, as politics is almost always a controversial topic. In fact, it is so controversial that it is constantly on the news.

However, I actually started my college career studying political science before I switched to Physics and Astronomy. I wanted to be a lawyer as I enjoy debate way more than any normal person should (I will stay up until 3am debating some random economic theory with you!).

Of course, my tendency and willingness to debate nearly any topic, no matter how controversial it may be, can be misunderstood sometimes. An example of this is how I have been called both a communist and a laissez-faire capitalist, which is honestly kind of funny to me, given that the two ideas are so diametrically opposed to one another.

I think the biggest misunderstanding usually comes from what others perceive I am trying to accomplish through my debate, which is seen sometimes as me just trying to "prove them wrong". This can especially happen if I raise points that may be hard to counter.

That's not really what I'm trying to do at all though. I believe the strongest and truest ideas are ones that can be defended. For me personally, if you are able to defend your beliefs in a logical and consistent fashion, you are the most likely type of individual to convince me that you may be correct (of course this may not always be true). What I generally try to do is to find what is true. Using what I find to be true, I then try to shape my beliefs in a fashion which, if put into practice in any given society, would strive to create the best/happiest society for everyone involved.

This often means my views are flexible, how I respond to certain situations is flexible, and how I view the world is one that contains much nuance. This is mainly because through getting to know people, people who are all over the place in terms of belief, I've learned that most people are not fundamentally bad people; they just have different backgrounds and different ways of thinking. I might even say those on the extreme ends of the political spectrum, especially when it comes to how they can sometimes interpret dissenting viewpoints, are more similar to each other than they might ever want to believe.

This of course means that it may be hard, if not impossible, to truly label me with any specific political ideology, since the views I hold are not necessarily tied to any particular belief system or political party.

It's also partly why I decided to pursue physics and astronomy instead of political science. Most of what politics is, is not so much trying to find the truth, but trying to convince others that you hold the truth, regardless of how true your opinions may or may not be.

Anyway, what I'm trying to say is that this analysis is as unbiased as I can possibly make it. I am seeking the truth, not trying to paint a political narrative.


  1. Great post! Very insightful :) I would love to read something where you talk more explicitly about your own political views. Can you write you something like that? I am interested to here about your controversial opinions!

    1. thanks! I never did post an update to this but I guess the results of 2020 did kind of match up with my predictions though. Anything in particular?

