This model accounts for many factors and will update throughout the cycle based on changes in polling, the economy, and Trump's approval rating. The model is probabilistic: when a state shows 51% Trump, it is not calling the race for him, but saying he is barely favored and the race is going to be tight.
Collecting the Data
My polling numbers and pollster ratings come from FiveThirtyEight. My economic data is collected from the St. Louis Federal Reserve's FRED database, and it feeds into my economic index, which is an input to the model.
Calculating the Projected Vote
I use four fundamentals in my model: partisan lean, Trump approval, the economic index, and the projected popular vote.
The partisan lean is calculated from previous elections. It is measured relative to the national average, showing how much one state leans toward one party compared to the nation as a whole.
The partisan lean and Trump approval are combined to create one index. My approval numbers by state come from Morning Consult. For example, right now Trump has a -4% net approval in Virginia, and the state has a partisan lean of 3% toward the Democrats, so the partisan lean index for Virginia is 3.5% for the Democrats.
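The Virginia example is consistent with a simple average of the partisan lean and the Democratic edge implied by Trump's net approval. The text does not state the exact weighting, so the sketch below is an assumption that happens to reproduce the 3.5% figure:

```python
# Hypothetical sketch of the partisan lean index described above.
# Assumption: the index is an equal-weight average of the state's
# partisan lean and the Democratic edge implied by Trump's net
# approval (the original text does not state the exact weights).

def partisan_lean_index(partisan_lean_dem, trump_net_approval):
    """Both inputs in percentage points; positive = Democratic edge."""
    approval_edge_dem = -trump_net_approval  # -4% net approval -> D+4
    return (partisan_lean_dem + approval_edge_dem) / 2

# Virginia: D+3 lean, Trump net approval -4%
print(partisan_lean_index(3, -4))  # -> 3.5 (3.5% toward the Democrats)
```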
I created an economic index to compare the current economy with previous economies in the US. It standardizes each component series to make them comparable, then combines them into the full economic index. That index is then regressed to give an advantage or disadvantage to the incumbent, which is Trump.
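A common way to standardize and combine series like this is z-scoring each one and averaging; the text does not specify the method or weights, so the following is only an illustrative sketch under that assumption:

```python
# Illustrative sketch of standardizing economic series into one index.
# Assumptions: z-score standardization and an equal-weight average of
# each series' latest value (the original text specifies neither).
import statistics

def standardize(series):
    """Convert a series to z-scores so different units are comparable."""
    mu = statistics.mean(series)
    sd = statistics.stdev(series)
    return [(x - mu) / sd for x in series]

def economic_index(series_list):
    """Combine several standardized series into one index value."""
    z_scored = [standardize(s) for s in series_list]
    # equal-weight average of the latest z-score of each series
    return sum(z[-1] for z in z_scored) / len(z_scored)
```

The resulting index would then be regressed against historical incumbent performance, a step omitted here.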
The projected popular vote is simple. With no nominee yet from the Democrats, I use a mixture of data: the generic ballot average, weighted at around 20%, with the other 80% calculated from head-to-head polling of the Democratic candidates against Trump. Each candidate is weighted by their chance of winning the nomination, which comes from the Democratic Primary Forecast. Once a candidate is nominated, the projection will be based solely on the projected margin in the election.
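The 20/80 blend above can be sketched directly. The candidate list, margins, and probabilities below are illustrative, not real forecast values:

```python
# Sketch of the pre-nomination popular-vote blend: 20% generic ballot,
# 80% nomination-probability-weighted head-to-head margins.
# All input numbers below are illustrative.

def projected_popular_margin(generic_ballot_margin, head_to_heads):
    """head_to_heads: list of (nomination_probability, dem_margin_vs_trump)."""
    total_prob = sum(p for p, _ in head_to_heads)
    weighted_h2h = sum(p * m for p, m in head_to_heads) / total_prob
    return 0.2 * generic_ballot_margin + 0.8 * weighted_h2h

# Illustrative: generic ballot D+6; two candidates with 60%/40%
# nomination odds polling D+5 and D+2 against Trump
print(projected_popular_margin(6.0, [(0.6, 5.0), (0.4, 2.0)]))  # -> 4.24
```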
Projecting State Votes
I get my polls from FiveThirtyEight, as stated earlier. I favor polls that include a third party; if one is not included, I adjust the poll to include a third party using the third-party average for that state, or the national average if none exists for that state. Polls with higher grades and larger sample sizes get more weight, though I use a diminishing-returns formula to compute an effective sample size.
When a pollster posts a new poll for a state, that pollster's older polls are weighted less. The weight of each poll is its effective sample size. The other adjustments to the polls are time away from the election, sample type, and undecided voters. Likely voters outweigh registered voters, which outweigh all adults, and the fewer undecided respondents a poll has, the higher its weight.
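One common diminishing-returns form is a square-root transform of the raw sample size, combined with multipliers for pollster grade and recency. The actual formula and multipliers are not given in the text, so everything below is an assumed illustration:

```python
# Illustrative effective-sample-size formula. The square-root term,
# the 600-respondent reference point, the grade multipliers, and the
# 14-day half-life are all assumptions, not the model's real values.
import math

GRADE_MULTIPLIER = {"A": 1.0, "B": 0.85, "C": 0.7, "D": 0.5}  # assumed

def effective_sample_size(n, grade, days_old, half_life=14):
    base = 600 * math.sqrt(n / 600)          # diminishing returns in n
    recency = 0.5 ** (days_old / half_life)  # newer polls weigh more
    return base * GRADE_MULTIPLIER[grade] * recency
```

Under this sketch, quadrupling a poll's sample size only doubles its effective sample size, and a two-week-old poll counts half as much as a fresh one.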
Combining the polls
To combine polls, I use a method I created myself. It multiplies each poll's effective sample size by the adjusted poll numbers for each candidate. For example, say a poll has an effective sample size of 500 and adjusted numbers of D 45%, R 50%, and 5% for a third party. The Democrat would be given 225 points, Trump 250 points, and the third party 25 points. This is done for every poll, and the points are added up for each candidate. Those totals are divided by the total points for that state to give each candidate their polling vote share.
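The points method above translates directly into code. This sketch reproduces the worked example from the text:

```python
# Sketch of the poll-combination method described above: each poll
# contributes effective_sample_size * share "points" per candidate,
# and the totals are normalized into a polling vote share.

def combine_polls(polls):
    """polls: list of (effective_sample_size, {candidate: share_pct})."""
    points = {}
    for ess, shares in polls:
        for cand, share in shares.items():
            points[cand] = points.get(cand, 0.0) + ess * share
    total = sum(points.values())
    return {cand: pts / total for cand, pts in points.items()}

# The worked example from the text: ESS 500, D 45 / R 50 / 3rd 5
print(combine_polls([(500, {"D": 45, "R": 50, "3rd": 5})]))
# -> {'D': 0.45, 'R': 0.5, '3rd': 0.05}
```

With several polls, larger effective sample sizes contribute proportionally more points, which is exactly how the weighting takes effect.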
Combining the data
The weight of each category changes throughout the election cycle. The first data point is the fundamentals; it counts about the same as a poll with an effective sample size of 500. The second data point is the polling average: the more and better the polls for that state, the more the polls weigh. The final data point is the similar-state index. It takes the polling averages of states that are similar to the state in question in demographics, location, and partisanship. Each similar state's polling average is compared to its partisanship and then regressed to the state's partisan lean to produce the state-similarity vote share. This weight depends on how similar the other states are to the state in question. These three are weight-combined to give the projected vote share for each candidate. The more polls a state has, the lower its variance. This is why you may see some safe-D states rated only likely D: their variance is very high because the model is uncertain at the time.
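The weighted combination of the three inputs can be sketched as follows. The fundamentals weight of 500 (matching an ESS-500 poll) comes from the text; the polling and similarity weights passed in are illustrative placeholders for the model's actual, cycle-dependent values:

```python
# Sketch of weight-combining the three inputs: fundamentals, polling
# average, and similar-state index. The fixed 500 fundamentals weight
# matches the text; the other weights here are illustrative.

FUNDAMENTALS_WEIGHT = 500.0  # counts like a poll with ESS 500

def projected_vote_share(fundamental, poll_avg, similar_state,
                         poll_weight, similarity_weight):
    """All vote-share inputs in percent; returns the blended share."""
    weights = [FUNDAMENTALS_WEIGHT, poll_weight, similarity_weight]
    values = [fundamental, poll_avg, similar_state]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Illustrative: fundamentals say 48%, polls say 50%, similar states 49%
print(projected_vote_share(48.0, 50.0, 49.0, 1000.0, 500.0))  # -> 49.25
```

As a state accumulates polling, `poll_weight` grows relative to the fixed fundamentals weight, so the blend shifts toward the polls.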
Simulating the election
The election is simulated 10,000 times to satisfy the law of large numbers and the central limit theorem. Each simulation draws random numbers. The first is the national mood, which applies to every state. The next is regional, applying only to the states in each region (South, Midwest, Northeast, and West). The final number is for the state itself: each state is a different election, so each state gets its own random number. The combined number becomes the p-value at which each simulation samples a normal distribution, scaled by the state's elasticity and variance. This is done for all candidates, whose shares are then adjusted to sum to 100%, since they could have come out higher or lower depending on the random numbers. The national and regional numbers are in terms of the Republicans, as they are the incumbents, so to use them in the Democrats' formula the numbers are simply subtracted from 1.
This is done for every state and then repeated for each simulation. The winner of each state is awarded that state's electoral votes, which are tallied to determine the winner of the simulation. This is repeated for all 10,000 simulations.
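A stripped-down version of the simulation loop is sketched below. The two-state map, the means and standard deviations, and the equal-weight combination of the three shocks are all illustrative assumptions; the real model combines shocks according to each state's elasticity and variance:

```python
# Minimal sketch of the simulation: national, regional, and state
# shocks combine into a percentile fed through a normal distribution.
# States, probabilities, and the equal-weight combining rule are
# illustrative assumptions, not the model's real parameters.
import random
from statistics import NormalDist

REGIONS = {"VA": "South", "WI": "Midwest"}  # toy map
STATE_MEAN = {"VA": 0.52, "WI": 0.50}       # projected Dem share
STATE_SD = {"VA": 0.03, "WI": 0.04}         # stands in for elasticity/variance
ELECTORAL_VOTES = {"VA": 13, "WI": 10}

def simulate_once(rng):
    """Run one simulated election; return the Democrat's electoral votes."""
    national = rng.random()
    regional = {r: rng.random() for r in set(REGIONS.values())}
    dem_ev = 0
    for state in STATE_MEAN:
        state_draw = rng.random()
        # combine the three shocks into one percentile (equal weights assumed)
        p = (national + regional[REGIONS[state]] + state_draw) / 3
        dem_share = NormalDist(STATE_MEAN[state], STATE_SD[state]).inv_cdf(p)
        if dem_share > 0.5:
            dem_ev += ELECTORAL_VOTES[state]
    return dem_ev

rng = random.Random(0)
results = [simulate_once(rng) for _ in range(10_000)]
dem_win_prob = sum(ev >= 12 for ev in results) / len(results)  # 12 of 23 EV
```

Because the national and regional draws are shared within a simulation, state outcomes are correlated, which is what makes sweeps and upsets cluster realistically.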
If you have any questions or notice any bugs in the model, please reach out to me. And feel free to reach out if you just want to talk politics and stats!
Created by Jack H. Kersting