Back in January, I wrote an article on the US elections. In that blog post, I analyzed why ex-president Trump lost the electoral battle in Pennsylvania (which eventually cost him the presidency). So the question needs to be asked: why did I decide to undertake this research in the first place? Well, there were two fundamental reasons behind this decision.
First, last year was full of events that influenced the mood of American citizens. Surely, those events affected how they voted too. It was especially interesting to observe the results in the swing states where Joe Biden managed to win. Second, I wanted to find hidden patterns and discover why the outcomes in many states changed compared to 2016. Do those changes represent a short-term trend? Or are they all related to the massive turmoil in American society?
And finally: why Pennsylvania, though? Well, it's one of the key states in the USA. In addition, the Secretary of State made intermediate results and totals publicly available, and it turned out that Pennsylvania is a sort of "average" state in terms of income and educational level. Therefore, we assumed that the patterns revealed in this state could be applied to the whole country.
Here's what the Akveo team of analysts and I did. We analyzed the poll results for Pennsylvania as a whole and for each county. Then we studied the difference from 2016. In our research, we considered the following indicators:
1. The total turnout for the state and for each county (the ratio of voters to the total number of eligible citizens, and the ratio of voters to the number of registered voters).
2. The total number of votes and the percentage margin of the winning candidate in each county, compared with 2016.
3. The absolute value of the margin between candidates for each county.
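As a rough illustration of these indicators, the sketch below computes both turnout ratios and the margin for a single county. The function names and all figures are hypothetical placeholders, not the real Pennsylvania data.

```python
def turnout(votes_cast, eligible, registered):
    """Return turnout relative to eligible citizens and to registered voters."""
    return votes_cast / eligible, votes_cast / registered


def margin(winner_votes, runner_up_votes, total_votes):
    """Return the absolute margin and the margin as a share of all votes cast."""
    absolute = winner_votes - runner_up_votes
    return absolute, absolute / total_votes


# Hypothetical county; the numbers are invented for illustration only.
county = {"votes_cast": 50_000, "eligible": 80_000, "registered": 62_500,
          "winner": 28_000, "runner_up": 21_000}

by_eligible, by_registered = turnout(
    county["votes_cast"], county["eligible"], county["registered"])
abs_margin, pct_margin = margin(
    county["winner"], county["runner_up"], county["votes_cast"])

print(f"turnout: {by_eligible:.1%} of eligible, {by_registered:.1%} of registered")
print(f"margin: {abs_margin} votes ({pct_margin:.1%})")
```

Running the same two functions on 2016 and 2020 figures gives the values needed for the comparison between the two elections.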
Data Analysis and Power BI reports
Let's look in detail at the data sources we used and reviewed (only the major ones). Here's Katsiaryna Tamashevich, Business Intelligence Developer at Akveo:
1. projects.fivethirtyeight.com was the primary source of inspiration. The site contains various analyses on different topics. Moreover, it provides source data and recommendations on calculations. We took the rating data from here.
2. data.pa.gov. This governmental site provides various insights on the state of Pennsylvania. We had high expectations of finding something useful here (like demographic and political data) but had little success.
3. Kaggle is an excellent source of open datasets. The greatest thing about it is that every dataset has a link to the original source, so it's possible to obtain even more data if needed.
4. GitHub is useful for storing not only code but, in some cases, data as well.
5. Reuters.com. A trustworthy source of data, this website includes excellent analyses. However, it took a while to extract the raw data.
6. census.gov. Many other sources refer to this website. The problem is that in our region, census.gov is only accessible via a VPN.
With so many data sources at hand, it's easy to forget your initial goal. We decided to focus on candidate rating data from FiveThirtyEight, demographic data from the census, and Reuters' election results (downloaded from GitHub and Kaggle). Some of the other datasets were outdated or inconsistent, or didn't show highlights and dependencies.
Initially, we downloaded FiveThirtyEight's rating data from the source. The difficulty was that FiveThirtyEight post-processes the data before charting it. We tried to follow the described processing steps but got a different result chart. Therefore, we just took the chart coordinates from the web page code and converted them to actual values.
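We don't reproduce the exact conversion here. A minimal sketch, assuming the chart has a linear vertical axis and that two gridlines with known values can be read from the page code (both are assumptions, as are all the numbers below), could look like this:

```python
def pixel_to_value(y_pixel, ref_a, ref_b):
    """Map a chart y-coordinate to a data value by linear interpolation.

    ref_a and ref_b are (pixel, value) pairs read off two axis gridlines.
    """
    (pa, va), (pb, vb) = ref_a, ref_b
    return va + (y_pixel - pa) * (vb - va) / (pb - pa)


# Hypothetical axis: the 40% gridline sits at y=300px, the 60% gridline at y=100px.
AXIS_40 = (300.0, 40.0)
AXIS_60 = (100.0, 60.0)

# Invented y-coordinates, standing in for points taken from the chart path.
path_points = [250.0, 200.0, 180.0]
ratings = [pixel_to_value(y, AXIS_40, AXIS_60) for y in path_points]
print(ratings)
```

The same mapping applied to the x-coordinates would recover the dates, given two reference dates on the horizontal axis.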
As we had a comparatively small amount of data, we loaded it into Power BI directly, without using a database. To match the data across sources, some transformations were required: normalizing county and party names, and calculating county size groups, vote deltas, and county ranks based on the number of votes.
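The transformations themselves were done in Power BI. Purely as an illustration of the same steps, here is a Python sketch; the helper names, the size thresholds, and the vote totals are all invented for this example.

```python
def normalize_county(name):
    """Normalize a county name so that different sources can be matched."""
    cleaned = name.strip().lower()
    if cleaned.endswith(" county"):
        cleaned = cleaned[: -len(" county")]
    return cleaned.title()


def size_group(total_votes):
    """Assign a county to a size group; the thresholds are illustrative."""
    if total_votes >= 200_000:
        return "large"
    if total_votes >= 50_000:
        return "medium"
    return "small"


# Invented vote totals, keyed by the raw county label found in each source.
votes_2016 = {"ADAMS COUNTY": 48_000, "Bucks": 340_000, " erie county ": 120_000}
votes_2020 = {"Adams": 57_000, "bucks county": 396_000, "Erie": 137_000}

totals_2016 = {normalize_county(k): v for k, v in votes_2016.items()}
totals_2020 = {normalize_county(k): v for k, v in votes_2020.items()}

# Rank counties by 2020 votes (1 = most votes), then build one row per county.
ranked = sorted(totals_2020, key=totals_2020.get, reverse=True)
rows = [
    {"county": c,
     "delta": totals_2020[c] - totals_2016[c],
     "size_group": size_group(totals_2020[c]),
     "rank": ranked.index(c) + 1}
    for c in sorted(totals_2020)
]
for row in rows:
    print(row)
```

Normalizing the names first is what makes the join between the 2016 and 2020 tables possible; the delta, size group, and rank all depend on that match.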
Pennsylvania has 67 counties. That's too many categories for a pie chart. Unfortunately, the Power BI pie chart can't combine the smallest categories into one group. Using a calculated column helped us resolve this difficulty for the "Share of votes per county" chart.
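The report used a Power BI calculated column for this. The Python below only mirrors the idea of folding the smallest categories into a single slice; the function name, the "Other" label, and the data are assumptions made for the example.

```python
def group_small_slices(votes_by_county, top_n):
    """Keep the top_n largest counties and fold the rest into 'Other'."""
    ordered = sorted(votes_by_county.items(), key=lambda kv: kv[1], reverse=True)
    slices = dict(ordered[:top_n])
    remainder = sum(v for _, v in ordered[top_n:])
    if remainder:
        slices["Other"] = remainder
    return slices


# Invented vote counts for five placeholder counties.
votes = {"A": 700, "B": 450, "C": 60, "D": 40, "E": 25}
pie = group_small_slices(votes, top_n=2)
print(pie)
```

The total number of votes is preserved, so the shares shown for the remaining slices stay correct.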
We faced the same challenge in the "Number of votes per party of each county" bar chart. In this case, we just hid the smallest values. They are still available in a bookmark via the "Show all" button.
In the candidate ratings chart, we presented Biden's and Trump's ratings at both the national and the Pennsylvania level. To do so, we used a separate column for each line. We added "influenced events" as a separate column, shown as markers on the respective dates. Since then, we have found a better solution for such a case: a line and clustered column chart. This chart allows adding events as columns for better visualization of dependencies between events and rating changes.
For the "Biden's and Trump's by county" chart, we initially used actual values. The result contained a lot of noise, so it wasn't easy to spot trends. We kept the actual values as markers and added a line based on smoothed values, calculated as the average of the previous, current, and next values.
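The smoothing described here is a centered three-point moving average. A minimal Python sketch follows; the handling of the first and last points (averaging only the neighbors that exist) is an assumption, since the text doesn't say how the edges were treated, and the input values are invented.

```python
def smooth(values):
    """Centered 3-point moving average.

    The first and last points average only the neighbors that exist.
    """
    result = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        result.append(sum(window) / len(window))
    return result


# Invented vote shares (in percent) for five consecutive counties.
shares = [40.0, 46.0, 43.0, 52.0, 49.0]
smoothed = smooth(shares)
print(smoothed)
```

Plotting `shares` as markers and `smoothed` as a line reproduces the layout described above: the raw points stay visible while the line shows the trend.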
It was a real challenge to build a map chart that matched the desired result. Neither a standard map nor a filled map allowed us to do it. The standard map allows adding markers, but it's impossible to color areas. The filled map was closer to the requirements, but it also had several disadvantages.
For instance, roads and rivers, which are not informative for our investigation, can't be hidden. Borders between counties are barely visible. The colors were too pale to differentiate between close values. We were ready to dive into the sea of custom visuals when we found that Microsoft had hidden a genuine treasure, one more map chart, among the preview features: a shape map. It was love at first sight. That visual allowed us to achieve the required look. Plus, it supports loading custom maps. The map of Pennsylvania can be found on the topojson GitHub. The only problem is that the shape map doesn't have a legend. But we have static ranges, so a matrix with conditional formatting based on a generated table served as our legend.
The final report was published to the Power BI service and to the web so that it could be embedded in the article.
Based on the analyzed data, we concluded that:
1. The number of voters increased by almost 1.1M (7.3%). It seems that, compared with 2016, more people took an interest in politics in 2020 and decided to vote.
2. Generally, the 16 largest counties by population determine how much margin the Democratic Party gets. If the Dems' margin in these counties is large enough (more than 10% on average), it outweighs the Republican margin in the remaining smaller counties and determines the victory of the Democrats in Pennsylvania. This means that the trends in these counties need to be analyzed first.
Next, we analyzed the average income, education, urban population, and ethnic structure in these counties, but we couldn't find a strict pattern.
However, looking at the population structure, we found that counties with a large share of people of color get more Democratic votes (e.g., Delaware, Lehigh, Dauphin, and Monroe). Thus, we assumed that the percentage for Democrats depends on the percentage of non-white people. While testing this hypothesis, we encountered exceptions in counties such as Allegheny, Bucks, and Montgomery. The percentage of Biden voters in these counties was much higher than the dependence on non-white voters alone would explain. Besides, those counties are large, and so is their share of the white population. Therefore, it's fair to conclude that in large counties, the percentage of votes for Biden may be higher in each ethnic group than in medium and small counties.
The graph above shows that some Pennsylvania counties have higher Republican support than expected (Luzerne, Cumberland, Berks, York, and Lancaster). These counties are located in the same region and have approximately the same economic structure. This suggests that some unidentified local peculiarities determine the outcome in this region. Since much larger counties offset the vote margin in these counties, we will not go into detail and study additional factors in the region.
As a result, we managed to build a model that explains the voting results in the sixteen largest counties with a difference of up to 10%.
The number of votes for Democrats (Q) can be calculated as a sum over the ethnic groups in the county (white, Black, Hispanic, and Asian Americans). For each group, we multiply the number of voters in that group (Ni) by the group's constant turnout value (Ti) and by the percentage of the group that supports Democrats (Di). Here's the formula:
$$Q = \sum_{i \in \{w, b, h, a\}} T_i D_i N_i$$
In addition, the values of Ti and Di depend on a county's size:
For counties under 400,000 in population:
1. Tw: 57-60%, Dw: 40-44%
2. Tb: 72-76%, Db: 81-84%
3. Th: 75-80%, Dh: 52-57%
4. Ta: 68-72%, Da: 57-62%
For counties over 600,000 in population:
1. Tw: 53-56%, Dw: 70-73%
2. Tb: 64-68%, Db: 85-88%
3. Th: 68-72%, Dh: 5-57%
4. Ta: 60-63%, Da: 59-64%
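To make the formula concrete, the sketch below applies it to a hypothetical county using the ranges listed above for counties under 400,000. The group sizes are invented, and the function name and data layout are assumptions made for this example, not the tooling used in the research.

```python
# (turnout range, Democratic support range) per group, for counties under 400,000.
SMALL_COUNTY = {
    "white":    ((0.57, 0.60), (0.40, 0.44)),
    "black":    ((0.72, 0.76), (0.81, 0.84)),
    "hispanic": ((0.75, 0.80), (0.52, 0.57)),
    "asian":    ((0.68, 0.72), (0.57, 0.62)),
}


def democratic_votes(voters_by_group, params):
    """Return the (low, high) estimate of Q = sum of T_i * D_i * N_i."""
    low = sum(n * params[g][0][0] * params[g][1][0]
              for g, n in voters_by_group.items())
    high = sum(n * params[g][0][1] * params[g][1][1]
               for g, n in voters_by_group.items())
    return low, high


# Hypothetical county; the group sizes are invented for illustration.
voters = {"white": 200_000, "black": 20_000, "hispanic": 15_000, "asian": 5_000}
q_low, q_high = democratic_votes(voters, SMALL_COUNTY)
print(f"Estimated Democratic votes: {q_low:,.0f} to {q_high:,.0f}")
```

Using the lower and upper ends of each range gives a band for Q rather than a single number, which matches the way the parameters are reported.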
These data are consistent with the fact that, depending on the state and ethnic group, each party's percentage of support changes only slightly over several electoral cycles (here's the link to the research).
As we noted earlier, we cannot confidently establish probable values of these parameters for counties within the 400-600k range, since these counties are located in the same region and are significantly influenced by unique local factors. Additionally, we noted that Lackawanna County has an abnormally high percentage of Biden votes compared with other counties. We attribute this to the fact that Lackawanna is Joe Biden's home county: he was born and raised in Scranton.
According to the Census Bureau, the share of people of color in Pennsylvania has increased by 2.6% since 2013. Plus, the share of the population living in the 16 largest counties grew by 0.5% between 2016 and 2020. Thus, we may conclude that current demographic changes in Pennsylvania will give Democrats an electoral advantage in the coming years, due to the growing proportion of citizens who traditionally vote for them.
To Sum Up
So what is the eventual outcome? Let's conclude. Current demographic trends make the number of Democratic voters grow each year. Nevertheless, Republicans have the opportunity to win in Pennsylvania if they can present a strong candidate who unites several electoral groups and mobilizes more people to participate in the election. Republicans also have another path to victory: if the Democratic Party fails to bring a significant part of its voters to the polls.