The Geodemographics of British Politics

ENVS615: Analysis of Human Dynamics
Python Data Science
March 9, 2020

0. Introduction

0.1 Our Task: Data Science using Python

In [1]:
print('Hello World') # start using python (https://www.python.org/)
Hello World

Fifty years after Tukey (1962) declared that data analysis should be considered not merely an application of mathematical statistics but a scientific domain in its own right, talk of 'data science' has suddenly become ubiquitous (Donoho, 2017). In a much-repeated headline, the Harvard Business Review called the data scientist's role 'the sexiest job of the 21st century' (Davenport and Patil, 2012). Many observe that we are in the midst of a 'data revolution' (Kitchin, 2014).

For this assignment we will be using Python (Van Rossum and Drake, 1995), which is now acknowledged as a mature scientific computational ecosystem (Walt 2019), thanks in no small measure to the work of Oliphant (2006) on NumPy, and McKinney (2010) on pandas.

In [2]:
from IPython.display import Image, display, HTML # display images in Jupyter Notebook
# Because cartoons are 'a uniquely effective visual medium for orienting social issues' (Abraham, 2009)
import os.path # help python navigate file system regardless of operating system (https://docs.python.org/3/library/os.path.html)

Image(os.path.join('images','data.png')) 
Out[2]:
In [3]:
import numpy as np # scientific numerical computing (https://numpy.org/)

import pandas as pd    # fast and efficient data manipulation (https://pandas.pydata.org/)

import matplotlib.pyplot as plt # visualization (https://matplotlib.org/)
import matplotlib.patches as mpatches
from matplotlib.lines import Line2D

import seaborn as sns # more visualization (https://seaborn.pydata.org/)

import geopandas as gpd # geospatial extension for pandas (https://geopandas.org/)
import geoplot as gplt # geospatial extension for matplotlib (https://residentmario.github.io/geoplot/index.html)
import geoplot.crs as gcrs # for different map projections (https://residentmario.github.io/geoplot/user_guide/Working_with_Projections.html)

import requests # pythonic HTTP library (https://pypi.org/project/requests/)
import io # tools for working with raw file streams (https://docs.python.org/3/library/io.html)
import zipfile # read files contained within a downloaded zip archive
# https://stackoverflow.com/questions/44575251/reading-multiple-files-contained-in-a-zip-file-with-pandas
# https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url

import warnings # https://docs.python.org/3/library/warnings.html
warnings.filterwarnings("ignore") # silence warning for final runthrough before exporting as HTML

0.2 My Focus: Geodemographics

This course is part of an integrated M.Sc. and Ph.D., for which my thesis centers on the field of geodemographics (for a thorough account, see Webber and Burrows, 2018; or Harris et al., 2005). Geodemographics is defined variously, depending on whether the writer is focussing on its front-end application (e.g. Sleight, 1997: the 'analysis of people by where they live') or its back-end development.

In terms of its development, geodemographics is a textbook example of unsupervised machine learning: the algorithmic classification of geographical neighbourhoods by means of computational clustering (cf. Spielman and Folch, 2015, p.153).
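To make the back-end concrete, here is a minimal sketch of what such a clustering might look like, assuming a hypothetical DataFrame called demographics holding census variables indexed by neighbourhood (an illustration of the general technique, not the procedure behind any published classification):

# Minimal sketch of geodemographic clustering (illustrative only).
# `demographics` is a hypothetical DataFrame of census variables, one row per neighbourhood.
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

scaled = StandardScaler().fit_transform(demographics)      # put variables on a common scale
kmeans = KMeans(n_clusters=8, random_state=0).fit(scaled)  # group similar neighbourhoods together
demographics['cluster'] = kmeans.labels_                   # label each neighbourhood with its cluster

Each cluster can then be profiled and given a descriptive name, which is broadly how geodemographic classifications arrive at their familiar neighbourhood 'types'.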

0.3 An Apt Example: British Politics

It strikes me that investigating the subject of British parliamentary politics might be the perfect project for a first foray into the world of geodemographic analysis, for three reasons.

First, the British parliamentary system is, one might say, geodemographic by its very nature: Members of Parliament are not allocated in proportion to the national share of the electorate who voted for their party, but under the first past the post system, in which whoever wins the most votes (however few) in a particular parliamentary constituency wins the privilege of representing everyone in that constituency. British politics is thus not about which party wins the most votes, but about which wins the most constituencies -- or, technically, the most seats in Parliament; the two have been equivalent since 1948 (Wikipedia, 2019), but before that some constituencies returned multiple members.
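As a toy numerical illustration of how seats and votes can diverge under first past the post (invented vote counts, not real results):

# Toy first-past-the-post example with invented vote counts:
# 'lab' wins the most votes overall, yet 'con' wins more seats.
import pandas as pd

toy = pd.DataFrame({'con': [51, 51, 10],
                    'lab': [49, 49, 90]},
                   index=['seat_A', 'seat_B', 'seat_C'])

print(toy.sum())                          # total votes: con 112, lab 188
print(toy.idxmax(axis=1).value_counts())  # seats won:   con 2,   lab 1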

Second, whereas economics once seemed the obvious driving force in politics, in the last decade geography has seemed the greater force. Some examples: the 'Red Wall' of Labour's northern constituencies being penetrated by the Conservatives; Britain voting to leave the European Union; and the Scottish National Party coming to dominate the Scottish constituencies. Many commentators have sought demographic explanations for these shifts.

Third, Dominic Cummings, Director of the Vote Leave campaign and now the Prime Minister's Senior Advisor, is an outspoken advocate of applying data science to the political process (Cummings, 2017). Whether one would like to help (in which case see Cummings, 2020) or hinder Cummings' policies, his recent record of electoral success suggests the power of the approach.

In [4]:
Image(os.path.join('images','misfits.jpg'),width=400,height=200)
Out[4]:

1. The Data

We will construct our dataset using geographic information, election results, and demographic data from a selection of sources. Most of this will be open government data, one of the three main new sources of data highlighted by Arribas-Bel (2014).

In [5]:
dataset = pd.DataFrame()

1.0 Constituency Geography

First we can get shapefiles for each parliamentary constituency from the ONS Geoportal (2019). This means that, if we want, we can easily visualize constituency data on the familiar geographic shape of the United Kingdom.

In [6]:
ukmap = gpd.read_file(os.path.join('Westminster_Parliamentary_Constituencies_December_2017_UK_BSC',\
                                'Westminster_Parliamentary_Constituencies_December_2017_UK_BSC.shp'))
pass

The shapefile includes the code and name for each constituency, so we will use these as the foundation of our dataset as we gather constituency data from various sources. It also includes the geometric information to construct a spatial polygon (or in some cases a multipolygon) describing the boundaries of each constituency.
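As a quick optional check, we could count how many constituency boundaries are stored as simple polygons and how many as multipolygons (for instance, constituencies that include islands):

# Optional: tally polygon vs multipolygon boundaries in the constituency shapefile
ukmap.geometry.geom_type.value_counts()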

In [7]:
dataset['id'] = ukmap['PCON17CD']
dataset['constituency_name'] = ukmap['PCON17NM']
dataset.shape # (650, 2)
pass

For visualizing political data, it may be more helpful to use a Constituency Hexagon Cartogram (as provided by Flanagan, 2017), which represents the 650 parliamentary constituencies of the United Kingdom as equally sized hexagons, reflecting their equal political weight.

In [8]:
hexagons = gpd.read_file(os.path.join('GB_Hex_Cartogram_Const','GB_Hex_Cartogram_Const.shp'))
hexagons.head()
# hexagons.plot() # it works
pass

1.1 Election Results

Thanks to the House of Commons Library, it is easy to find excellent data for the major UK elections stretching back over the last century:

  • Full 2019 election results, including vote counts for each party in each constituency, and the winning party in each constituency (Uberoi et al., 2020).

  • Collated 1918-2017 election results (Loft et al., 2019), with total votes for each constituency given separately for the Conservatives, Labour and the Liberals (now the Liberal Democrats), and jointly for the Scottish and Welsh Nationalists (i.e. the SNP and Plaid Cymru) and all Others (these days, this would include both the Green Party and the Brexit Party). However, the data leaves us to work out who won in each constituency, and therefore in each election.

  • Estimates of how each constituency voted in the Brexit referendum (Dempsey, 2017), updated from Hanretty's earlier work (2016).

To understand the implications of the fine detail of the constituency voting numbers, we will compare the counts to see who won each constituency, and then each election, so that we can pick out more general trends.

Since the 2010 General Election, each parliamentary constituency has had a consistent ONS reference code, which makes cross-referencing data across the 2010, 2015, 2017 and 2019 General Elections and the 2016 EU Referendum comparatively straightforward; unfortunately, the lack of such a code in earlier years makes it harder to draw conclusions about individual constituencies from earlier elections.
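In practice, cross-referencing on that code is just a join; a minimal sketch, assuming two hypothetical DataFrames results_a and results_b that each carry a constituency_id column, would be:

# Minimal sketch: join two result sets on the shared ONS constituency code.
# `results_a` and `results_b` are hypothetical DataFrames with a 'constituency_id' column.
combined = results_a.merge(results_b, on='constituency_id', suffixes=('_a', '_b'))

This is essentially what we will do later when joining election results to the hexagon cartogram.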

In [9]:
# 2019 election results
url2019 = 'http://researchbriefings.files.parliament.uk/documents/CBP-8749/HoC-2019GE-results-by-constituency.csv'
election2019 = pd.read_csv(url2019)

# election2019.to_csv(os.path.join('data','election2019.csv')) # save file
In [10]:
# 1918-2017 collated general election results
URL_century_of_elections = 'http://researchbriefings.files.parliament.uk/documents/CBP-8647/1918-2017election_results.csv'

century_of_elections = pd.read_csv(URL_century_of_elections, encoding='cp1252')

# save the file
# century_of_elections.to_csv(os.path.join('data','century_of_elections.csv'))

# remove whitespace from column names
century_of_elections.rename(columns=lambda x: x.strip(), inplace=True)
# simplify column labels
century_of_elections.rename(columns=lambda x: x.replace('_votes',''), inplace=True)
In [11]:
# convert vote-share proportions to percentages
shares = ['con_share', 'lib_share', 'lab_share', 'natSW_share', 'oth_share']
century_of_elections[shares] = century_of_elections[shares] * 100

parties = ['con','lib','lab','natSW','oth']

# all vote counts should be numeric
for p in parties:
    century_of_elections[p] = pd.to_numeric(century_of_elections[p], errors='coerce')
# as should the electorate figure
century_of_elections['electorate'] = pd.to_numeric(century_of_elections['electorate'], errors='coerce')

# find winning party for each constituency
century_of_elections['winning_party'] = century_of_elections[parties].idxmax(axis=1)

by_election = century_of_elections.groupby('election')
In [12]:
election = {}
elections = list(century_of_elections.election.unique())
In [13]:
for e in elections:
    # take an explicit copy so adding columns doesn't trigger chained-assignment warnings
    election[e] = century_of_elections[century_of_elections['election']==e].copy()
    election[e]['total_seats'] = election[e].seats.sum()
    
In [14]:
elections.append('2019')
In [15]:
e2019 = election2019.rename(columns={'ons_id':'constituency_id',
                                'constituency_name' : 'constituency',
                                'region_name' : 'country/region',
                                'ld': 'lib',
                                'valid_votes' : 'turnout',})
e2019 = e2019[['constituency_id', 'constituency', 'country/region', 'con', 'lab', 'lib', 'turnout']]
e2019['natSW'] = election2019['snp'] + election2019['pc']
e2019['oth'] = election2019['brexit'] + election2019['green'] \
            + election2019['dup'] + election2019['sf'] + election2019['sdlp'] \
            + election2019['alliance'] + election2019['other']
e2019['seats'] = 1
e2019['electorate'] = election2019['electorate']
e2019['winning_party'] = e2019[parties].idxmax(axis=1)
for p in parties:
    e2019[f'{p}_share'] = e2019[p]/e2019['turnout'] * 100
In [16]:
e2019['turnout'] = election2019['valid_votes'] / election2019['electorate']
In [17]:
election['2019'] = e2019
In [18]:
summary = {'election':[],
          'con':[],
          'lab':[],
          'lib':[],
          'natSW':[],
          'oth':[],
          'total':[],
          'majority':[],
          'con_votes':[],
           'lab_votes':[],
           'lib_votes':[],
           'natSW_votes':[],
           'oth_votes':[],
          'electorate':[]}
In [19]:
for e in elections:
    summary['election'].append(e)
    total = 0
    announce = {'most_seats':'unknown',
               'num_seats':0}

    # total registered electorate (blank entries treated as zero)
    summary['electorate'].append(election[e]['electorate'].replace(' ',0).dropna().astype(int).sum())

    for p in parties:

        data = election[e]
        # seats won by this party: sum 'seats' over the constituencies it topped
        score = data.loc[data['winning_party']==p,['seats']].sum()
        summary[p].append(score[0])
        # total votes cast for this party (blank entries treated as zero)
        summary[p + '_votes'].append(election[e][p].replace(' ',0).dropna().astype(int).sum())
        total += score

        # keep track of which party has won the most seats so far
        if score[0] > announce['num_seats']:
            announce['most_seats'] = p
            announce['num_seats'] = score[0]

    # a party forms a majority only if it holds more than half of all seats
    if announce['num_seats'] > total[0]/2:
        summary['majority'].append(announce['most_seats'])
    else:
        summary['majority'].append('no majority')

    summary['total'].append(total[0])
In [20]:
summarized = pd.DataFrame(summary)

summarized['total_votes'] = summarized[['con_votes','lab_votes','lib_votes','natSW_votes','oth_votes']].sum(axis=1)
summarized['for majority'] = summarized['total']/2
summarized['no_vote'] = (1- summarized['total_votes']/summarized['electorate']) * 100
for p in parties:
    # show percentage of total_votes
    summarized[p+'_share'] = summarized[p+'_votes']/summarized['total_votes'] *100
    # show percentage of total electorate
    summarized[p+'_of_total'] = summarized[p+'_votes']/summarized['electorate'] *100

    
# order elections chronologically, index by election, and transpose for display
sumT = summarized.iloc[::-1].set_index("election").transpose()
sumT
pass
In [21]:
# timeline

colours = {'con':'#0087dc',
         'lab':'#d50000',
         'lib':'#fdbb30',
         'natSW':'#3f8428',
         'oth':'grey'}

parties = ['con','lab','lib','natSW','oth',]

f, ax = plt.subplots(figsize=(20,6))

ax2 = ax.twinx()
ax.set_ylim(-10,110)
ax2.set_ylim(-50,650)

x = summarized['election']
floor = 0


for p in parties:
    y = summarized[p+'_share']
    ax.bar(x, y, bottom = floor, color=colours[p], width=-0.1, align='edge')
    floor += y


for p in parties:
    x = summarized['election']
    ax2.plot(x, summarized[p], c= colours[p])
    ax2.scatter(x, summarized[p], c= colours[p], edgecolors='black', s=200)

line0 = np.repeat(50, len(summarized)) # 50% of total vote, one value per election
line1 = summarized['for majority']
ax.plot(summarized['election'], line0, linestyle='dashed', c='grey')
ax2.plot(summarized['election'], line1, linestyle='dashed', c='black')

x = summarized['election']
labels = summarized['election'].unique()
plt.xticks(x, labels, rotation=90)

patch = {}
for p in parties:
    patch[p] = mpatches.Patch(color=colours[p], label=p.capitalize().replace('sw','SW'))


plt.legend(handles=[patch['con'],
                    patch['lab'],
                    patch['lib'],
                    patch['natSW'],
                    patch['oth']],
                    title="Key",
                    loc=[0.39,0.7], fancybox=True
                    )

custom_lines = [Line2D([0], [0], color='grey',linestyle='dashed', lw=4),
                Line2D([0], [0], color='black',linestyle='dashed', lw=4)]
ax.legend(custom_lines, ['50% of Total Vote', '50% of Constituencies'], title="Dotted Lines",
                    loc=[0.5,0.3], fontsize='small', fancybox=True)

f.suptitle('Figure 1: Timeline visualizing 101 Years of General Election Results',
           fontsize=18,
          y=-0.05)

f.tight_layout()

ax.set_xlabel('General Election', fontsize=15)
ax.set_ylabel('Percentage of Total Vote (Stacked Bars)', fontsize=15)

ax2.set_ylabel('Number of Constituencies (Connected Circles)', fontsize=15)

plt.show()
In [22]:
cols = ['majority']
for p in parties:
    cols.append(p)
    cols.append(f'{p}_share')
In [23]:
display(HTML('<strong>Table 1: Constituencies Won and Percentage Vote Share, by Main Parties and General Elections (1918-2019)</strong>'))
sumT.T[cols].round(decimals=2) # table of election results
Table 1: Constituencies Won and Percentage Vote Share, by Main Parties and General Elections (1918-2019)
Out[23]:
[Output: Table 1, indexed by election from 2019 back to 1918. For each general election it lists the party winning an overall majority (or 'no majority') and, for each of con, lab, lib, natSW and oth, the number of constituencies won and that party's percentage share of the vote.]

We begin by zooming out from the detail of individual constituency voting patterns to see how those constituencies' votes have translated into national political power (Figure 1).

As well as showing us the changing fortunes of the British political parties through the last century, this visualization shows the lack of any straightforward relationship between the proportion of the vote a party wins and whether it wins enough seats to form a majority government.

And so we see the political significance of geodemographics!

Consider for example the plight of the Liberal Democrats, who in 2010 had a share of the total vote (23.0%) that was drawing close to Labour's 28.9% -- indeed closer than Labour were to the Conservatives' 36.0%. But this was not at all reflected in their share of seats: 57, compared to Labour's 258.
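One rough way to quantify that disproportionality is the ratio of a party's share of seats to its share of the vote (values near 1 mean seats roughly match votes). A hedged sketch using the summarized table built above, assuming the election labels are stored as strings:

# Illustrative ratio of seat share to vote share for the 2010 General Election.
# Values well below 1 indicate under-representation relative to votes cast.
row2010 = summarized.set_index('election').loc['2010']
for p in ['con', 'lab', 'lib']:
    seat_share = row2010[p] / row2010['total'] * 100
    print(p, round(seat_share / row2010[p + '_share'], 2))

On these figures, the Liberal Democrats' ratio comes out far below those of the two larger parties.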

In [24]:
# brexit results by constituency
brexit_url = 'https://commonslibrary.parliament.uk/wp-content/uploads/2017/02/eureferendum_constitunecy.xlsx'
brexit_referendum = pd.read_excel(brexit_url, sheet_name=1, header=5)

brexit_referendum.drop([0,1],axis=0,inplace=True)
In [25]:
renamed = brexit_referendum.rename(columns={'ONS ID':'constituency_id',
                                           'TO USE':'leave_vote'})[['constituency_id','leave_vote']]
In [26]:
renamed.loc[renamed.leave_vote > 0.5, 'winning_party'] = 'leave'
renamed.loc[renamed.leave_vote < 0.5, 'winning_party'] = 'remain'
In [27]:
election['Brexit'] = renamed
In [28]:
parties = ['con','lab','lib','natSW','oth','leave','remain']

# maps
hexagons.rename(columns={'CODE':'constituency_id'},inplace=True)
hexmaps = {} # one hexagon GeoDataFrame per recent vote, merged with its results
recent = ['2010','2015','Brexit','2017','2019']
for e in recent:
    hexmaps[e] = hexagons.merge(election[e], on='constituency_id')

plot = {}
for e in recent:
    plot[e] = {}   

for e in recent:
    for p in parties:
        plot[e][p] = hexmaps[e][(hexmaps[e]['winning_party']==p)]
    
In [29]:
colours['leave'] = '#12b6cf'
colours['remain'] = '#fb5353'

parties = ['con','lab','lib','natSW','oth','leave','remain']

f, ax = plt.subplots(1,len(recent), figsize=(15,5))

for i,e in enumerate(recent):
    for p in parties:
        plot[e][p].plot(color = colours[p], ax=ax[i])
        ax[i].set_facecolor('#e0f8f8')
        ax[i].get_xaxis().set_ticks([])
        ax[i].get_yaxis().set_ticks([])
    if e == 'Brexit': e = 'EU Referendum (2016)'
    else: e = f'General Election {e}'
    ax[i].set_title(e)

patch = {}
for p in parties:
    patch[p] = mpatches.Patch(color=colours[p], label=p.capitalize().replace('sw','SW'))
    
plt.legend(handles=[patch['con'],
                    patch['lab'],
                    patch['lib'],
                    patch['natSW'],
                    patch['oth'],
                    patch['remain'],
                   patch['leave']],
           bbox_to_anchor=(1.7, 1)
                    )

f.suptitle('Figure 2: Constituency Cartograms visualizing the last Decade\'s Election Results',
            fontsize=16,
            y=0.1)
f.tight_layout()
plt.show()