Constructing Multi-Census Time Series for UK Districts and Small Areas: Data Archaeology, Re-Districting and Re-Classification Methodologies

Humphrey Southall, University of Portsmouth
Paula Aucott, University of Portsmouth
Justin Hayes, University of Salford

Advanced economies have been taking censuses for hundreds of years, and a key virtue of censuses relative to sample surveys is that they provide full geographical detail. However, most uses of censuses to study change compare only pairs of adjacent censuses, are limited to national totals, or are based on available microdata. This paper reviews the existing work of the Great Britain Historical GIS to construct time series from aggregate data for 1841 to 2011 for Britain’s 380 local authority districts, as defined in 2011. It also presents new series for 1961 to 2011 for the 7,201 Middle Layer Super Output Areas (MSOAs) of England and Wales. MSOAs were designed by the Office for National Statistics (ONS) to be internally consistent and to be held constant over time, but ONS have released MSOA data only for 2001 and 2011. Three types of challenge arise. Firstly, even within the period when census data were “born digital”, data archiving has been problematic. The 1961 census was the first to use digital computers, but our work has required us to develop new table-recognition extensions for OCR and new data validation processes, and to enlist the help of 2,500 online volunteers to recover values from contemporary microfilm images of 1960s print-outs, correct them and integrate them back into digital data. We have also created a new set of digital boundaries for 1961 districts and wards/parishes. Even for 1991, matching data downloads to boundary downloads was problematic. Secondly, Britain’s reporting geographies have changed almost constantly, and Enumeration Districts were redefined for each census. Time series construction therefore requires that data be re-districted from each historical geography to one standard output geography using GIS techniques. Thirdly, the questions asked by each census differ, as do the classifications used to tabulate responses. Away from narrowly demographic questions, creating time series often requires that some of these differences be ignored.
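The abstract does not say which GIS techniques are used for re-districting. One common approach is simple areal weighting, sketched below as an illustration only: it assumes each count is spread uniformly over the area of its source unit, and that intersection areas between historical units and target units (for example 1961 wards and 2011 MSOAs) have already been computed in a GIS. The function name, zone identifiers and figures are all hypothetical.

```python
from collections import defaultdict


def areal_weighting(source_counts, overlap_areas, source_areas):
    """Re-district counts from source zones to target zones (hypothetical helper).

    source_counts: {source_id: count} for one census variable
    overlap_areas: {(source_id, target_id): area of the intersection}
    source_areas:  {source_id: total area of the source zone}

    Assumption: each count is distributed uniformly over its source zone's area.
    """
    target = defaultdict(float)
    for (src, tgt), area in overlap_areas.items():
        # Fraction of the source zone that falls inside this target zone.
        share = area / source_areas[src]
        target[tgt] += source_counts[src] * share
    return dict(target)


# Hypothetical example: two historical wards split across two modern target areas.
counts_1961 = {"ward_A": 1000, "ward_B": 600}
ward_areas = {"ward_A": 4.0, "ward_B": 3.0}  # square kilometres
overlaps = {
    ("ward_A", "msoa_1"): 3.0,
    ("ward_A", "msoa_2"): 1.0,
    ("ward_B", "msoa_2"): 3.0,
}
estimates = areal_weighting(counts_1961, overlaps, ward_areas)
print(estimates)  # {'msoa_1': 750.0, 'msoa_2': 850.0}
```

When the overlaps cover every source zone completely, the total count (1,600 here) is preserved. The uniform-density assumption is crude where population is concentrated in part of a unit. Weighting by ancillary data on where people live is a frequently used refinement, though nothing here implies that this is the method the authors adopted.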


Presented in Session 95: Big Data in Historical Research II