Mapping the Enumerated City: Spatial Strategies for Historical Microdata Linkage in New York 1850-1920

Daniel Miller, Columbia University

This paper introduces emerging record linkage methods central to a new project on the development of urban neighborhoods and landscapes in New York. Mapping Historical New York City is an interdisciplinary effort to bring together demographic data and archival collections in an interactive digital mapping platform for use by scholars, teachers, and the general public. The project has developed approaches to spatial data creation within a critical framework, and this paper emphasizes both the problems and the potential associated with connecting historical spatial datasets for research on New York City. Historical record linkage poses considerable challenges to researchers seeking to carry out spatial analyses at the urban scale. When working with millions of demographic records that lack address information, record linkage can facilitate a more granular mapping of households and blocks than would otherwise be possible. The paper will detail methods being developed by the project to link census, city directory, and insurance atlas records through spatial matching techniques. Working from historically accurate address locators, geocoded city and street directories are compared against census microdata at the enumeration district or ward level. A fuzzy matching process, supported by machine learning, links the sequence and proximity of households surveyed by census enumerators to the geocoded city directory entries. When a direct match can be established and then confirmed by considering neighboring or proximate matches, the related address and block information is associated with the census record. Preliminary outcomes of record linkage will be shared in the paper, which will also discuss how the results of linkage allow Mapping Historical New York City to learn more about census enumeration and city directory publication methods, and to test assumptions about how comprehensive and how sequential those sources are.
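The match-then-confirm logic described above can be illustrated with a minimal sketch. It is not the project's implementation: it swaps the machine-learning-supported matcher for plain string similarity from Python's standard library, and every name, address, block label, threshold, and function name below is hypothetical. Census households are kept in enumeration order, each is fuzzily matched to a geocoded directory entry, and a match is accepted only when an adjacent household in the enumeration sequence matched an entry on the same block.

```python
from difflib import SequenceMatcher

# Hypothetical geocoded directory entries for one enumeration district.
DIRECTORY = [
    {"name": "Jon Smith", "address": "12 Mott St", "block": "B1"},
    {"name": "Mary OBrien", "address": "14 Mott St", "block": "B1"},
    {"name": "Patrick Kelley", "address": "210 Broadway", "block": "B9"},
]

# Hypothetical census household heads, in enumeration order, with no addresses.
CENSUS = ["John Smith", "Mary O'Brien", "Patrick Kelly", "Anna Schmidt"]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, directory, threshold):
    """Return the most similar directory entry, or None if below threshold."""
    best = max(directory, key=lambda entry: similarity(name, entry["name"]))
    return best if similarity(name, best["name"]) >= threshold else None


def link_sequence(census, directory, threshold=0.85, window=1):
    """Keep a candidate match only if a household within `window` positions
    in the enumeration sequence matched an entry on the same block."""
    candidates = [best_match(name, directory, threshold) for name in census]
    links = {}
    for i, cand in enumerate(candidates):
        if cand is None:
            continue
        neighbours = candidates[max(0, i - window):i] + candidates[i + 1:i + 1 + window]
        if any(nb is not None and nb["block"] == cand["block"] for nb in neighbours):
            links[i] = cand
    return links


links = link_sequence(CENSUS, DIRECTORY)
for i, entry in sorted(links.items()):
    print(CENSUS[i], "->", entry["address"], entry["block"])
```

In this toy data the third household finds a strong name match, but the sketch leaves it unlinked because no adjacent household was placed on the same block. This reflects the assumption that enumerators moved through a district roughly in spatial sequence, which is one of the assumptions the linkage results are meant to test.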

No extended abstract or paper available

Presented in Session 31. Emerging Methods: Computation/Spatial Econometrics