Sam Hwang, University of British Columbia
Arkadev Ghosh, University of British Columbia
Munir Squires, University of British Columbia
Linking historical data at scale typically requires substantial human effort and subjective individual judgement about the quality of links. We propose a method for identifying bounds on statistics of interest that requires minimal assumptions. The method is complementary to state-of-the-art approaches to linking census records. We apply it to compute upper and lower bounds on the migration rate out of Arkansas between the 1850 and 1860 US Censuses. We do so in two ways: with objective criteria for limiting the set of possible links, and with a machine learning model trained on a limited number of human research assistant (RA) decisions. We find a lower bound of 38.2% and an upper bound of 49% on the outmigration rate from Arkansas between 1850 and 1860, which is higher than existing estimates of inter-state migration in the literature. We discuss why our estimate is larger than past estimates, which are typically based on smaller samples. We also discuss simulations that mimic the census data to explore the sensitivity of the bounds under a set of conditions likely to be encountered when applying this method.
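To illustrate the general idea of bounding a migration rate when links are ambiguous, the following is a minimal sketch and not the authors' procedure. It assumes each 1850 individual has a set of plausible 1860 candidate records, each tagged with a state of residence. The function name `migration_bounds`, the toy data, and the rule used (a person is a certain mover if every candidate is out of state, a possible mover if any candidate is) are all hypothetical simplifications. The sketch ignores one-to-one matching constraints and mortality, and it simply drops individuals with no candidates.

```python
from typing import Dict, List, Tuple


def migration_bounds(
    candidates: Dict[str, List[str]], origin: str = "AR"
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the outmigration rate.

    `candidates` maps a hypothetical 1850 person id to the states of
    residence of that person's plausible 1860 candidate records.
    """
    # Keep only individuals with at least one candidate link (an assumption).
    linked = {pid: states for pid, states in candidates.items() if states}
    if not linked:
        raise ValueError("no individual has a candidate link")
    n = len(linked)

    # Lower bound: movers under every link choice (all candidates out of state).
    certain = sum(all(s != origin for s in states) for states in linked.values())
    # Upper bound: movers under some link choice (any candidate out of state).
    possible = sum(any(s != origin for s in states) for states in linked.values())
    return certain / n, possible / n


# Toy data, invented purely for illustration.
example = {
    "p1": ["AR"],        # certainly stayed
    "p2": ["TX"],        # certainly moved
    "p3": ["AR", "TX"],  # ambiguous
    "p4": ["MO", "TX"],  # certainly moved, destination ambiguous
    "p5": [],            # no candidates, dropped
}
lower, upper = migration_bounds(example)
print(lower, upper)
```

In this toy case the ambiguous individual `p3` is the only source of the gap between the two bounds; tightening the criteria that limit possible links would shrink that gap.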
Presented in Session 182. Matching, Bias and Data Development: Automated Methods for Data Collection and Record Linking Assessed