Combining Family History and Machine Learning to Link Historical Records

Joseph P. Price, Brigham Young University
Kasey Buckles, University of Notre Dame
Isaac Riley, Brigham Young University
Jacob Van Leeuwen, Brigham Young University

The ability to confidently link individuals across US census records opens up opportunities for important social science research. We use a new approach that combines machine learning with human decisions made as part of a large, public, wiki-style family tree. We also describe two illustrative examples, in which we link everyone born in a particular state or with a specific surname, and we identify over two-thirds of all possible links for these groups. We discuss key decisions that must be made when linking historical records and suggest several ways to verify the quality of links.


Presented in Session 3. Ancestry and other Big Data – Collaboration between genealogical organizations and academics