Digitizing Handwriting with Automated Methods: A Pilot Project Using the 1990 U.S. Census Manuscripts

Trent Alexander, University of Michigan
Jonathan Fisher, Stanford University
Katie Genadek, University of Colorado Boulder

The U.S. Census Bureau maintains a large longitudinal research infrastructure that currently includes linked data from the 1940 census, the 2000-2010 censuses, major national surveys going back to 1973, and administrative records dating from the 1990s. These data are accessible to researchers around the U.S. via the Federal Statistical Research Data Centers (FSRDC) network. The major shortcoming of this infrastructure is that it lacks linkable files from the decennial censuses of 1950 through 1990. Full-count microdata from the 1960-1990 censuses are available for research, but datasets from these years do not include respondent names and therefore have not been linked over time. Focusing on the 1990 U.S. census, we describe the results of a project to develop methods for filling this gap. We created digital images from 1990 census microfilm, hand-keyed “truth data” from those images, supported two teams’ attempts to apply Handwriting Recognition to the images, appended the recovered names to existing microdata files, and linked the new 1990 census microdata records to earlier and later censuses. We describe our processes, the accuracy of the Handwriting Recognition, and the accuracy of the record linkage using the recovered names.


Presented in Session 40. Automatic Handwriting Recognition