Using the Zooniverse Project Builder for Rapid Deployment of Historical Transcription Projects

Evan Roberts, University of Minnesota

Predictions that accurate machine recognition of human handwriting is imminent have been made since the 1960s, but real progress has been slower. Machine recognition of handwriting has proved relatively accurate and cost-effective only under strict format and content restrictions. Unfortunately, many cultural heritage materials do not conform to these machine-friendly formats. They frequently include irregular handwritten text, such as letters, diaries, memos, notes, and manuscripts of literary and scholarly works that may or may not eventually be printed. Even materials that are now produced in a format amenable to machine recognition of text (such as questionnaires, bureaucratic forms, and surveys) came in more irregular formats in the past. While computers can recognize handwriting accurately only under restrictive conditions, people can decipher diverse, messy, and complex handwriting after relatively brief exposure to it. Handwritten documents can therefore be converted into a machine-readable format with the help of human labor. But funding for transcription of handwritten documents is limited, and the corpus of material is vast. “Crowdsourcing” and “citizen science” approaches, which rely on the participation of many volunteers, have therefore become an important method for researchers, educators, and cultural institutions seeking to transcribe large volumes of handwritten material. In this presentation I show how the Zooniverse Project Builder, a free tool that allows people to build projects with up to 10,000 images, can be used to build historical transcription projects rapidly and to distribute data entry easily. Like most other Zooniverse projects, the Project Builder currently offers the multiple-transcriber model only. The software is designed to break down transcription workflows into smaller tasks. Transcribers use intuitive drawing tools to draw lines, boxes, or other shapes to mark text blocks that can then be transcribed. Results are exported as .csv files.
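
Because the multiple-transcriber model yields several independent transcriptions of each image, the exported .csv usually has to be reconciled into one reading per image. The following is a minimal sketch of one common approach, a plurality vote after light normalization. It is not the Project Builder's own method, and the column names (`subject_id`, `user`, `transcription`) and sample rows are hypothetical, chosen for illustration rather than taken from the actual export layout.

```python
import csv
import io
from collections import Counter, defaultdict


def normalize(text):
    """Collapse whitespace and case so trivial differences do not split votes."""
    return " ".join(text.split()).lower()


def reconcile(csv_text):
    """Return {subject_id: (consensus_text, agreement)} by plurality vote.

    Assumes hypothetical columns 'subject_id' and 'transcription';
    a real export would need to be mapped onto this shape first.
    """
    votes = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        votes[row["subject_id"]][normalize(row["transcription"])] += 1

    consensus = {}
    for subject_id, counter in votes.items():
        # most_common(1) picks the reading with the most votes;
        # ties go to whichever reading was seen first.
        text, count = counter.most_common(1)[0]
        consensus[subject_id] = (text, count / sum(counter.values()))
    return consensus


# Invented sample data: three volunteers on image 101, two on image 102.
SAMPLE = """subject_id,user,transcription
101,anna,John  Smith
101,ben,john smith
101,cara,John Smyth
102,anna,Farmer
102,ben,Farmer
"""

if __name__ == "__main__":
    for subject_id, (text, agreement) in reconcile(SAMPLE).items():
        print(subject_id, text, f"{agreement:.2f}")
```

The agreement score gives a simple way to flag images for expert review: readings on which volunteers disagree can be routed to a human checker, while unanimous readings are accepted as they stand.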

No extended abstract or paper available

Presented in Session 58. Transcription and Data Capture