Geoparsing the Nineteenth-Century UK Parliamentary Papers

Jim Clifford, University of Saskatchewan
Bea Alex, University of Edinburgh

The scale of globalization increased significantly during the long nineteenth century, when trade, European imperialism, migration and telecommunications brought the world into much closer contact. Extracting place names from the Sessional Papers and Command Papers in the digitized United Kingdom Parliamentary Papers using the Edinburgh Geoparser provides new insight into this process of globalization. The geoparser, which identifies place names in the text and automatically grounds them to a latitude and longitude, was specially adapted for nineteenth-century historical English text. We then performed an error-checking process to remove false positives from the most frequent results. The top thirty place names (excluding British place names) were found 3.3 million times in roughly six million pages of government documents: India (10.3%), United States (9.4%), America (5.4%), France (5.2%), Germany (4.9%), China (4.6%), Russia (3.7%), Bombay (3.4%), Africa (3.3%), Australia (3.3%), Belgium (3.3%), Europe (3.1%), Bengal (3.1%), Spain (3%), Canada (2.7%), Brazil (2.3%), East India (2.1%), Italy (2.1%), North America (2%), Portugal (2%), Egypt (2%), West Indies (1.9%), Sweden (1.8%), Calcutta (1.8%), New Zealand (1.8%), Mauritius (1.8%), Malta (1.7%) and Ceylon (1.7%). These numbers are not the total mentions of each place name in the Parliamentary Papers, as optical character recognition errors limit the effectiveness of text mining methods. This paper will use further text mining methods, including a version of term frequency-inverse document frequency (tf-idf), to investigate prominent place names in the Parliamentary Papers and to better understand how the British government represented the world over the course of the nineteenth century.
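
The abstract does not say which version of tf-idf is used, so the following is only a minimal sketch, assuming the textbook formulation (relative term frequency multiplied by the natural log of inverse document frequency) applied to place-name mentions already extracted per document. The function name `place_tfidf`, the document identifiers and the toy mentions are all hypothetical and are not taken from the Parliamentary Papers data.

```python
import math
from collections import Counter


def place_tfidf(docs):
    """Score place names per document with a textbook tf-idf.

    docs: mapping of document id -> list of place-name mentions
    (hypothetical input shape; the real geoparser output may differ).
    """
    n_docs = len(docs)

    # Document frequency: how many documents mention each place at least once.
    df = Counter()
    for mentions in docs.values():
        df.update(set(mentions))

    scores = {}
    for doc_id, mentions in docs.items():
        counts = Counter(mentions)
        total = sum(counts.values())
        # tf = share of the document's mentions; idf = ln(N / df).
        scores[doc_id] = {
            place: (count / total) * math.log(n_docs / df[place])
            for place, count in counts.items()
        }
    return scores


# Invented toy data, for illustration only.
toy_docs = {
    "paper_a": ["India", "Bombay", "India", "France"],
    "paper_b": ["France", "Belgium", "France"],
    "paper_c": ["India", "China", "France"],
}
scores = place_tfidf(toy_docs)
```

In this toy example a place mentioned in every document (here "France") scores zero, while a place concentrated in a single document (here "Bombay") scores highest there. That is the property that makes tf-idf useful for finding places distinctive to particular papers or periods rather than those that are simply frequent everywhere.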

 Presented in Session 174. Geographies of Qualitative Sources