High Dimensional History

Laura Nelson, Northeastern University

Comparative historical sociologists use data to explain complex historical processes as clearly and precisely as possible. From Weber to Marx to Tilly, combining quantitative data with qualitative analysis has been a staple of our craft. Current technologies, however, have enabled the use of a much more complex, contextual, and versatile type of data in comparative historical sociology: high-dimensional data. High-dimensional data are data with a large number of characteristics or attributes, including texts, images, geospatial data, administrative data, and long-form surveys, among others. Data of this type are most usefully represented mathematically as vectors. Just as regression analysis standardized the use of numbers across many fields of social science, I argue that vector space models have the potential to normalize and standardize the use of high-dimensional data of all forms across social science. In particular, I argue that vector space models allow comparative historical sociologists to incorporate many different types of historical data (text and images, images and administrative data, images and geospatial data, etc.) into the same mathematical framework. This framework allows us to model history in a much more complex and thus more accurate way while maintaining the empirical clarity that low-dimensional data provide. I demonstrate this claim with three empirical vector space models from my own research: word vectors to measure claims-making strategies, discursive vectors to measure organizational similarity, and image vectors to measure neighborhood identities. I end with a few words of caution about how to avoid the curse of high dimensions, while emphasizing the great opportunities this mathematical model provides.
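
As a rough illustration of what placing different data types in the same mathematical framework could look like, the sketch below concatenates a text-derived block and an image-derived block into one vector per organization and compares organizations by cosine similarity. It is not taken from the research described above: the function names, the choice to L2-normalize each block before concatenating, and all feature values are assumptions invented for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(*blocks):
    """Concatenate feature blocks (e.g. text, image) into a single vector.

    Each block is L2-normalized first so that no one data type dominates
    just because its raw values are larger. This is an illustrative
    choice, not a prescribed method.
    """
    combined = []
    for block in blocks:
        combined.extend(l2_normalize(block))
    return combined


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: three word counts plus two image features
# per organization. All numbers are made up for illustration.
org_a = combine([3.0, 0.0, 1.0], [0.2, 0.9])
org_b = combine([2.0, 1.0, 1.0], [0.1, 0.8])
org_c = combine([0.0, 4.0, 0.0], [0.9, 0.1])

sim_ab = cosine_similarity(org_a, org_b)
sim_ac = cosine_similarity(org_a, org_c)
print(f"A vs B: {sim_ab:.3f}")
print(f"A vs C: {sim_ac:.3f}")
```

With these toy values, organizations A and B come out as more similar than A and C, because they overlap in both the text block and the image block. Real applications would use far more dimensions, which is where the caution about the curse of high dimensions becomes relevant.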

No extended abstract or paper available

Presented in Session 109. The Future of Comparative-Historical Social Science I: Scholarly Borderlands