A Centralized Data Classification Platform: Exploration and Contribution of Knowledge Systems

Ashkan Ashkpour, International Institute of Social History
Ramtin Soltani, External Senior Developer
Kees Mandemakers, Erasmus University Rotterdam

Whether one uses more traditional methods for publishing and linking historical datasets or builds on more recent efforts such as the Resource Description Framework (RDF) and Linked Open Data, the availability of standard vocabularies and classification systems is essential for harmonizing disconnected datasets. While open data is increasingly considered best practice, open knowledge and the sharing of expert systems (and decisions) have yet to gain momentum. Exposing existing classifications in a centralized environment offers researchers many opportunities to share their knowledge-intensive work and gives others unambiguous ways to standardize their data without redoing the same work over and over again. Classification of data in various research domains is nothing new: researchers have been creating standards and making links between datasets for a long time, often manually. Mappings to these systems have been developed for a variety of variables (municipalities, occupations, religions, etc.) but unfortunately are often confined to their own domains. Although these systems are often shared in online archives, on institutional websites or through other open sources, they are not by definition openly available and ready for reuse. In practice, researchers still need to find these mappings and connect them to their own data. Our approach aims to share these expert decisions in generic ways, all in one centralized environment. To do so, we first gathered various mappings to significant classification systems used in the Netherlands, such as AMCO, CBS Code (for municipalities), GeoNames and HISCO, and made these available in a centralized environment for future (re)use. In the case of AMCO, we have over 15,000 mapped spelling variants linked to Dutch municipalities. In the case of occupations, the Historical Sample of the Netherlands (HSN) provides us with over 150,000 mappings.
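
As an illustration of how a shared mapping of spelling variants could be applied to a researcher's own data, the following is a minimal sketch in Python. It is not the platform's implementation: the table contents, the placeholder codes (which are not real AMCO values) and the function names `normalize` and `standardize` are all assumptions made for the example.

```python
import unicodedata

# Hypothetical excerpt of a variant-to-code mapping.
# The codes are placeholders for illustration, NOT real AMCO codes.
VARIANT_TO_CODE = {
    "amsterdam": "AMCO-PLACEHOLDER-1",
    "amstelredam": "AMCO-PLACEHOLDER-1",
    "s gravenhage": "AMCO-PLACEHOLDER-2",
    "den haag": "AMCO-PLACEHOLDER-2",
}


def normalize(name):
    """Reduce a place name to a comparable key.

    Strips accents, lowercases, turns punctuation into spaces
    and collapses repeated whitespace.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def standardize(name):
    """Return the code for a spelling variant, or None if it is unmapped."""
    return VARIANT_TO_CODE.get(normalize(name))


if __name__ == "__main__":
    for raw in ["'s-Gravenhage", "Den  Haag", "Amstelredam", "Onbekend"]:
        print(raw, "->", standardize(raw))
```

Names that return `None` are the ones that would still need an expert decision before they could be added to the shared mapping.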

No extended abstract or paper available

Presented in Session 22. Overcoming Limitations in Big Data