A Machine Learning Approach to Improving Occupational Income Scores

Martin Saavedra, Oberlin College
Tate Twinam, University of Washington, Seattle

Historical studies of labor markets frequently lack data on individual income. The occupational income score (OCCSCORE) is often used as an alternative measure of labor market outcomes. We consider the consequences of using OCCSCORE when researchers are interested in earnings regressions. Using modern Census data, we find that using OCCSCORE biases estimates toward zero and can produce statistically significant coefficients of the wrong sign. We use a machine learning approach to construct a new adjusted score based on industry, occupation, and demographics. Our alternative score reduces bias and sign errors in both modern and historical contexts. We illustrate our approach by estimating earnings gaps in the 1915 Iowa State Census and intergenerational mobility elasticities using linked data from the 1850–1930 Censuses.
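The bias toward zero can be illustrated with a small synthetic example. The sketch below is not the authors' method or data: the workers, occupations, groups, and incomes are hypothetical, and a simple occupation-by-group cell mean stands in for the machine learning score described above. It assumes an occupation-only score assigns every worker the average income of their occupation, so any earnings gap within an occupation is invisible to it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical synthetic workers: (occupation, group, income). Not real data.
# Group B earns 10 less than group A within each occupation and is also
# concentrated in the lower-paying occupation.
workers = (
    [("clerk", "A", 50.0)] * 30 + [("clerk", "B", 40.0)] * 10
    + [("laborer", "A", 30.0)] * 10 + [("laborer", "B", 20.0)] * 30
)

def cell_means(rows, key):
    """Average income within each cell defined by key(row)."""
    cells = defaultdict(list)
    for row in rows:
        cells[key(row)].append(row[2])
    return {k: mean(v) for k, v in cells.items()}

def group_gap(rows, outcome):
    """Mean outcome for group A minus mean outcome for group B."""
    a = mean(outcome(r) for r in rows if r[1] == "A")
    b = mean(outcome(r) for r in rows if r[1] == "B")
    return a - b

# Occupation-only score (an OCCSCORE-like stand-in).
occ_score = cell_means(workers, key=lambda r: r[0])
# Score that also conditions on group (a crude stand-in for an adjusted score).
adj_score = cell_means(workers, key=lambda r: (r[0], r[1]))

true_gap = group_gap(workers, lambda r: r[2])
occ_gap = group_gap(workers, lambda r: occ_score[r[0]])
adj_gap = group_gap(workers, lambda r: adj_score[(r[0], r[1])])

print(f"true gap: {true_gap}, occupation-only gap: {occ_gap}, adjusted gap: {adj_gap}")
```

In this toy setup the occupation-only score captures only the part of the gap that comes from sorting across occupations, so the estimated gap is smaller in magnitude than the true one, while the score that conditions on group recovers it. A realistic adjusted score would need to be fit on data where income is observed and then applied to data where it is not.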


Presented in Session 39: Problems with Data and Measurement