Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data
It is common in empirical research to use what appear to be sensible rules of thumb for cleaning data. Measurement error is often the justification for removing (trimming) or recoding (winsorizing) observations whose values lie outside a specified range. This paper considers identification in a linear model when the dependent variable is mismeasured. The results examine the common practice of trimming and winsorizing to address the identification failure. In contrast to the physical and laboratory sciences, measurement error in social science data is likely to be more complex than simply additive white noise. We consider a general measurement error process that nests many processes, including the additive white noise process and a contaminated sampling process. Analytic results are tractable only under strong distributional assumptions, but they demonstrate that winsorizing and trimming are solutions only for a particular class of measurement error processes. Indeed, trimming and winsorizing may induce or exacerbate bias. We term this source of bias "iatrogenic" (or econometrician-induced) error. The identification results for the general error process highlight other approaches that are more robust to distributional assumptions. Monte Carlo simulations demonstrate the fragility of trimming and winsorizing as solutions to measurement error in the dependent variable.
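The point that cleaning rules can themselves induce bias can be illustrated with a small simulation. The sketch below is not the paper's Monte Carlo design; it is a minimal illustration under assumed conditions: a single normally distributed regressor, a true slope of 1, additive white noise in the dependent variable, and cutoffs at the 5th and 95th percentiles of the observed outcome. The function names (`ols_slope`, `trim`, `winsorize`, `simulate`) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def trim(x, y, lo, hi):
    """Drop observations whose y lies outside [lo, hi]."""
    keep = (y >= lo) & (y <= hi)
    return x[keep], y[keep]


def winsorize(x, y, lo, hi):
    """Recode y values outside [lo, hi] to the nearest bound."""
    return x, np.clip(y, lo, hi)


def simulate(n=100_000, beta=1.0, seed=0):
    """Compare untreated, trimmed, and winsorized slope estimates.

    Assumed design (illustrative only): x ~ N(0, 1), y* = beta * x + e,
    and the observed outcome is y = y* + u with u additive white noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y_true = beta * x + rng.normal(size=n)
    y_obs = y_true + rng.normal(size=n)  # additive white noise in y

    # Assumed rule of thumb: cut at the 5th and 95th percentiles of observed y.
    lo, hi = np.quantile(y_obs, [0.05, 0.95])

    return {
        "ols": ols_slope(x, y_obs),
        "trimmed": ols_slope(*trim(x, y_obs, lo, hi)),
        "winsorized": ols_slope(*winsorize(x, y_obs, lo, hi)),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:>10}: {slope:.3f}")
```

Under this assumed design, additive noise in the dependent variable leaves the untreated OLS slope close to the true value, while both trimming and winsorizing on the observed outcome pull the estimated slope toward zero. Other error processes, such as contaminated sampling, would need a different data-generating step and may behave differently.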
Published Versions
Bollinger, Christopher R. and Amitabh Chandra. "Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data," Journal of Labor Economics, 2005, v23(2,Apr), 235-257.