Parameter Recovery Using Remotely Sensed Variables
Remotely sensed measurements and other machine learning predictions are increasingly used in place of direct observations in empirical analyses. Errors in such measures may bias parameter estimates, but it remains unclear how large these biases are or how to correct for them. We leverage a new benchmark dataset that provides co-located ground truth observations and remotely sensed measurements for multiple variables across the contiguous U.S. We show that the common practice of using remotely sensed measurements without correction leads to biased parameter point estimates and standard errors across diverse empirical settings. More than three-quarters of the 95% confidence intervals we estimate using remotely sensed measurements do not contain the true coefficient of interest. These biases result from both classical measurement error and more structured measurement error, which we find is common in machine-learning-based remotely sensed measurements. We show that multiple imputation, a standard statistical technique so far untested in this setting, effectively reduces bias and improves statistical coverage with only minor reductions in power in both simple linear regression and panel fixed effects frameworks. Our results demonstrate that multiple imputation is a generalizable and easily implementable method for correcting parameter estimates that rely on remotely sensed variables.
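To make the correction concrete, the following is a minimal sketch of multiple imputation for a mismeasured regressor, written for simulated data and not taken from the paper. It assumes a simple linear regression, a randomly chosen subset of units with ground truth, and a linear Gaussian imputation model that includes the outcome. All names (`ols`, `mi_slope`, `x_rs`, `labeled`) and all simulation parameters are hypothetical choices made for illustration.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients, their covariance matrix, and residual variance."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * xtx_inv, sigma2


def mi_slope(y, x_rs, x_obs, labeled, m=50, rng=None):
    """Multiple-imputation estimate of the slope of y on the true regressor.

    x_obs holds ground truth where `labeled` is True and is ignored elsewhere.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = y.size
    missing = ~labeled
    # Imputation model (an assumption): x ~ 1 + x_rs + y, fit on labeled units.
    # Including the outcome y keeps the imputations compatible with the
    # regression that is run afterwards.
    Z = np.column_stack([np.ones(n), x_rs, y])
    g_hat, g_cov, s2 = ols(Z[labeled], x_obs[labeled])
    df = int(labeled.sum()) - Z.shape[1]

    slopes, variances = [], []
    for _ in range(m):
        # Draw imputation-model parameters from their approximate posterior
        # so that each imputed dataset reflects parameter uncertainty.
        s2_draw = s2 * df / rng.chisquare(df)
        g_draw = rng.multivariate_normal(g_hat, g_cov * s2_draw / s2)
        x_imp = x_obs.copy()
        noise = rng.normal(0.0, np.sqrt(s2_draw), int(missing.sum()))
        x_imp[missing] = Z[missing] @ g_draw + noise
        # Analysis model on the completed dataset.
        b, cov, _ = ols(np.column_stack([np.ones(n), x_imp]), y)
        slopes.append(b[1])
        variances.append(cov[1, 1])

    slopes = np.asarray(slopes)
    # Rubin's rules: pool the point estimates and combine the
    # within-imputation and between-imputation variances.
    within = float(np.mean(variances))
    between = float(slopes.var(ddof=1))
    total = within + (1.0 + 1.0 / m) * between
    return float(slopes.mean()), float(np.sqrt(total))


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                            # true regressor
y = 1.0 + 2.0 * x + rng.normal(size=n)            # true slope is 2.0
x_rs = 0.8 * x + rng.normal(scale=0.8, size=n)    # structured, noisy proxy
labeled = np.zeros(n, dtype=bool)
labeled[:500] = True                              # units with ground truth
x_obs = np.where(labeled, x, np.nan)

naive_slope = ols(np.column_stack([np.ones(n), x_rs]), y)[0][1]
mi_est, mi_se = mi_slope(y, x_rs, x_obs, labeled, m=50, rng=rng)
print(f"naive slope: {naive_slope:.3f}")
print(f"MI slope:    {mi_est:.3f} (SE {mi_se:.3f})")
```

In this simulated setup the uncorrected regression on `x_rs` is attenuated well below the true slope, while the pooled estimate should land close to it. A real application would need an imputation model suited to the error structure of the remotely sensed variable, and a panel fixed effects setting would also need the fixed effects to be handled in both the imputation and analysis models.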