Prediction with Differential Covariate Classification: Illustrated by Racial/Ethnic Classification in Medical Risk Assessment
A common practice in evidence-based decision-making uses estimates of conditional probabilities P(y|x) obtained from research studies to predict outcomes y on the basis of observed covariates x. Decisions are then based on the predicted outcomes. Researchers commonly assume that the predictors used in generating the evidence are the same as those used in applying it: i.e., that the meaning of x is the same in the two circumstances. This may not be the case in real-world settings. Across a wide range of settings, from clinical practice to education policy, demographic attributes (e.g., age, race, ethnicity) are often classified differently in research studies than in decision settings. This paper studies identification in such settings. We propose a formal framework for prediction with what we term differential covariate classification (DCC). Using this framework, we analyze partial identification of probabilistic predictions and assess how various assumptions influence the identification regions. We apply the findings to a range of settings, focusing mainly on differential classification of individuals' race and ethnicity in clinical medicine. We find that bounds on P(y|x) can be wide, and that the information needed to narrow them is available only in special cases. These findings highlight an important problem in using evidence in decision-making, one that has not yet been fully appreciated in debates on classification in public policy and medicine.
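To give a rough sense of why such bounds can be wide, the following minimal sketch computes worst-case bounds on a prediction when the decision setting classifies the covariate differently from the study. It is an illustration under simplifying assumptions, not the framework developed in the paper: the outcome y is binary, the study reports P(y=1|x=k) for each study category k, and the joint distribution of the study classification x and the decision-setting classification x* is assumed known. The function names, the symbol x*, and all numbers are hypothetical.

```python
def bound_conditional(p_k, q):
    """Worst-case bounds on P(y=1 | x=k, x*=j).

    p_k: P(y=1 | x=k), as reported by the study (assumed known).
    q:   P(x*=j | x=k), the share of study category k that the
         decision setting places in category j (assumed known).
    Follows from p_k = q * a + (1 - q) * b with a, b in [0, 1].
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    lower = max(0.0, (p_k - (1.0 - q)) / q)
    upper = min(1.0, p_k / q)
    return lower, upper


def bound_prediction(p, joint, j):
    """Worst-case bounds on P(y=1 | x*=j).

    p:     list with p[k] = P(y=1 | x=k).
    joint: nested list with joint[k][j] = P(x=k, x*=j) (hypothetical input).
    Uses P(y=1 | x*=j) = sum_k P(y=1 | x=k, x*=j) * P(x=k | x*=j).
    """
    p_j = sum(row[j] for row in joint)  # P(x*=j)
    if p_j <= 0.0:
        raise ValueError("decision-setting category j has zero probability")
    lower = upper = 0.0
    for k, row in enumerate(joint):
        if row[j] == 0.0:
            continue  # study category k never maps to category j
        q = row[j] / sum(row)  # P(x*=j | x=k)
        w = row[j] / p_j       # P(x=k | x*=j)
        lo, hi = bound_conditional(p[k], q)
        lower += w * lo
        upper += w * hi
    return lower, upper


# Hypothetical numbers: two study categories, two decision-setting categories.
p = [0.2, 0.4]
joint = [[0.5, 0.1],
         [0.0, 0.4]]
bounds = bound_prediction(p, joint, j=1)
```

With these made-up inputs, the bounds for decision-setting category 1 come out to roughly [0.32, 0.52], even though only a fifth of that category was classified differently by the study. When the two classifications coincide, so that the joint distribution is diagonal, the bounds collapse to the study's point estimate.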