Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments, with an Application to Immunization in India

Victor Chernozhukov; Mert Demirer; Esther Duflo; Iván Fernández-Val

doi:10.3386/w24678

Working Paper 24678

Issue Date June 2018

Revision Date February 2023

We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single-split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: the objective functions that we develop to construct the best linear predictors of the effects can also be used to obtain better machine learning proxies in the initial step. We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.
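The repeated-splitting and median-aggregation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the GenericML R package is the reference for that), and it rests on several assumptions that are not in the abstract: the data are simulated, the treatment probability `p` is known and constant, plain linear regressions stand in for the machine learning proxies, and the helper names `ols`, `one_split`, `aggregate_splits`, and `simulate` are hypothetical. The sketch reports raw medians across splits and does not apply any adjustment to nominal levels that accounting for splitting uncertainty may call for.

```python
import math
import numpy as np

def ols(X, y):
    """OLS coefficients with heteroskedasticity-robust (HC0) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = (X * resid[:, None] ** 2).T @ X
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

def one_split(Y, D, Z, p, rng):
    """One random split: fit proxies on the auxiliary half, post-process on the main half."""
    n = len(Y)
    perm = rng.permutation(n)
    aux, main = perm[: n // 2], perm[n // 2:]
    # Proxy step (assumption: linear fits stand in for any ML learner).
    Xa = np.column_stack([np.ones(len(aux)), Z[aux]])
    treated, control = D[aux] == 1, D[aux] == 0
    b1, _ = ols(Xa[treated], Y[aux][treated])
    b0, _ = ols(Xa[control], Y[aux][control])
    Xm = np.column_stack([np.ones(len(main)), Z[main]])
    B = Xm @ b0          # proxy for the baseline outcome
    S = Xm @ b1 - B      # proxy for the treatment effect
    # Best-linear-predictor style regression on the main half. With a constant
    # treatment probability, inverse-variance weights are constant, so
    # unweighted OLS gives the same coefficients.
    Dc = D[main] - p
    X = np.column_stack([np.ones(len(main)), B, Dc, Dc * (S - S.mean())])
    beta, se = ols(X, Y[main])
    het, het_se = beta[3], se[3]                     # heterogeneity loading
    pval = math.erfc(abs(het / het_se) / math.sqrt(2))  # two-sided normal p-value
    return het, het - 1.96 * het_se, het + 1.96 * het_se, pval

def aggregate_splits(Y, D, Z, p, n_splits=50, seed=0):
    """Median aggregation of estimates, interval bounds, and p-values across splits."""
    rng = np.random.default_rng(seed)
    out = np.array([one_split(Y, D, Z, p, rng) for _ in range(n_splits)])
    est, lower, upper, pval = np.median(out, axis=0)
    return {"estimate": est, "lower": lower, "upper": upper, "p_value": pval}

def simulate(n=2000, p=0.5, seed=1):
    """Simulated experiment (assumption): effect varies linearly with the first covariate."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 3))
    D = (rng.random(n) < p).astype(float)
    tau = 1.0 + 2.0 * Z[:, 0]
    Y = Z[:, 1] + D * tau + rng.normal(size=n)
    return Y, D, Z

if __name__ == "__main__":
    Y, D, Z = simulate()
    print(aggregate_splits(Y, D, Z, p=0.5))
```

Because each split yields a different estimate, reporting the median across splits makes the result less dependent on any one arbitrary partition of the data, which is the motivation the abstract gives for quantile aggregation.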

This paper was delivered (virtually) by Esther Duflo at the Fisher-Schultz Lecture of the Econometric Society World Congress, 2020. We thank the editor Guido Imbens, three anonymous referees, Susan Athey, Moshe Buchinsky, Denis Chetverikov, Carlos Cineli, Matt Hong, Stella Hong, Steven Lehrer, Siyi Luo, Max Kasy, Sylvia Klosin, Susan Murphy, Whitney Newey, Patrick Power, Victor Quintas-Martinez, Suhas Vijaykumar, and seminar participants at ASSA 2018, Barcelona GSE Summer Forum 2019, Brazilian Econometric Society Meeting 2019, BU, Lancaster, NBER Summer Institute 2018, NYU, UCLA, Whitney Newey’s Contributions to Econometrics conference, and York for valuable comments. Anirudh Sankar provided us with excellent research assistance. We gratefully acknowledge research support from the National Science Foundation, AFD, USAID, and 3ie. An R package that implements the methods in this paper, GenericML, is available on GitHub at https://github.com/mwelz/GenericML. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

Victor Chernozhukov, Mert Demirer, Esther Duflo, and Iván Fernández-Val, "Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," NBER Working Paper 24678 (2018), https://doi.org/10.3386/w24678.

- June 4, 2018
- December 28, 2020
