Ensemble Methods for Causal Effects in Panel Data Settings
In many prediction problems, researchers have found that combinations of prediction methods (“ensembles”) perform better than individual methods. A simple example is random forests, which combine predictions from many regression trees.
A striking, and substantially more complex, example is the Netflix Prize competition, where the winning entry combined predictions from a wide variety of conceptually very different models. In macroeconomic forecasting, researchers have often found that averaging predictions from different models leads to more accurate forecasts.
In this paper we apply these ideas to synthetic-control-type problems in panel data settings. In this setting a number of conceptually quite different methods have been developed: some assume correlations between units that are stable over time, others assume stable time series patterns common to all units, and still others rely on factor models. Using data on state-level GDP over 270 quarters, we focus on three basic approaches to predicting missing values, one from each of these strands of the literature. Rather than trying to test the different models against each other to find a true model, we combine the predictions of the separate models using ensemble methods. For the ensemble predictor we use a weighted average of the three individual methods, with non-negative weights determined through out-of-sample cross-validation.
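To make the weighting step concrete, here is a minimal sketch, not the authors' implementation. It assumes that each of the three methods has already produced out-of-sample predictions for a set of held-out cells (for example, cells masked during cross-validation), and that the non-negative weights are chosen by least squares on those held-out predictions with no intercept and no sum-to-one constraint. Those modeling choices, along with the function names `nnls_weights` and `ensemble_predict`, are hypothetical illustrations. The non-negative least squares problem is solved exactly by enumerating which methods receive positive weight, which is cheap for three methods; in practice a library routine such as `scipy.optimize.nnls` would serve the same purpose.

```python
from itertools import combinations


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination; None if singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def nnls_weights(preds, y):
    """Hypothetical helper: non-negative weights minimizing held-out squared error.

    preds: K lists of out-of-sample predictions (one list per method).
    y: the held-out actual values.
    Every NNLS problem has an optimum whose positive weights sit on linearly
    independent predictors, so trying each support exactly finds the optimum.
    """
    k, n = len(preds), len(y)
    best_w, best_sse = [0.0] * k, sum(v * v for v in y)
    for size in range(1, k + 1):
        for support in combinations(range(k), size):
            # Normal equations restricted to the methods in this support.
            a = [[sum(preds[i][t] * preds[j][t] for t in range(n)) for j in support]
                 for i in support]
            b = [sum(preds[i][t] * y[t] for t in range(n)) for i in support]
            sol = _solve(a, b)
            if sol is None or min(sol) < 0:
                continue  # singular or infeasible (a negative weight)
            w = [0.0] * k
            for idx, val in zip(support, sol):
                w[idx] = val
            sse = sum((y[t] - sum(w[i] * preds[i][t] for i in range(k))) ** 2
                      for t in range(n))
            if sse < best_sse:
                best_w, best_sse = w, sse
    return best_w


def ensemble_predict(weights, preds):
    """Weighted combination of the individual methods' predictions for new cells."""
    return [sum(w * p[t] for w, p in zip(weights, preds)) for t in range(len(preds[0]))]


# Toy held-out predictions from three hypothetical methods (not the paper's data).
p_unit = [1.0, 2.0, 3.0, 4.0]
p_time = [2.0, 1.0, 0.0, 1.0]
p_factor = [0.0, 1.0, 1.0, 3.0]
actual = [0.5, 1.5, 2.0, 3.5]  # constructed as 0.5 * p_unit + 0.5 * p_factor
w = nnls_weights([p_unit, p_time, p_factor], actual)
print(w)
```

Because `actual` was built as an equal mix of the first and third toy predictors, the fitted weights come out at roughly 0.5, 0 and 0.5. The non-negativity constraint matters when one method is strongly correlated with another: unconstrained least squares can then assign offsetting positive and negative weights, whereas the constrained fit simply drops a method by setting its weight to zero.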
Published Versions
Susan Athey & Mohsen Bayati & Guido Imbens & Zhaonan Qu, 2019. "Ensemble Methods for Causal Effects in Panel Data Settings," AEA Papers and Proceedings, vol 109, pages 65-70.