Should We Trust Clustered Standard Errors? A Comparison with Randomization-Based Methods
We compare the precision of critical values obtained under conventional sampling-based methods with those obtained using sample order statics computed through draws from a randomized counterfactual based on the null hypothesis. When based on a small number of draws (200), critical values in the extreme left and right tail (0.005 and 0.995) contain a small bias toward failing to reject the null hypothesis which quickly dissipates with additional draws. The precision of randomization-based critical values compares favorably with conventional sampling-based critical values when the number of draws is approximately 7 times the sample size for a basic OLS model using homoskedastic data, but considerably less in models based on clustered standard errors, or the classic Differences-in-Differences. Randomization-based methods dramatically outperform conventional methods for treatment effects in Differences-in-Differences specifications with unbalanced panels and a small number of treated groups.