Standard event study inference relies on asymptotic theory: p-values come from the limiting normal distribution of the test statistic, which works well in finite samples only when residuals are approximately normal and independent. When these assumptions fail, two tools strengthen your conclusions: the wild bootstrap for more accurate p-values, and multiple testing corrections to control false discoveries when testing many hypotheses.
Wild Bootstrap
Why Bootstrap?
Asymptotic p-values (from t-tests) can be unreliable when:
The estimation window is short (< 60 days)
Residuals are non-normal or heavy-tailed
The sample size (number of firms) is small
There is cross-sectional dependence
The wild bootstrap generates the null distribution by resampling residuals with random sign flips, preserving the heteroskedasticity structure of the original data.
Algorithm
Compute abnormal returns \(AR_{i,t}\) and the test statistic \(T\) (e.g., the CAAR t-statistic)
For \(b = 1, \ldots, B\) bootstrap replications:
Draw Rademacher weights \(\eta_i \in \{-1, +1\}\) with equal probability (or Mammen weights)
Form bootstrap abnormal returns \(AR^*_{i,t} = \eta_i \, AR_{i,t}\) and recompute the test statistic \(T^*_b\)
Compute the bootstrap p-value as the share of replications with \(|T^*_b| \geq |T|\)
The statistic to bootstrap can be chosen as "aar", "caar", or "both" (default "both").
Rademacher vs. Mammen. Rademacher weights (\(\pm 1\)) are simpler and work well in most cases. Mammen weights take the two values \(-(\sqrt{5}-1)/2\) and \((\sqrt{5}+1)/2\), with probabilities chosen to match the first three moments of the residuals, so they better preserve skewness. Use Mammen when residuals are notably skewed.
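The algorithm above can be sketched in a few lines. The following Python snippet is an illustrative, package-independent implementation (the function name and simulated data are hypothetical, not the package's own code) that bootstraps a CAAR t-statistic with Rademacher sign flips:

```python
import numpy as np

def wild_bootstrap_pvalue(ar, n_boot=999, seed=0):
    """Two-sided wild-bootstrap p-value for the CAAR t-statistic.

    ar : (n_firms, n_days) array of abnormal returns in the event window.
    Firm-level CARs are sign-flipped with Rademacher weights, which
    imposes the null CAAR = 0 while preserving each firm's variance.
    """
    rng = np.random.default_rng(seed)
    car = ar.sum(axis=1)  # firm-level cumulative abnormal returns
    n = car.size

    def t_stat(x):
        return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

    t_obs = t_stat(car)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        eta = rng.choice([-1.0, 1.0], size=n)  # Rademacher weights
        t_boot[b] = t_stat(eta * car)
    # share of replications at least as extreme as the observed statistic
    return (1 + np.sum(np.abs(t_boot) >= abs(t_obs))) / (n_boot + 1)

# Simulated example: 25 firms, 5-day window, heavy-tailed residuals, no effect
rng = np.random.default_rng(42)
ar = rng.standard_t(df=3, size=(25, 5)) * 0.02
p = wild_bootstrap_pvalue(ar)
print(f"bootstrap p-value: {p:.3f}")
```

Because only the signs are resampled, each firm's residual scale is untouched, which is what makes the procedure robust to heteroskedasticity.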
Multiple Testing Corrections
The Problem
When you test multiple event windows, subgroups, or test statistics, the probability of at least one false positive grows rapidly. With \(m\) independent tests at \(\alpha = 0.05\):
\[
P(\text{at least one false positive}) = 1 - (1 - \alpha)^m
\]
For \(m = 10\) tests, the family-wise error rate is already about 40%. Multiple testing corrections adjust p-values to control this inflation.
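The inflation is easy to verify numerically; this short Python check (illustrative only) evaluates the formula for several values of \(m\):

```python
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:2d}: P(at least one false positive) = {fwer:.3f}")
# m = 10 gives roughly 0.401, i.e. a 40% family-wise error rate
```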
Available Methods
# Adjust p-values from multi-event test statistics for multiple testing
adjusted <- adjust_p_values(task, method = "BH")
adjusted
Recommendation. Use Holm (step-down Bonferroni) as the default for FWER control — it is uniformly more powerful than Bonferroni. Use Benjamini-Hochberg when testing many hypotheses (e.g., AAR significance at each event-time day) and you can tolerate a controlled rate of false discoveries.
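To make the two procedures concrete, here is a minimal, package-independent Python sketch of the Holm and Benjamini-Hochberg adjustments (base R's `p.adjust` implements the same formulas under `method = "holm"` and `"BH"`):

```python
import numpy as np

def holm(p):
    """Holm step-down adjusted p-values (controls the FWER)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)                # smallest p first
    adj = np.empty(m)
    running_max = 0.0
    for step, idx in enumerate(order):
        running_max = max(running_max, (m - step) * p[idx])
        adj[idx] = min(1.0, running_max)
    return adj

def bh(p):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)[::-1]          # largest p first
    adj = np.empty(m)
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = m - step                  # rank of p[idx] among sorted p
        running_min = min(running_min, m * p[idx] / rank)
        adj[idx] = running_min
    return adj

pvals = [0.01, 0.04, 0.02]
print(holm(pvals))   # [0.03 0.04 0.04]
print(bh(pvals))     # [0.03 0.04 0.03]
```

Holm multiplies the i-th smallest p-value by \(m - i + 1\) and enforces monotonicity from below; BH divides by the rank and enforces monotonicity from above, which is why it is less conservative.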
Practical Example
Adjust p-values from different test statistics or across subgroups:
# Adjust using Holm's step-down method (more powerful than Bonferroni)
adjusted_holm <- adjust_p_values(task, method = "holm")

# Adjust using Benjamini-Yekutieli (for dependent tests)
adjusted_by <- adjust_p_values(task, method = "BY")
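Benjamini-Yekutieli is BH with an extra penalty \(c(m) = \sum_{i=1}^{m} 1/i\), which makes the FDR guarantee valid under arbitrary dependence between tests. A hedged Python sketch of the adjustment (again, not the package's implementation; base R's `p.adjust(method = "BY")` matches it):

```python
import numpy as np

def by(p):
    """Benjamini-Yekutieli adjusted p-values: BH scaled by c(m) = sum(1/i)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic penalty for dependence
    order = np.argsort(p)[::-1]              # step-up: largest p first
    adj = np.empty(m)
    running_min = 1.0
    for step, idx in enumerate(order):
        rank = m - step
        running_min = min(running_min, c_m * m * p[idx] / rank)
        adj[idx] = min(1.0, running_min)
    return adj

print(by([0.01, 0.04, 0.02]))
```

The penalty grows only logarithmically in \(m\), but for small test families it is noticeable, so BY is worth the cost only when dependence between the tests is a real concern.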
When to Use Each Tool
| Scenario | Recommended Approach |
| --- | --- |
| Short estimation window (< 60 days) | Wild bootstrap |
| Non-normal residuals | Wild bootstrap |
| Multiple event windows tested | BH or Holm correction |
| Subgroup comparisons | BH correction |
| Single pre-specified window | No correction needed |
| Maximum robustness | Wild bootstrap + BH correction |
Literature
Cameron, A.C., Gelbach, J.B. & Miller, D.L. (2008). Bootstrap-based improvements for inference with clustered errors. Review of Economics and Statistics, 90(3), 414–427.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.