Panel Event Studies (Difference-in-Differences)
A panel event study combines event study methodology with Difference-in-Differences estimation to measure causal effects when a policy affects multiple units at different times. It uses untreated units as controls, with estimators like Sun & Abraham and Callaway & Sant'Anna for staggered designs.
Part of the Event Study Methodology Guide.
Panel event studies combine the event study framework with Difference-in-Differences (DiD) estimation. Use this approach when a policy or regulation affects multiple units at different times and you have a control group for comparison.
Panel event studies scale to designs with 100 or more treatment cohorts entering at different times. According to Bertrand, Duflo, and Mullainathan (2004), ignoring serial correlation in panel settings leads to rejection of a true null hypothesis in roughly 45% of placebo tests at the 5% level. Cluster-robust standard errors and wild bootstrap inference are therefore essential for valid panel event study results.
When Should I Use Panel vs. Cross-Sectional Event Studies?
A panel event study is a causal inference framework that combines event study methodology with Difference-in-Differences (DiD) estimation to measure the effect of a policy or intervention across multiple units treated at potentially different times. Unlike cross-sectional event studies that rely on expected return models, panel event studies use untreated units as an explicit control group and identify effects through the parallel trends assumption. Since 2021, at least 6 heterogeneity-robust DiD estimators have been developed to correct the bias inherent in traditional Two-Way Fixed Effects regression under staggered treatment timing.
| Feature | Cross-Sectional Event Study | Panel Event Study (DiD) |
|---|---|---|
| Treatment timing | All firms on the same date | Firms treated at different dates |
| Control group | Market index | Untreated firms |
| Identification | Expected return model | Parallel trends assumption |
| Best for | Market reactions to announcements | Causal effect of policies/regulations |
How Do the Panel DiD Estimators Compare?
| Estimator | R Method | Staggered Treatment | Heterogeneous Effects | Output | Required Package |
|---|---|---|---|---|---|
| Static TWFE | "static_twfe" | Biased | Biased | Single ATT estimate | base R |
| Dynamic TWFE | "dynamic_twfe" | Biased | Biased | Event-time coefficients | base R |
| Sun & Abraham | "sun_abraham" | Unbiased | Unbiased | Event-time coefficients | base R |
| Callaway & Sant'Anna | "callaway_santanna" | Unbiased | Unbiased | Group-time ATTs | did |
| de Chaisemartin & D'Haultfoeuille | "dechaisemartin_dhaultfoeuille" | Unbiased | Unbiased | Switchers ATT | DIDmultiplegt |
| Borusyak, Jaravel & Spiess | "borusyak_jaravel_spiess" | Unbiased | Unbiased | Imputed event-time coefficients | didimputation |
Default recommendation
Use Sun & Abraham when treatment is staggered across units and you want event-time dynamics with no extra dependencies. Use Callaway & Sant'Anna for flexible group-time ATTs with built-in aggregation. Use Dynamic TWFE only when all units are treated at the same time.
What Data Format Does Panel DiD Require?
Panel data must be in long format with one row per unit-period:
# Required columns (names are configurable)
library(tibble)
panel_data <- tibble(
unit_id = ..., # Firm/unit identifier
time_id = ..., # Time period (integer)
outcome = ..., # Outcome variable (e.g., stock return, revenue)
treated = ..., # Treatment indicator (0/1)
treatment_time = ... # Period when unit first treated (NA for controls)
)

Static TWFE
The Two-Way Fixed Effects estimator, the most commonly used panel DiD specification since the 1990s, includes unit and time fixed effects to estimate a single average treatment effect:

$$Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + \varepsilon_{it}$$

- $\alpha_i$: unit fixed effects (absorb time-invariant differences)
- $\lambda_t$: time fixed effects (absorb common shocks)
- $D_{it}$: treatment indicator (1 if unit $i$ is treated at time $t$)
- $\beta$: average treatment effect on the treated (ATT)
panel_task <- PanelEventStudyTask$new(
panel_data,
unit_id = "unit_id",
time_id = "time_id",
treatment = "treated",
outcome = "outcome"
)
result <- estimate_panel_event_study(panel_task, method = "static_twfe")
result$results$coefficients

| Pros | Cons |
|---|---|
| Single interpretable ATT | Biased with staggered treatment |
| Familiar regression framework | Misses treatment dynamics |
| Simple to estimate | Assumes homogeneous effects |
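Static TWFE is just a linear regression with unit and time dummies, so the specification can be sketched in a few lines of base R. The simulated data and column names below are illustrative, not the package's internal implementation:

```r
# Minimal base-R sketch of static TWFE on simulated panel data
set.seed(42)
df <- expand.grid(unit_id = 1:20, time_id = 1:10)
# Units 1-10 treated from period 6 onward; true ATT = 2
df$treated <- as.integer(df$unit_id <= 10 & df$time_id >= 6)
df$outcome <- 0.5 * df$unit_id + 0.2 * df$time_id +
  2 * df$treated + rnorm(nrow(df))

# Unit and time fixed effects enter as factor dummies; the coefficient
# on `treated` is the ATT estimate
fit <- lm(outcome ~ treated + factor(unit_id) + factor(time_id), data = df)
unname(coef(fit)["treated"])  # close to the true effect of 2
```

With all units treated at the same date this recovers the ATT; the bias discussed below only arises once treatment timing is staggered.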
Dynamic TWFE
The dynamic specification replaces the single treatment indicator with event-time dummies, allowing you to trace the treatment effect over time and test parallel trends:

$$Y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1} \beta_k \, \mathbf{1}[t - E_i = k] + \varepsilon_{it}$$

- $k = t - E_i$, where $E_i$ is the treatment date for unit $i$
- Period $k = -1$ is omitted as the reference period
- Pre-treatment coefficients ($\beta_k$ for $k < 0$) test parallel trends
- Post-treatment coefficients ($\beta_k$ for $k \geq 0$) trace the dynamic effect
result <- estimate_panel_event_study(
panel_task,
method = "dynamic_twfe",
leads = 5,
lags = 5,
base_period = -1
)
plot_panel_event_study(result)

Pre-treatment coefficients near zero confirm parallel trends. The jump at period 0 shows the treatment effect.
| Pros | Cons |
|---|---|
| Tests parallel trends visually | Biased with staggered treatment timing |
| Shows treatment dynamics | "Forbidden comparisons" between cohorts |
| Identifies temporary vs. permanent effects | Requires sufficient pre-treatment periods |
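The event-time dummies can also be built by hand in base R. A sketch with simulated single-cohort data (the `treatment_time` column follows the data-format section; never-treated controls are parked in the omitted $k = -1$ bin, so they identify only the fixed effects):

```r
set.seed(1)
df <- expand.grid(unit_id = 1:30, time_id = 1:10)
df$treatment_time <- ifelse(df$unit_id <= 15, 6, NA)  # units 1-15 treated at period 6
df$outcome <- 0.3 * df$time_id +
  ifelse(!is.na(df$treatment_time) & df$time_id >= df$treatment_time, 1.5, 0) +
  rnorm(nrow(df))

# Event time k = t - E_i; never-treated controls (NA) go to the
# reference bin k = -1, which is the omitted category
df$k <- df$time_id - df$treatment_time
df$k[is.na(df$k)] <- -1
df$k <- relevel(factor(df$k), ref = "-1")

fit <- lm(outcome ~ k + factor(unit_id) + factor(time_id), data = df)
coef(fit)[grep("^k", names(coef(fit)))]  # pre-period coefs near 0, post near 1.5
```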
Sun & Abraham (2021)
When treatment is staggered (units treated at different times), standard TWFE produces biased estimates because it makes "forbidden comparisons" — using already-treated units as controls for newly-treated units. As shown by Goodman-Bacon (2021), the TWFE estimator is a weighted average of all possible 2x2 DiD comparisons, and some weights can be negative, leading to sign reversal of the estimated treatment effect. The Sun & Abraham interaction-weighted estimator resolves this by estimating cohort-specific effects and aggregating properly.
staggered_task <- PanelEventStudyTask$new(staggered_data)
result <- estimate_panel_event_study(
staggered_task,
method = "sun_abraham",
leads = 5,
lags = 5
)
plot_panel_event_study(result)
result$results$coefficients

| Pros | Cons |
|---|---|
| Unbiased under staggered treatment | Requires a never-treated or last-treated group |
| Handles heterogeneous effects across cohorts | Less efficient than TWFE when effects are homogeneous |
| Interpretable cohort-specific estimates | Requires more data than static TWFE |
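For comparison, the `fixest` package ships the same interaction-weighted estimator via its `sunab()` helper. A sketch under the column names from the data-format section (note the assumption that never-treated units carry a cohort value outside the sample window rather than NA, which is worth checking against the `fixest` documentation):

```r
library(fixest)

# Code never-treated controls with a cohort far outside the panel
d <- panel_data
d$treatment_time[is.na(d$treatment_time)] <- 10000

fit <- feols(outcome ~ sunab(treatment_time, time_id) | unit_id + time_id,
             data = d, cluster = ~unit_id)
summary(fit)  # interaction-weighted event-time coefficients
iplot(fit)    # event-study plot
```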
Modern DiD Estimators
Callaway & Sant'Anna (2021)
Published in the Journal of Econometrics in 2021 and cited over 5,000 times, this estimator computes group-time average treatment effects $ATT(g, t)$ — the treatment effect for cohort $g$ (units first treated at period $g$) at time $t$. These can be flexibly aggregated into event-time, group, or calendar-time summaries.
The key insight: by estimating effects separately for each cohort, the estimator avoids the "forbidden comparisons" that bias TWFE. It combines propensity score and outcome regression models for doubly robust estimation.
result <- estimate_panel_event_study(
staggered_task,
method = "callaway_santanna",
leads = 5,
lags = 5
)
plot_panel_event_study(result)

| Pros | Cons |
|---|---|
| Doubly robust (propensity score + outcome regression) | Requires did package |
| Flexible aggregation of group-time ATTs | More complex output to interpret |
| Built-in bootstrap inference | Requires sufficient group sizes |
| Handles covariates naturally | Slower than Sun & Abraham for large panels |
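The estimator's reference implementation in the `did` package can also be called directly. A sketch using the column names from the data-format section (the `did` convention codes never-treated units with `gname = 0`, and `idname` must be numeric):

```r
library(did)

d <- panel_data
d$treatment_time[is.na(d$treatment_time)] <- 0  # 0 marks never-treated in `did`

# Group-time ATTs, then aggregation to event time
attgt <- att_gt(yname = "outcome", tname = "time_id", idname = "unit_id",
                gname = "treatment_time", data = d)
dyn <- aggte(attgt, type = "dynamic")
summary(dyn)
ggdid(dyn)  # event-study plot
```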
de Chaisemartin & D'Haultfoeuille (2020)
Focuses on switchers — units that change treatment status. Estimates the average effect on units that switch into treatment, using units whose treatment status does not change as controls. Particularly useful when treatment can turn on and off.
result <- estimate_panel_event_study(
staggered_task,
method = "dechaisemartin_dhaultfoeuille",
leads = 5,
lags = 5
)
result$results$coefficients

| Pros | Cons |
|---|---|
| Handles treatment reversals (on/off) | Requires DIDmultiplegt package |
| Intuitive interpretation via switchers | Computationally intensive for large panels |
| Robust to heterogeneous effects | Requires careful specification of placebos |
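A direct call to the `DIDmultiplegt` package looks roughly like the sketch below. The argument names follow the package's classic interface as I recall it; newer releases have moved to a companion `DIDmultiplegtDYN` package, so verify against the version you have installed:

```r
library(DIDmultiplegt)

# Y/G/T/D name the outcome, group (unit), time, and treatment columns
res <- did_multiplegt(df = panel_data, Y = "outcome", G = "unit_id",
                      T = "time_id", D = "treated")
res  # switchers ATT and inference
```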
Borusyak, Jaravel & Spiess (2024)
An imputation estimator that first estimates unit and time fixed effects using only untreated observations, then imputes counterfactual outcomes for treated units. The treatment effect is the difference between observed and imputed outcomes.
result <- estimate_panel_event_study(
staggered_task,
method = "borusyak_jaravel_spiess",
leads = 5,
lags = 5
)
plot_panel_event_study(result)

| Pros | Cons |
|---|---|
| Efficient — uses all untreated observations | Requires didimputation package |
| Clean event-time coefficient estimates | Assumes treatment effect homogeneity across cohorts for efficiency |
| Simple imputation logic | Requires sufficient untreated observations |
| Fast even for large panels | Cannot handle treatment reversals |
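A direct call to the `didimputation` package is sketched below, with column names from the data-format section. I assume never-treated units are flagged with a zero `gname`, as in the `did` package; check the `didimputation` docs for the coding it expects:

```r
library(didimputation)

d <- panel_data
d$treatment_time[is.na(d$treatment_time)] <- 0  # assumed never-treated coding

# Fixed effects estimated on untreated cells, counterfactuals imputed,
# then effects reported by event-time horizon
res <- did_imputation(data = d, yname = "outcome", gname = "treatment_time",
                      tname = "time_id", idname = "unit_id",
                      horizon = TRUE)
res
```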
Which Estimator Should I Use?
| Question | Answer | Estimator |
|---|---|---|
| All units treated at the same time? | Yes, no event-time dynamics needed | Static TWFE |
| All units treated at the same time? | Yes, need event-time dynamics | Dynamic TWFE |
| Staggered treatment, treatment reversals? | Yes | de Chaisemartin & D'Haultfoeuille |
| Staggered treatment, simplicity priority? | Yes, no extra deps | Sun & Abraham |
| Staggered treatment, flexible aggregation? | Yes | Callaway & Sant'Anna |
| Staggered treatment, efficiency priority? | Yes | Borusyak, Jaravel & Spiess |
- **ATT (Average Treatment Effect on the Treated)**: The mean causal effect of the intervention on units that actually received treatment; the primary estimand in panel event studies.
- **Staggered Treatment**: A design where different units receive treatment at different times, requiring heterogeneity-robust estimators to avoid bias from "forbidden comparisons" in standard TWFE.
- **Parallel Trends**: The identifying assumption that treated and control units would have followed the same outcome trajectory in the absence of treatment, testable via pre-treatment coefficient plots.
How Do Cluster-Robust Standard Errors Work?
All estimators compute cluster-robust standard errors via sandwich::vcovCL(). By default, clustering is at the unit level:
# Default: clustered at unit level
result <- estimate_panel_event_study(
panel_task, method = "sun_abraham", leads = 5, lags = 5
)
# Custom clustering variable
result <- estimate_panel_event_study(
panel_task, method = "sun_abraham",
leads = 5, lags = 5, cluster = "state_id"
)

Cluster at the treatment level
If treatment varies at the state level but your units are firms within states, cluster at the state level to account for within-state correlation.
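Under the hood this amounts to a sandwich-estimator call, which can be sketched in base R with the `sandwich` and `lmtest` packages (the `state_id` column is hypothetical, as in the snippet above):

```r
library(sandwich)
library(lmtest)

fit <- lm(outcome ~ treated + factor(unit_id) + factor(time_id),
          data = panel_data)

# Unit-level clustering (the default described above)
coeftest(fit, vcov = vcovCL(fit, cluster = ~unit_id))

# Clustering at the level where treatment varies
coeftest(fit, vcov = vcovCL(fit, cluster = ~state_id))
```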
What Diagnostic Checks Should I Run?
- Parallel trends: Pre-treatment coefficients should be jointly zero — flat and insignificant before treatment
- Visual inspection: Plot dynamic coefficients with confidence intervals; look for pre-trends
- Placebo tests: Shift treatment timing earlier to check for spurious effects
- Balance checks: Verify treated and control groups have similar pre-treatment characteristics
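The placebo test can be scripted directly against the task API from the sections above. A sketch that shifts every treatment date back three periods (the shift size and column names are illustrative):

```r
# Shift treatment 3 periods earlier; a significant "effect" at the fake
# date signals differential pre-trends rather than a causal effect
placebo_data <- panel_data
placebo_data$treatment_time <- placebo_data$treatment_time - 3
placebo_data$treated <- as.integer(
  !is.na(placebo_data$treatment_time) &
    placebo_data$time_id >= placebo_data$treatment_time
)

placebo_task <- PanelEventStudyTask$new(
  placebo_data,
  unit_id = "unit_id", time_id = "time_id",
  treatment = "treated", outcome = "outcome"
)
placebo_result <- estimate_panel_event_study(placebo_task, method = "sun_abraham")
```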
Literature
- Bertrand, M., Duflo, E. & Mullainathan, S. (2004). How much should we trust differences-in-differences estimates? The Quarterly Journal of Economics, 119(1), 249–275.
- Sun, L. & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175–199.
- Callaway, B. & Sant'Anna, P.H.C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200–230.
- de Chaisemartin, C. & D'Haultfoeuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9), 2964–2996.
- Borusyak, K., Jaravel, X. & Spiess, J. (2024). Revisiting event-study designs: Robust and efficient estimation. Review of Economic Studies, 91(6), 3253–3285.
- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254–277.
- Miller, D.L. (2023). An Introductory Guide to Event Study Models. Journal of Economic Perspectives, 37(2), 203–230.
Implement this with the R package
Access advanced features and full customization through the EventStudy R package.
What Should I Read Next?
- Expected Return Models — cross-sectional event study models
- Synthetic Control — single-unit causal inference
- Diagnostics & Export — validate and export results
- Test Statistics — significance testing