Choosing the Right Test Statistic
The choice of test statistic determines whether your event study produces valid inference. Different tests make different assumptions about the distribution of abnormal returns, the behavior of variance around the event, and the independence of events across firms. Using the wrong test can lead to over-rejection (finding effects that are not there) or under-rejection (missing real effects). This page provides a systematic decision framework for selecting the right test.
Part of the Methodology Guide
This page is part of the Event Study Methodology Guide. For formulas and implementation details of individual tests, see AR & CAR Tests and AAR & CAAR Tests.
Decision Framework
Selecting a test statistic involves answering four questions about your study design. Each question narrows the set of appropriate tests.
| Question | If Yes | If No |
|---|---|---|
| 1. Are you studying a single event for a single firm? | Use AR/CAR tests (single-event) | Use AAR/CAAR tests (multi-event) |
| 2. Can you assume normally distributed returns? | Parametric tests are valid | Use non-parametric tests (Sign, Rank) |
| 3. Does the event change return variance? | Use variance-robust tests (BMP, Kolari-Pynnonen) | Standard tests (Patell Z, Cross-Sectional t) are adequate |
| 4. Are event dates clustered across firms? | Use clustering-robust tests (Kolari-Pynnonen, Calendar-Time) | Cross-sectional independence assumption holds |
Parametric vs. Non-Parametric Tests
The first major distinction is between parametric and non-parametric tests. Each class has strengths and weaknesses.
Parametric Tests
Parametric tests assume that abnormal returns follow a known distribution (typically normal). They use the estimated variance of abnormal returns to construct test statistics. When the distributional assumptions hold, parametric tests are more powerful than non-parametric alternatives.
| Test | Key Assumption | Robust To | Not Robust To |
|---|---|---|---|
| Cross-Sectional t-test | AR ~ Normal, equal variance | Heterogeneous event effects | Event-induced variance; non-normality |
| Patell Z | Standardized AR ~ Normal | Heterogeneous firm-specific variance | Event-induced variance; clustering |
| BMP test | Standardized AR ~ Normal | Event-induced variance changes | Clustering; severe non-normality |
| Kolari-Pynnonen | Standardized AR ~ Normal | Event-induced variance + clustering | Severe non-normality |
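To make the simplest entry in the table concrete, the Cross-Sectional t-test can be computed directly from the vector of event-window CARs. This is a minimal base-R sketch on simulated data; the `car` vector stands in for your firms' cumulative abnormal returns:

```r
# Cross-sectional t-test: t = mean(CAR) / (sd(CAR) / sqrt(N))
set.seed(1)
car <- rnorm(50, mean = 0.02, sd = 0.05)  # simulated CARs for 50 firms

n      <- length(car)
t_stat <- mean(car) / (sd(car) / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# Identical to base R's one-sample t-test against mu = 0:
all.equal(t_stat, unname(t.test(car)$statistic))  # TRUE
```

Because the test uses the cross-sectional dispersion of CARs, any variance increase shared across firms is partially reflected in `sd(car)`, which is why the table lists it as only partially robust.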
Non-Parametric Tests
Non-parametric tests make minimal distributional assumptions. They are particularly useful when returns are skewed, heavy-tailed, or contain outliers — conditions that are common in practice, especially for small-cap stocks and emerging markets.
| Test | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| Sign test | Tests whether the proportion of positive ARs exceeds 50% | Robust to outliers and non-normality | Low power; ignores AR magnitude |
| Generalized Sign test | Adjusts the expected proportion using estimation window data | Accounts for asymmetric return distributions | Still ignores magnitude; requires estimation window data |
| Rank test | Ranks ARs against estimation window residuals | Robust to non-normality; uses magnitude information | Assumes symmetric distribution under H0 |
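The mechanics of the Generalized Sign test can be sketched in a few lines of base R. Everything here is simulated for illustration; `p_hat` is the expected proportion of positive abnormal returns estimated from the estimation window, following the construction usually attributed to Cowan (1992):

```r
# Generalized Sign test: compare the number of firms with positive
# event-window CARs against the proportion expected from the estimation
# window (rather than against 0.5, as in the plain sign test).
set.seed(42)
n_firms <- 40
est_ar  <- matrix(rnorm(n_firms * 120, mean = -0.0005, sd = 0.02),
                  nrow = n_firms)                    # estimation-window ARs
event_car <- rnorm(n_firms, mean = 0.01, sd = 0.03)  # event-window CARs

p_hat <- mean(rowMeans(est_ar > 0))  # expected proportion of positive ARs
w     <- sum(event_car > 0)          # firms with a positive event CAR
z     <- (w - n_firms * p_hat) / sqrt(n_firms * p_hat * (1 - p_hat))
p_val <- 2 * pnorm(-abs(z))
```

Note that the statistic depends only on signs, never on magnitudes, which is exactly the low-power trade-off listed in the table.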
Use both
Best practice is to report both parametric and non-parametric test results. If they agree, your conclusions are robust. If they disagree, investigate why — it often reveals data quality issues or distributional violations.
Handling Event-Induced Variance
Many events cause return variance to increase on the event date. For example, earnings announcements typically double or triple daily return variance. M&A announcements can increase variance even more. This phenomenon is called event-induced variance.
Standard tests like the Patell Z assume that the variance of abnormal returns in the event window equals the estimation-window variance. When event-induced variance is present, this assumption is violated, and the Patell Z test over-rejects the null hypothesis — it reports significant effects even when there are none.
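The over-rejection is easy to reproduce by simulation. In this base-R sketch (toy numbers, not output from the package), the true event-day variance is three times the estimation-window variance, yet the Patell-style statistic standardizes by the estimation-window standard deviation only:

```r
# Simulated over-rejection of a Patell-style z-test under event-induced variance
set.seed(123)
n_firms <- 30
n_sims  <- 2000
reject <- replicate(n_sims, {
  sd_est <- 0.02                                      # estimation-window sd per firm
  ar     <- rnorm(n_firms, 0, sd = sqrt(3) * sd_est)  # event-day variance is 3x
  z      <- sum(ar / sd_est) / sqrt(n_firms)          # standardize by estimation sd only
  abs(z) > qnorm(0.975)                               # nominal 5% two-sided test
})
mean(reject)  # rejection rate well above the nominal 0.05
```

Even though there is no abnormal return on average, the test rejects far more often than 5% of the time, purely because the denominator understates the event-day variance.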
| Test | Handles Event-Induced Variance? | How |
|---|---|---|
| Cross-Sectional t-test | Partially | Uses cross-sectional variance, which captures some variance increase |
| Patell Z | No | Uses estimation-window variance only |
| BMP test | Yes | Standardizes by estimation-window variance, then uses cross-sectional variance of standardized ARs |
| Kolari-Pynnonen | Yes | Extends BMP with clustering adjustment |
| Sign test | Yes | Does not use variance estimates |
| Rank test | Partially | Rank transformation reduces the effect of variance changes |
The BMP test (Boehmer, Musumeci, and Poulsen, 1991) is the most widely recommended solution. It first standardizes each firm's abnormal return by its estimation-window standard deviation, then computes the cross-sectional standard deviation of these standardized abnormal returns. This two-step approach captures event-induced variance in the cross-sectional step.
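The two-step computation just described can be sketched in base R on simulated inputs; here `sd_est` plays the role of each firm's estimation-window residual standard deviation:

```r
# BMP (standardized cross-sectional) test, event-day version
set.seed(7)
n_firms <- 50
sd_est  <- runif(n_firms, 0.01, 0.04)         # estimation-window sd per firm
ar      <- rnorm(n_firms, 0.005, 2 * sd_est)  # event-day ARs with inflated variance

sar   <- ar / sd_est                            # step 1: standardize each AR
z_bmp <- mean(sar) / (sd(sar) / sqrt(n_firms))  # step 2: cross-sectional t on SARs
p_val <- 2 * pnorm(-abs(z_bmp))
```

Because `sd(sar)` is computed from the event day itself, any common variance inflation shows up in the denominator, which is what restores correct test size.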
```r
# Configure with BMP test
ps <- ParameterSet$new(
  multi_event_statistics = MultiEventStatisticsSet$new(
    tests = list(
      BMPTest$new(),             # Robust to event-induced variance
      CSectTTest$new(),          # Comparison (not robust)
      PatellZTest$new(),         # Comparison (not robust)
      GeneralizedSignTest$new()  # Non-parametric robustness check
    )
  )
)
task <- run_event_study(task, ps)
```
Handling Cross-Correlation (Event Clustering)
When multiple firms experience the event on the same date (or within overlapping event windows), their abnormal returns are cross-sectionally correlated. This violates the independence assumption of most tests and causes over-rejection.
Common scenarios with clustered events include:
- Regulatory changes: A new law affects all firms in an industry on the same date.
- Macroeconomic announcements: Interest rate decisions, GDP releases, or employment reports affect all stocks simultaneously.
- Industry-wide events: A product recall or safety incident that affects all competitors.
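The size of the problem is easy to quantify: with N firms and an average pairwise correlation of abnormal returns ρ̄, the variance of the average abnormal return is (σ²/N)(1 + (N − 1)ρ̄) rather than σ²/N. A small base-R illustration with toy numbers:

```r
# Variance inflation of the mean AR under cross-sectional correlation
n_firms <- 50
rho     <- 0.05    # modest average pairwise correlation of ARs
sigma2  <- 0.02^2  # per-firm AR variance

var_indep <- sigma2 / n_firms                            # independence assumed
var_true  <- (sigma2 / n_firms) * (1 + (n_firms - 1) * rho)  # with correlation
var_true / var_indep  # inflation factor: 1 + 49 * 0.05 = 3.45
```

Even a seemingly small ρ̄ of 0.05 inflates the true variance by a factor of 3.45 here, so a test that assumes independence overstates its t-statistics by a factor of about sqrt(3.45) ≈ 1.86.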
| Approach | Test / Method | When to Use |
|---|---|---|
| Clustering-adjusted test | Kolari-Pynnonen | Moderate clustering; want to keep standard event study framework |
| Calendar-Time Portfolio | CalendarTimePortfolioTest | Severe clustering; events concentrated on few dates |
| Crude Dependence Adjustment | Manual adjustment to Patell Z | Quick adjustment; moderate clustering |
| Portfolio approach | Aggregate firms into one portfolio per event date | All firms share the same event date |
```r
# Kolari-Pynnonen: robust to both variance change and clustering
ps <- ParameterSet$new(
  multi_event_statistics = MultiEventStatisticsSet$new(
    tests = list(
      KolariPynnonenTest$new(),         # Robust to variance + clustering
      CalendarTimePortfolioTest$new(),  # Portfolio approach
      BMPTest$new(),                    # Robust to variance only (comparison)
      RankTest$new()                    # Non-parametric check
    )
  )
)
task <- run_event_study(task, ps)
```
Do not ignore clustering
Ignoring cross-correlation when events are clustered leads to severely inflated test statistics. Kolari and Pynnonen (2010) show that the actual rejection rate of the Patell Z test can exceed 70% at a nominal 5% significance level when events are clustered. Always check whether your event dates overlap.
Recommended Tests by Scenario
The table below provides concrete recommendations for common event study scenarios. Each recommendation includes a primary test and a robustness check.
| Scenario | Primary Test | Robustness Check | Rationale |
|---|---|---|---|
| M&A announcements (target firms) | BMP test | Rank test | Large abnormal returns cause variance increase; target CARs are well-behaved |
| Earnings announcements | BMP test | Generalized Sign test | Strong event-induced variance; large samples available |
| Regulatory changes (industry-wide) | Kolari-Pynnonen | Calendar-Time Portfolio | Same event date for all firms; cross-correlation is severe |
| ESG events (heterogeneous dates) | BMP test | Sign test | Dates vary across firms; event-induced variance moderate |
| Small sample (N < 20) | Cross-Sectional t-test | Sign test | BMP and Patell Z require larger samples for asymptotic properties |
| Non-normal returns (small caps) | Rank test | Generalized Sign test | Parametric tests unreliable with heavy tails and skewness |
| Long-run studies (BHAR) | Skewness-adjusted t-test | Bootstrap | BHAR returns are severely right-skewed; standard t-tests invalid |
| Single-firm case study | AR t-test + CAR t-test | Permutation test | No cross-sectional aggregation; use exact tests if possible |
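For the long-run (BHAR) row, the skewness-adjusted t-statistic can be sketched in base R. This follows the Johnson-type adjustment popularized by Lyon, Barber, and Tsai (1999); the data below are simulated right-skewed returns, not real BHARs:

```r
# Skewness-adjusted t-statistic for right-skewed long-run returns
set.seed(99)
bhar <- rlnorm(200, meanlog = -0.05, sdlog = 0.4) - 1  # simulated skewed BHARs
n    <- length(bhar)

s     <- mean(bhar) / sd(bhar)                          # standardized mean
gamma <- sum((bhar - mean(bhar))^3) / (n * sd(bhar)^3)  # sample skewness
t_sa  <- sqrt(n) * (s + gamma * s^2 / 3 + gamma / (6 * n))
```

The two correction terms push the statistic back toward correct size when the return distribution has a long right tail; with symmetric returns (`gamma` near 0) the statistic collapses to the ordinary t.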
Decision Tree
Follow this decision tree to select your test statistics. Start at the top and follow the path that matches your study.
| Step | Condition | Action |
|---|---|---|
| 1 | Single firm? | If yes: use AR t-test + CAR t-test, add a Permutation test for robustness, and stop. If no: go to Step 2. |
| 2 | Multiple firms. Are event dates clustered? | If yes: go to Step 2a. If no: go to Step 2b. |
| 2a | Yes, clustered dates. | Use Kolari-Pynnonen + Calendar-Time Portfolio + Rank test. Stop. |
| 2b | No, non-clustered. | Go to Step 3. |
| 3 | Does the event likely change variance? | If yes: go to Step 3a. If no: go to Step 3b. |
| 3a | Yes, event-induced variance. | Use BMP test + Generalized Sign test. Stop. |
| 3b | No, stable variance. | Go to Step 4. |
| 4 | Returns approximately normal? | If yes: go to Step 4a. If no: go to Step 4b. |
| 4a | Yes, normal returns. | Use Cross-Sectional t-test + Patell Z + Sign test. Stop. |
| 4b | No, non-normal returns. | Use Rank test + Generalized Sign test. Stop. |
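The decision tree above can also be encoded as a small helper function. This is a hypothetical convenience wrapper, not part of the EventStudy package; it simply maps the four answers to the recommended test names:

```r
# Hypothetical helper encoding the decision tree above
recommend_tests <- function(single_firm = FALSE, clustered = FALSE,
                            variance_change = FALSE, normal_returns = TRUE) {
  if (single_firm)     return(c("AR t-test", "CAR t-test", "Permutation test"))
  if (clustered)       return(c("Kolari-Pynnonen", "Calendar-Time Portfolio", "Rank test"))
  if (variance_change) return(c("BMP", "Generalized Sign"))
  if (normal_returns)  return(c("Cross-Sectional t", "Patell Z", "Sign test"))
  c("Rank test", "Generalized Sign test")
}

recommend_tests(clustered = TRUE)
# -> c("Kolari-Pynnonen", "Calendar-Time Portfolio", "Rank test")
```

The branch order matters: clustering dominates the variance question because cross-correlation invalidates even variance-robust tests such as BMP.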
How Do I Configure Multiple Tests in R?
The EventStudy package allows you to run multiple tests simultaneously. This makes it easy to compare results and assess robustness.
```r
# Recommended setup: parametric + non-parametric + variance-robust
ps <- ParameterSet$new(
  # Single-event tests
  single_event_statistics = SingleEventStatisticsSet$new(
    tests = list(
      ARTTest$new(),
      CARTTest$new()
    )
  ),
  # Multi-event tests
  multi_event_statistics = MultiEventStatisticsSet$new(
    tests = list(
      # Parametric
      CSectTTest$new(),          # Baseline
      PatellZTest$new(),         # Standardized
      BMPTest$new(),             # Variance-robust
      KolariPynnonenTest$new(),  # Variance + clustering robust
      # Non-parametric
      SignTest$new(),
      GeneralizedSignTest$new(),
      RankTest$new()
    )
  )
)
task <- run_event_study(task, ps)

# View all test results
task$get_aar()
```

```r
# One-sided tests (e.g., testing for positive abnormal returns only)
ps <- ParameterSet$new(
  multi_event_statistics = MultiEventStatisticsSet$new(
    tests = list(
      CSectTTest$new(confidence_type = "one-sided"),
      BMPTest$new(confidence_type = "one-sided")
    )
  )
)
```
Common Mistakes
- Using only the Patell Z test: The Patell Z is popular because it is well-known, but it is not robust to event-induced variance. Always include the BMP test.
- Ignoring clustering: If your events share event dates (even partially overlapping windows), standard tests will produce spuriously significant results.
- Relying on a single test: No single test is best in all scenarios. Report at least two tests (one parametric, one non-parametric) to demonstrate robustness.
- Using parametric tests with small samples: With fewer than 20 events, asymptotic properties of Patell Z and BMP may not hold. Prefer the Cross-Sectional t-test and non-parametric tests.
- Not checking normality: Run a Shapiro-Wilk test on the estimation-window residuals or inspect a QQ-plot before relying on parametric tests.
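The normality check from the last point takes two lines of base R. Here the residuals are simulated as heavy-tailed for illustration; replace `resid` with your own estimation-window residuals:

```r
# Check normality of estimation-window residuals before trusting parametric tests
set.seed(11)
resid <- rt(120, df = 3) * 0.02  # simulated heavy-tailed residuals

shapiro.test(resid)              # small p-value: reject normality
qqnorm(resid); qqline(resid)     # visual check: heavy tails bend away from the line
```

If the Shapiro-Wilk test rejects or the QQ-plot shows pronounced tails, lean on the Rank and Generalized Sign tests rather than Patell Z or BMP.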
Literature
- Boehmer, E., Musumeci, J. & Poulsen, A.B. (1991). Event-study methodology under conditions of event-induced variance. Journal of Financial Economics, 30(2), 253-272.
- Kolari, J.W. & Pynnonen, S. (2010). Event study testing with cross-sectional correlation of abnormal returns. Review of Financial Studies, 23(11), 3996-4025.
- Corrado, C.J. (1989). A nonparametric test for abnormal security-price performance in event studies. Journal of Financial Economics, 23(2), 385-395.
- Patell, J.M. (1976). Corporate forecasts of earnings per share and stock price behavior. Journal of Accounting Research, 14(2), 246-276.
Implement this with the R package
Access advanced features and full customization through the EventStudy R package.
What Should I Do Next?
- AR & CAR Test Statistics — formulas and code for single-event tests
- AAR & CAAR Test Statistics — formulas and code for multi-event tests
- Variance-Based Tests — deep dive into event-induced variance handling
- Inference & Robustness — wild bootstrap and multiple testing corrections