Data Preparation for Event Studies

Correct data preparation is the foundation of a reliable event study. The EventStudy R package requires three input datasets: firm stock prices, a market index, and an event request table. Each must follow a specific format. This page covers the required data structures, date formatting, how to handle missing observations, and quality checks you should perform before running an analysis.

Part of the Methodology Guide

This page is part of the Event Study Methodology Guide.

What Data Do I Need?

Every event study requires three datasets. The firm stock prices and market index provide the return data used to estimate the expected return model and compute abnormal returns. The event request table defines which events to analyze and how.

DatasetPurposeRequired ColumnsRows
Firm stock pricesDaily adjusted closing prices for each firmsymbol, date, adjustedOne row per firm per trading day
Market indexBenchmark return data (e.g., S&P 500)symbol, date, adjustedOne row per trading day
Event request tableDefines events and window parametersevent_id, firm_symbol, index_symbol, event_date, group, window paramsOne row per event

Use adjusted closing prices

Always use adjusted closing prices that account for stock splits, dividends, and other corporate actions. Unadjusted prices create artificial return spikes on ex-dividend and split dates, which contaminate both the estimation and event windows.

Firm Stock Price Data

The firm stock price data frame contains daily adjusted closing prices for each firm in your study. It must have exactly three columns:

ColumnTypeDescriptionExample
symbolCharacterUnique firm identifier (ticker or ISIN)AAPL
dateCharacterTrading date in DD.MM.YYYY format03.01.2020
adjustedNumericAdjusted closing price102.50

The data must span the entire period needed for each event: the estimation window plus the event window. For example, if you use a 250-day estimation window with a 10-day gap and a [-5, +5] event window, you need at least 270 trading days of data before and 5 days after each event date.

Prepare firm stock price data
library(EventStudy)
library(tidyverse)

# Load price data (e.g., from Yahoo Finance via tidyquant)
# library(tidyquant)
# raw_prices <- tq_get(c("AAPL", "MSFT", "GOOGL"),
#                       from = "2018-01-01", to = "2023-12-31")

# Format for the EventStudy package
firm_data <- raw_prices |>
  transmute(
    symbol = symbol,
    date = format(date, "%d.%m.%Y"),
    adjusted = adjusted
  )

head(firm_data)
# # A tibble: 6 x 3
#   symbol date       adjusted
#   <chr>  <chr>         <dbl>
# 1 AAPL   02.01.2018     172.
# 2 AAPL   03.01.2018     172.
# 3 AAPL   04.01.2018     173.
# 4 AAPL   05.01.2018     175.
# 5 AAPL   08.01.2018     174.
# 6 AAPL   09.01.2018     174.

Data sources

Common sources for adjusted stock prices include Yahoo Finance (via the tidyquant or quantmod R packages), Bloomberg, Refinitiv (formerly Thomson Reuters), and CRSP for academic research.

Market Index Data

The market index data frame has the same three-column structure as the firm data. It provides the benchmark return for the expected return model (e.g., the Market Model). Choose an index that represents the market in which your firms trade.

MarketCommon IndexSymbol Example
US equitiesS&P 500, CRSP Value-WeightedSP500
European equitiesSTOXX Europe 600, DAXSTOXX600
UK equitiesFTSE 100FTSE100
Japanese equitiesNikkei 225, TOPIXTOPIX
Emerging marketsMSCI Emerging MarketsMSCIEM
Prepare market index data
# Market index data (same structure as firm data)
index_data <- sp500_prices |>
  transmute(
    symbol = "SP500",
    date = format(date, "%d.%m.%Y"),
    adjusted = adjusted
  )

# Ensure the index covers the same date range as firm data
stopifnot(min(index_data$date) <= min(firm_data$date))
stopifnot(max(index_data$date) >= max(firm_data$date))

Index must match firm dates

The market index must cover at least the same date range as the firm data. If the index has trading days that a firm does not (e.g., the firm was not yet listed), the package handles this automatically. But if the index data is shorter than the required estimation + event window, the event will fail.

Event Request Table

The event request table defines each event you want to analyze. Each row represents one event: a specific firm experiencing a specific event on a specific date.

ColumnTypeDescriptionExample
event_idIntegerUnique identifier for the event1
firm_symbolCharacterMust match a symbol in firm dataAAPL
index_symbolCharacterMust match a symbol in index dataSP500
event_dateCharacterDate of the event in DD.MM.YYYY15.03.2022
groupCharacterGrouping variable for cross-sectional analysisTech
event_window_startIntegerStart of event window relative to event date-5
event_window_endIntegerEnd of event window relative to event date5
shift_estimation_windowIntegerGap between estimation and event windows-10
estimation_window_lengthIntegerNumber of trading days in estimation window250
Create event request table
# Define events
request <- tibble(
  event_id = 1:3,
  firm_symbol = c("AAPL", "MSFT", "GOOGL"),
  index_symbol = "SP500",
  event_date = c("15.03.2022", "20.04.2022", "10.05.2022"),
  group = c("Tech", "Tech", "Tech"),
  event_window_start = -5,
  event_window_end = 5,
  shift_estimation_window = -10,
  estimation_window_length = 250
)

# The estimation window will be:
# [event_date + shift - estimation_window_length,
#  event_date + shift]
# i.e., 250 days ending 10 days before the event

The shift_estimation_window parameter creates a gap between the end of the estimation window and the start of the event window. This gap prevents the event from contaminating the estimation of normal returns. A typical gap is 10 trading days. For more on window selection, see Event Window & Estimation Window Selection.

Date Formatting

The EventStudy package uses the DD.MM.YYYY format for all dates. This applies to both the price data and the event request table. Dates must represent trading days — weekends and public holidays are not valid event dates.

Date formatting examples
# Convert from Date object to DD.MM.YYYY string
date_obj <- as.Date("2022-03-15")
formatted <- format(date_obj, "%d.%m.%Y")
# "15.03.2022"

# Convert from common formats
# From YYYY-MM-DD (ISO)
as.Date("2022-03-15") |> format("%d.%m.%Y")

# From MM/DD/YYYY (US)
as.Date("03/15/2022", format = "%m/%d/%Y") |> format("%d.%m.%Y")

# From YYYYMMDD (compact)
as.Date("20220315", format = "%Y%m%d") |> format("%d.%m.%Y")

# Batch convert a data frame column
df$date <- format(as.Date(df$date), "%d.%m.%Y")

Common date mistakes

The most frequent data preparation error is incorrect date formatting. Ensure you use periods as separators (not hyphens or slashes) and the day-month-year order (not month-day-year). A date like "03.01.2020" means January 3, 2020 — not March 1.

Handling Missing Data

Missing data arises from trading halts, newly listed firms, delistings, and data provider gaps. How you handle missing observations depends on where they appear.

LocationImpactRecommended Action
Estimation window (few days)Slightly fewer observations for model fittingUsually acceptable; the package handles this automatically
Estimation window (>20% missing)Unreliable model parameter estimatesExclude the event from the study
Event windowGap in abnormal returns; cumulative AR is unreliableExclude the event or use the last available price (carry forward)
Market indexCannot compute market return for that dayFill from alternative source or exclude affected days
Check and handle missing data
# Check for missing values
sum(is.na(firm_data$adjusted))

# Check for gaps in trading dates per firm
firm_data |>
  group_by(symbol) |>
  summarise(
    n_days = n(),
    first_date = min(date),
    last_date = max(date),
    n_missing = sum(is.na(adjusted))
  )

# Remove firms with insufficient data
min_required_days <- 250 + 10 + 5  # estimation + gap + event window
firm_data <- firm_data |>
  group_by(symbol) |>
  filter(n() >= min_required_days) |>
  ungroup()

# Optional: forward-fill isolated missing prices
# (use with caution -- this assumes zero return on missing days)
firm_data <- firm_data |>
  group_by(symbol) |>
  arrange(date) |>
  fill(adjusted, .direction = "down") |>
  ungroup()

Data Quality Checks

Before running an event study, perform the following quality checks to catch common data issues that can invalidate results.

CheckWhat to Look ForHow to Fix
Duplicate rowsSame symbol + date appears more than oncedistinct(symbol, date, .keep_all = TRUE)
Zero or negative pricesAdjusted price <= 0Remove or replace from alternative source
Extreme returnsDaily return > 100% or < -90%Verify against source; likely a data error or split not adjusted
Date alignmentFirm and index dates do not overlapEnsure both cover the same trading calendar
Sufficient historyFewer days than estimation_window_length + gap + event_windowExtend data range or shorten estimation window
Event date is a trading dayEvent date falls on weekend or holidayShift to next trading day or verify against trading calendar
Complete data quality check
# 1. Remove duplicates
firm_data <- firm_data |> distinct(symbol, date, .keep_all = TRUE)
index_data <- index_data |> distinct(symbol, date, .keep_all = TRUE)

# 2. Check for zero or negative prices
stopifnot(all(firm_data$adjusted > 0, na.rm = TRUE))
stopifnot(all(index_data$adjusted > 0, na.rm = TRUE))

# 3. Check for extreme returns (flag but don't auto-remove)
firm_returns <- firm_data |>
  group_by(symbol) |>
  arrange(date) |>
  mutate(ret = adjusted / lag(adjusted) - 1) |>
  ungroup()

extreme <- firm_returns |> filter(abs(ret) > 1)
if (nrow(extreme) > 0) {
  warning(paste(nrow(extreme), "extreme returns detected. Verify data."))
  print(extreme)
}

# 4. Check that event dates exist in firm data
event_dates_valid <- request |>
  rowwise() |>
  mutate(
    date_in_firm = event_date %in%
      (firm_data |> filter(symbol == firm_symbol) |> pull(date)),
    date_in_index = event_date %in%
      (index_data |> filter(symbol == index_symbol) |> pull(date))
  )

invalid_events <- event_dates_valid |>
  filter(!date_in_firm | !date_in_index)

if (nrow(invalid_events) > 0) {
  warning("Some event dates are not trading days. Adjust or remove:")
  print(invalid_events)
}

Automate quality checks

Wrap these checks in a function and run them every time you prepare data. This prevents subtle data errors from silently producing unreliable results.

Complete Example: From Raw Data to Event Study

Here is a full workflow that downloads data, prepares it, runs quality checks, and executes an event study.

End-to-end data preparation and event study
library(EventStudy)
library(tidyverse)

# --- Step 1: Prepare firm prices ---
firm_data <- tibble(
  symbol = rep(c("AAPL", "MSFT"), each = 300),
  date = rep(format(
    seq(as.Date("2021-01-04"), by = "day", length.out = 430) |>
      ((d) d[!weekdays(d) %in% c("Saturday", "Sunday")])() |>
      head(300),
    "%d.%m.%Y"
  ), 2),
  adjusted = c(
    cumprod(1 + rnorm(300, 0.0005, 0.015)) * 130,
    cumprod(1 + rnorm(300, 0.0004, 0.012)) * 240
  )
)

# --- Step 2: Prepare market index ---
index_data <- tibble(
  symbol = "SP500",
  date = unique(firm_data$date),
  adjusted = cumprod(1 + rnorm(300, 0.0003, 0.010)) * 4200
)

# --- Step 3: Create event request ---
request <- tibble(
  event_id = 1:2,
  firm_symbol = c("AAPL", "MSFT"),
  index_symbol = "SP500",
  event_date = firm_data |>
    filter(symbol == "AAPL") |>
    slice(260) |>
    pull(date),
  group = "Tech",
  event_window_start = -5,
  event_window_end = 5,
  shift_estimation_window = -10,
  estimation_window_length = 240
)

# --- Step 4: Quality checks ---
stopifnot(all(firm_data$adjusted > 0))
stopifnot(nrow(distinct(firm_data, symbol, date)) == nrow(firm_data))

# --- Step 5: Run event study ---
task <- EventStudyTask$new(firm_data, index_data, request)
ps <- ParameterSet$new()
task <- run_event_study(task, ps)

# --- Step 6: View results ---
task$get_aar()

Run this in R

The EventStudy R package lets you run these calculations programmatically with full control over parameters.

What Should I Do Next?

We use cookies for analytics to improve this site. See our Privacy Policy.