
Yuning598/EquityChars


Contact

Version

Academic Background

Empirical asset pricing research often needs firm-level equity characteristics and portfolio-level characteristic summaries. This repository provides a Python toolkit for constructing U.S. equity characteristics from WRDS inputs, with the current implementation centered on the CRSP CIZ data format.

Prerequisites

  • Read the listed papers
  • WRDS account with subscription to CRSP, Compustat and IBES.
  • Python (with polars, pandas, numpy, tqdm, wrds, duckdb, pyarrow)

Files

Main Files (chars_ciz/)

  • download_data.py — pull raw CRSP CIZ monthly/daily, Compustat funda/fundq, IBES, FF factors from WRDS (via DuckDB) into data/raw/
  • functions.py — shared helpers: ttm, ttm4, ttm12, chars_std, industry classifications (ffi49), imputation and standardization utilities, INPUT_PATH / OUTPUT_PATH constants
  • accounting.py — builds annual + quarterly accounting characteristics and merges with monthly CRSP (crsp_mom) for monthly-grain chars (me, turn, dolvol, dy, mom*, seas1a, indmom, …). Outputs chars_a_accounting.parquet, chars_q_accounting.parquet
  • rolling_chars.py — rolling daily-window characteristics from CRSP daily: beta, baspread, ill, maxret, rvar_capm, rvar_ff3, rvar_mean, std_dolvol, std_turn, zerotrade. Outputs rolling_chars.parquet
  • sue.py — unexpected quarterly earnings (SUE)
  • abr.py — cumulative abnormal returns around earnings announcement dates
  • myre.py — revisions in analysts' earnings forecasts (uses IBES)
  • merge_chars.py — merges accounting + rolling + satellite characteristics (sue, abr, myre) with CRSP backfill. Outputs chars_a_raw.parquet, chars_q_raw.parquet
  • impute_rank_output.py — reconciles annual/quarterly accounting variables by most-recent datadate, computes ffi49, lags returns one period, and writes the four final outputs (raw / imputed × no-rank / rank)
  • iclink_ciz.sas — IBES ↔ CRSP CIZ link table (run on WRDS SAS Studio); output saved to data/raw/iclink_ciz.csv

Documents

  • documents/chars_summary.csv — current characteristic acronym, description, author, year, category

How to use (CIZ pipeline)

All commands are run from chars_ciz/. Paths are configured in functions.py (INPUT_PATH, OUTPUT_PATH).

  1. python download_data.py — pull raw WRDS tables into data/raw/ (also run iclink_ciz.sas on WRDS to produce iclink_ciz.csv)
  2. python accounting.py — build annual/quarterly accounting + monthly chars
  3. python rolling_chars.py — daily-window rolling chars
  4. python sue.py, python abr.py, python myre.py — satellite chars (can run in parallel; myre.py requires IBES + the iclink table)
  5. python merge_chars.py — merge everything into chars_a_raw.parquet and chars_q_raw.parquet
  6. python impute_rank_output.py — produce the four final outputs
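The six steps above can be chained from a small driver script. The following is a minimal sketch, not part of the repository: `STAGES`, `build_commands`, and `run_pipeline` are hypothetical names, and the `chars_ciz` working directory is taken from the note that all commands run from `chars_ciz/`. It runs the satellite scripts sequentially for simplicity, and it does not cover `iclink_ciz.sas`, which still has to be run on WRDS beforehand.

```python
import subprocess
import sys
from pathlib import Path

# Stage order copied from the numbered steps above.
# Scripts listed in the same stage do not depend on each other.
STAGES = [
    ["download_data.py"],
    ["accounting.py"],
    ["rolling_chars.py"],
    ["sue.py", "abr.py", "myre.py"],
    ["merge_chars.py"],
    ["impute_rank_output.py"],
]


def build_commands(python=sys.executable):
    """Flatten the stages into a list of commands in execution order."""
    return [[python, script] for stage in STAGES for script in stage]


def run_pipeline(workdir="chars_ciz", dry_run=True):
    """Run each script from `workdir`, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = build_commands()
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run on incomplete inputs.
            subprocess.run(cmd, cwd=Path(workdir), check=True)
    return commands
```

Calling `run_pipeline()` prints the planned commands; `run_pipeline(dry_run=False)` executes them, assuming `INPUT_PATH` and `OUTPUT_PATH` are already configured in functions.py.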

Outputs

Data

The stock universe covers stocks listed on the three major U.S. exchanges (NYSE / AMEX / NASDAQ), filtered to common equity via CRSP CIZ flags (sharetype='NS', securitytype='EQTY', securitysubtype='COM', usincflg='Y', issuertype∈{ACOR,CORP}, conditionaltype='RW', tradingstatusflg='A'). The date range follows the available WRDS coverage.
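As an illustration of the universe filter, here is a minimal pandas sketch. The flag names and values are copied from the paragraph above; the exchange column `primaryexch` with codes `N`, `A`, `Q` is an assumption about the CIZ layout, and `filter_common_equity` is a hypothetical helper rather than a function from this repository.

```python
import pandas as pd

# Flag values copied from the text above.
COMMON_EQUITY_FLAGS = {
    "sharetype": ["NS"],
    "securitytype": ["EQTY"],
    "securitysubtype": ["COM"],
    "usincflg": ["Y"],
    "issuertype": ["ACOR", "CORP"],
    "conditionaltype": ["RW"],
    "tradingstatusflg": ["A"],
}

# Assumption: NYSE / AMEX / NASDAQ are coded N / A / Q in `primaryexch`.
EXCHANGES = ["N", "A", "Q"]


def filter_common_equity(df, exch_col="primaryexch"):
    """Keep rows on the three exchanges that pass every common-equity flag."""
    mask = df[exch_col].isin(EXCHANGES)
    for col, allowed in COMMON_EQUITY_FLAGS.items():
        mask &= df[col].isin(allowed)
    return df[mask].reset_index(drop=True)
```

Keeping the flags in one dictionary makes the universe definition easy to audit against the description above.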

Returns are aligned one period ahead of the characteristics: each row pairs $chars_t$ with $ret_{t+1}$, so that characteristics observed at time $t$ predict the return over the following period.
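The alignment can be sketched with a per-stock shift. This is an illustration only: `align_next_return` and the `ret_next` column are hypothetical names, and the sketch assumes one row per stock per period with no gaps (a missing month would pair a characteristic with a return more than one period ahead).

```python
import pandas as pd


def align_next_return(df, id_col="permno", date_col="date", ret_col="ret"):
    """Attach next-period return to each row, so row t holds chars_t and ret_{t+1}."""
    out = df.sort_values([id_col, date_col]).copy()
    # shift(-1) within each stock pulls the following period's return up one row;
    # the last observation of each stock has no next return and becomes NaN.
    out["ret_next"] = out.groupby(id_col)[ret_col].shift(-1)
    return out
```

Shifting within `permno` rather than across the whole frame avoids leaking one stock's return into another stock's last row.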

The four final files (all parquet) are:

  1. chars_raw_no_impute.parquet — raw characteristic levels, missing values preserved
  2. chars_raw_imputed.parquet — same as above with industry-median / industry-mean imputation
  3. chars_rank_no_impute.parquet — cross-sectional rank-standardized characteristics (no imputation)
  4. chars_rank_imputed.parquet — cross-sectional rank-standardized characteristics (imputed)
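The difference between the four files can be illustrated with two small transformations. Both functions below are hypothetical sketches, not the repository's code: the imputation shown uses the industry median within each date and `ffi49` group, and the rank transform maps each cross-section to the open interval (-1, 1) via $2 \cdot rank / (n + 1) - 1$. That scaling is an assumption; the `chars_std` helper in functions.py may use a different convention.

```python
import pandas as pd


def impute_industry_median(df, char_cols, date_col="date", ind_col="ffi49"):
    """Fill missing characteristics with the same-date, same-industry median."""
    out = df.copy()
    med = out.groupby([date_col, ind_col])[char_cols].transform("median")
    # Groups with no observed value keep NaN, since their median is NaN too.
    out[char_cols] = out[char_cols].fillna(med)
    return out


def rank_standardize(df, char_cols, date_col="date"):
    """Map each characteristic to (-1, 1) by its cross-sectional rank per date."""
    out = df.copy()
    grouped = out.groupby(date_col)[char_cols]
    ranks = grouped.rank(method="average")  # NaN inputs stay NaN
    counts = grouped.transform("count")     # non-missing count per date
    out[char_cols] = 2 * ranks / (counts + 1) - 1
    return out
```

Under these assumptions, applying only `rank_standardize` corresponds to the rank / no-impute variant, while applying `impute_industry_median` first corresponds to the imputed variants.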

Information Variables

  • stock identifier: gvkey, permno, ticker, conm, comnam
  • time: datadate (accounting), date (return date)
  • industry: sic, ffi49
  • price / size: prc, shrout, me, log_me, lag_me
  • return: ret (delisting-adjusted via CRSP CIZ mthret)

Method

Equity Characteristics

The equity-characteristic definitions follow the empirical anomaly and replication literature, especially the work of Green, Hand, and Zhang and of Hou, Xue, and Zhang. The current implementation documents 99 U.S. equity characteristics in documents/chars_summary.csv and provides audited formula notes under documents/formula_docs/.

Portfolio Characteristics

Portfolio characteristics can be constructed as equal-weighted or value-weighted averages of firm-level characteristics for equities in each portfolio.
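A minimal sketch of this aggregation follows. The function `portfolio_chars` and the portfolio-assignment column `port` are hypothetical; the weight column defaults to `me`, but `lag_me` from the information variables could be passed instead, and which one suits a given application is left open here.

```python
import numpy as np
import pandas as pd


def portfolio_chars(df, char_cols, port_col="port", date_col="date", weight_col="me"):
    """Equal- and value-weighted characteristic averages per (date, portfolio)."""
    rows = []
    for (date, port), g in df.groupby([date_col, port_col]):
        row = {date_col: date, port_col: port}
        for c in char_cols:
            # Equal-weighted: plain mean over stocks with an observed value.
            row[f"{c}_ew"] = g[c].mean()
            # Value-weighted: needs both the characteristic and the weight.
            valid = g[c].notna() & g[weight_col].notna()
            x = g.loc[valid, c]
            w = g.loc[valid, weight_col]
            row[f"{c}_vw"] = np.average(x, weights=w) if w.sum() > 0 else np.nan
        rows.append(row)
    return pd.DataFrame(rows)
```

For two stocks with characteristic values 1 and 3 and market equity 1 and 3, the equal-weighted average is 2 and the value-weighted average is (1·1 + 3·3) / 4 = 2.5.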

Common portfolio applications include:

Reference

Papers

Codes

All comments are welcome.
