- Jianxin Ma
- jianxin.ma@warwick.ac.uk
- All in Python
- Two pipeline variants live in this repository:
  - chars_ciz/: current CIZ (CRSP CIZ / v2) pipeline. Use this.
  - chars_siz/: legacy SIZ (CRSP SIZ / v1) pipeline. Kept for reference only.
- The SAS version is available at EquityCharacteristicsSAS
- Extension to China A Share Market
- Extension to Factors and Portfolios in China Market
Empirical asset pricing research often needs firm-level equity characteristics and portfolio-level characteristic summaries. This repository provides a Python toolkit for constructing U.S. equity characteristics from WRDS inputs, with the current implementation centered on the CRSP CIZ data format.
- Read the listed papers
- WRDS account with subscription to CRSP, Compustat and IBES.
- Python (with polars, pandas, numpy, tqdm, wrds, duckdb, pyarrow)
- download_data.py: pull raw CRSP CIZ monthly/daily, Compustat funda/fundq, IBES, and FF factors from WRDS (via DuckDB) into data/raw/
- functions.py: shared helpers, including ttm, ttm4, ttm12, chars_std, industry classifications (ffi49), imputation and standardization utilities, and the INPUT_PATH / OUTPUT_PATH constants
- accounting.py: builds annual + quarterly accounting characteristics and merges with monthly CRSP (crsp_mom) for monthly-grain chars (me, turn, dolvol, dy, mom*, seas1a, indmom, …). Outputs chars_a_accounting.parquet, chars_q_accounting.parquet
- rolling_chars.py: rolling daily-window characteristics from CRSP daily: beta, baspread, ill, maxret, rvar_capm, rvar_ff3, rvar_mean, std_dolvol, std_turn, zerotrade. Outputs rolling_chars.parquet
- sue.py: unexpected quarterly earnings (SUE)
- abr.py: cumulative abnormal returns around earnings announcement dates
- myre.py: revisions in analysts' earnings forecasts (uses IBES)
- merge_chars.py: merges accounting + rolling + satellite characteristics (sue, abr, myre) with CRSP backfill. Outputs chars_a_raw.parquet, chars_q_raw.parquet
- impute_rank_output.py: reconciles annual/quarterly accounting variables by most-recent datadate, computes ffi49, lags returns one period, and writes the four final outputs (raw / imputed × no-rank / rank)
- iclink_ciz.sas: IBES ↔ CRSP CIZ link table (run on WRDS SAS Studio); output saved to data/raw/iclink_ciz.csv
- documents/chars_summary.csv: current characteristic acronym, description, author, year, category
All commands are run from chars_ciz/. Paths are configured in functions.py (INPUT_PATH, OUTPUT_PATH).
- python download_data.py: pull raw WRDS tables into data/raw/ (also run iclink_ciz.sas on WRDS to produce iclink_ciz.csv)
- python accounting.py: build annual/quarterly accounting + monthly chars
- python rolling_chars.py: daily-window rolling chars
- python sue.py, python abr.py, python myre.py: satellite chars (can run in parallel; myre.py requires IBES + the iclink table)
- python merge_chars.py: merge everything into chars_a_raw.parquet and chars_q_raw.parquet
- python impute_rank_output.py: produce the four final outputs
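The run order above can be sketched as a small driver script. This is only an illustration, not part of the repository: `STAGES`, `run_pipeline`, `workdir`, and `dry_run` are hypothetical names, and the sketch runs the satellite scripts one after another rather than in parallel. The SAS step (iclink_ciz.sas) still has to be run separately on WRDS.

```python
import subprocess
import sys
from pathlib import Path

# Stage order copied from the list of commands; scripts within one stage
# do not depend on each other. iclink_ciz.sas is NOT covered here, it must
# be run on WRDS SAS Studio before myre.py.
STAGES = [
    ["download_data.py"],
    ["accounting.py"],
    ["rolling_chars.py"],
    ["sue.py", "abr.py", "myre.py"],
    ["merge_chars.py"],
    ["impute_rank_output.py"],
]


def run_pipeline(workdir=".", dry_run=True):
    """Run every script in stage order; return the commands run (or planned)."""
    commands = []
    for stage in STAGES:
        for script in stage:
            cmd = [sys.executable, script]
            commands.append(cmd)
            if not dry_run:
                # check=True aborts at the first script that exits with an error
                subprocess.run(cmd, cwd=Path(workdir), check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` only lists the planned commands; `run_pipeline(workdir=".", dry_run=False)` from inside chars_ciz/ would execute them in sequence.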
The stock universe is the top three U.S. exchanges (NYSE / AMEX / NASDAQ), filtered to common equity via CRSP CIZ flags (sharetype='NS', securitytype='EQTY', securitysubtype='COM', usincflg='Y', issuertype∈{ACOR,CORP}, conditionaltype='RW', tradingstatusflg='A'). The date range follows the available WRDS coverage.
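A minimal sketch of the common-equity flag filter, written with pandas for illustration. The flag values are the ones listed above; the assumption is that the input frame carries columns with exactly these names. `COMMON_EQUITY_FLAGS` and `filter_common_equity` are hypothetical names, and the exchange restriction is left out because the text does not name the exchange column.

```python
import pandas as pd

# Allowed values per flag, copied from the description of the stock universe.
# Column names are assumed to equal the flag names.
COMMON_EQUITY_FLAGS = {
    "sharetype": ["NS"],
    "securitytype": ["EQTY"],
    "securitysubtype": ["COM"],
    "usincflg": ["Y"],
    "issuertype": ["ACOR", "CORP"],
    "conditionaltype": ["RW"],
    "tradingstatusflg": ["A"],
}


def filter_common_equity(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows for which every flag takes one of its allowed values."""
    mask = pd.Series(True, index=df.index)
    for col, allowed in COMMON_EQUITY_FLAGS.items():
        mask &= df[col].isin(allowed)
    return df[mask]
```

Keeping the allowed values in one dictionary makes the universe definition easy to audit against the flag list.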
Returns are shifted one period forward so that characteristics at time t are paired with the return of the following period, t+1.
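The one-period return shift can be illustrated with a short pandas sketch. It assumes that each row's characteristics should be paired with the same stock's next-period return, and that rows are consecutive periods within each permno. `align_next_period_return` and the `ret_next` column are hypothetical names; the repository may instead overwrite `ret` and `date` directly.

```python
import pandas as pd


def align_next_period_return(df: pd.DataFrame) -> pd.DataFrame:
    """Attach each stock's next-period return to the current row."""
    out = df.sort_values(["permno", "date"]).copy()
    # Within each permno, pull the next row's return onto the current row.
    # Assumes no missing periods; the last row per stock gets NaN.
    out["ret_next"] = out.groupby("permno")["ret"].shift(-1)
    return out
```

With this alignment, a cross-sectional regression or portfolio sort on a given row uses only information available before the return is realized.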
The four final files (all parquet) are:
- chars_raw_no_impute.parquet: raw characteristic levels, missing values preserved
- chars_raw_imputed.parquet: same as above with industry-median / industry-mean imputation
- chars_rank_no_impute.parquet: cross-sectional rank-standardized characteristics (no imputation)
- chars_rank_imputed.parquet: cross-sectional rank-standardized characteristics (imputed)
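To make the imputed and rank-standardized variants concrete, here is a hedged pandas sketch. It assumes industry-median imputation within each date and ffi49 group, and a rank transform that maps each date's cross-section into (-1, 1) via 2·rank/(n+1) − 1. Both function names and the exact scaling are illustrative assumptions; the repository's own utilities in functions.py may differ.

```python
import numpy as np
import pandas as pd


def impute_industry_median(df, col, date_col="date", ind_col="ffi49"):
    """Fill missing `col` with the median of the same date and industry."""
    med = df.groupby([date_col, ind_col])[col].transform("median")
    return df[col].fillna(med)


def rank_standardize(df, col, date_col="date"):
    """Scale each date's cross-sectional ranks of `col` into (-1, 1)."""
    r = df.groupby(date_col)[col].rank(method="average")  # NaN stays NaN
    n = df.groupby(date_col)[col].transform("count")      # non-missing count
    return 2.0 * r / (n + 1) - 1.0
```

A usage example under the same assumptions: `df["bm"] = impute_industry_median(df, "bm")` followed by `df["bm_rank"] = rank_standardize(df, "bm")`, where `bm` stands for any characteristic column.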
- stock identifier: gvkey, permno, ticker, conm, comnam
- time: datadate (accounting), date (return date)
- industry: sic, ffi49
- price / size: prc, shrout, me, log_me, lag_me
- return: ret (delisting-adjusted via CRSP CIZ mthret)
The equity-characteristic definitions follow the empirical anomaly and
replication literature, especially Green, Hand, and Zhang and Hou, Xue, and
Zhang. The current implementation documents 99 U.S. equity characteristics in
documents/chars_summary.csv and provides audited formula notes under
documents/formula_docs/.
Portfolio characteristics can be constructed as equal-weighted or value-weighted averages of firm-level characteristics for equities in each portfolio.
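Equal- and value-weighted portfolio averages can be sketched as follows. The sketch assumes a firm-level frame with a portfolio label column (hypothetically called `port`) and uses `lag_me` as the market-cap weight; both choices, and the function name `portfolio_characteristics`, are illustrative assumptions rather than the repository's implementation.

```python
import numpy as np
import pandas as pd


def portfolio_characteristics(df, chars, port_col="port",
                              date_col="date", weight_col="lag_me"):
    """Equal- and value-weighted characteristic averages per date and portfolio."""
    keys = [date_col, port_col]
    # Equal-weighted: plain mean, missing values skipped
    ew = df.groupby(keys)[chars].mean().add_suffix("_ew")

    by = [df[date_col], df[port_col]]
    vw_cols = {}
    for c in chars:
        # Only firms with both a characteristic and a weight contribute
        valid = df[c].notna() & df[weight_col].notna()
        w = df[weight_col].where(valid)
        num = (df[c] * w).groupby(by).sum(min_count=1)
        den = w.groupby(by).sum(min_count=1)
        vw_cols[f"{c}_vw"] = num / den
    vw = pd.DataFrame(vw_cols)
    return ew.join(vw)
```

Restricting the weights to firms with a non-missing characteristic keeps the value weights summing to one within each portfolio, even when coverage differs across characteristics.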
Common portfolio applications include:
- Characteristics-sorted Portfolio, see the listed papers and also Deep Learning in Characteristics-Sorted Factor Models
- DGTW Benchmark, see DGTW 1997 JF
- Industry portfolio
- Dissecting Anomalies with a Five-Factor Model by Fama and French 2015 RFS
- Define the characteristics of a portfolio as the value-weight averages (market-cap weights) of the variables for the firms in the portfolio
- French's Data Library
- The Characteristics that Provide Independent Information about Average U.S. Monthly Stock Returns by Green Hand Zhang 2017 RFS
- Replicating Anomalies by Hou Xue Zhang 2018 RFS
- Legacy SAS calculations mainly follow the SAS code by Green, Hand, and Zhang.
- Portfolio characteristics mainly follow the WRDS Financial Ratios Suite and Variable Definition
All comments are welcome.