edwinhu/pin-code

PIN Code

Estimation code and annual dataset for the Probability of Informed Trading (PIN) family of models.

Models

  • GPIN (Generalized PIN): Duarte, Hu, and Young (2020). Negative binomial mixture model of buyer- and seller-initiated trades.
  • OWR (Odders-White and Ready): Odders-White and Ready (2008). Multivariate normal mixture model of order imbalance and return residuals.
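To make the "normal mixture" idea concrete, here is a minimal sketch of a generic two-component, zero-mean multivariate normal mixture log-likelihood. It is not the OWR likelihood from this repository: the function name `mixture_loglik` and the two covariance matrices are placeholders chosen for illustration, whereas the actual model derives its covariances from the structural parameters listed under Data.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def mixture_loglik(x, alpha, cov_event, cov_no_event):
    """Log-likelihood of a zero-mean two-component multivariate normal mixture.

    x            : (n_days, n_vars) array of daily observations
    alpha        : probability that a day belongs to the "event" component
    cov_event    : covariance matrix on event days (placeholder)
    cov_no_event : covariance matrix on no-event days (placeholder)
    """
    x = np.atleast_2d(x)
    mean = np.zeros(x.shape[1])
    # Per-day log density under each component, weighted by the mixing probability.
    log_event = np.log(alpha) + np.atleast_1d(
        multivariate_normal.logpdf(x, mean=mean, cov=cov_event))
    log_quiet = np.log1p(-alpha) + np.atleast_1d(
        multivariate_normal.logpdf(x, mean=mean, cov=cov_no_event))
    # Combine the components in log space to avoid underflow, then sum over days.
    per_day = logsumexp(np.vstack([log_event, log_quiet]), axis=0)
    return float(per_day.sum())


# Made-up data: 50 days of three variables (e.g. order imbalance and two return residuals).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
ll = mixture_loglik(x, alpha=0.3, cov_event=2.0 * np.eye(3), cov_no_event=np.eye(3))
print(ll)
```

Working in log space with `logsumexp` keeps the sum stable when one component assigns a day a very small density.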

Data

Annual estimates are released on Zenodo.

Coverage

  • Years: 2003-2024
  • Stocks: All common stocks (shrcd 10/11) on NYSE, AMEX, NASDAQ (exchcd 1-4)
  • Source: WRDS TAQ Intraday Indicators, CRSP, and Compustat

GPIN (gpin_all.csv)

Column   Description
permno   CRSP permanent security identifier
yyyy     Estimation year
a        Probability of an information event
p        Negative binomial probability parameter
eta      Information intensity (additional order flow on event days)
r        Negative binomial shape parameter
d        Probability of good news (conditional on event)
th       Baseline buy proportion (no-event days)
f        Maximized log-likelihood
rc       Optimizer return code (0 = success)
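As a usage sketch, the snippet below reads a file with this column layout and keeps only the stock-years where the optimizer reported success (`rc == 0`). It assumes the file is comma-separated with a header row matching the column names above. The helper name `load_converged` is hypothetical, and the two sample rows are made up so the example runs without the Zenodo download.

```python
import io

import pandas as pd


def load_converged(source):
    """Read an estimates file and keep rows whose optimizer return code is 0."""
    df = pd.read_csv(source)
    return df[df["rc"] == 0].reset_index(drop=True)


# Made-up rows in the gpin_all.csv layout (NOT real estimates).
sample = io.StringIO(
    "permno,yyyy,a,p,eta,r,d,th,f,rc\n"
    "10001,2020,0.30,0.50,0.20,5.0,0.55,0.50,-1234.5,0\n"
    "10002,2020,0.10,0.40,0.30,4.0,0.45,0.52,-2345.6,1\n"
)
gpin = load_converged(sample)
print(gpin[["permno", "yyyy", "a"]])
```

With the real data, pass the path to `gpin_all.csv` in place of `sample`.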

OWR (owr_all.csv)

Column   Description
permno   CRSP permanent security identifier
yyyy     Estimation year
a        Probability of an information event
su       Volatility of uninformed order imbalance
sz       Volatility of noise in order imbalance
si       Volatility of private information signal
spd      Volatility of day return residual
spo      Volatility of overnight return residual
f        Maximized log-likelihood
rc       Optimizer return code (0 = success)
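Both files share the keys `permno` and `yyyy` and also share the column names `a`, `f`, and `rc`, so combining them needs suffixes to keep the two models apart. The sketch below shows one way to do that with pandas. The helper name `merge_estimates` and the sample values are hypothetical, and it assumes each file has at most one row per stock-year.

```python
import pandas as pd


def merge_estimates(gpin, owr):
    """Join GPIN and OWR estimates on stock-year, suffixing the shared columns."""
    return gpin.merge(
        owr,
        on=["permno", "yyyy"],
        how="inner",
        suffixes=("_gpin", "_owr"),
        validate="one_to_one",  # raises if a stock-year appears twice in either frame
    )


# Made-up rows using a subset of the documented columns (NOT real estimates).
gpin = pd.DataFrame({"permno": [10001, 10002], "yyyy": [2020, 2020],
                     "a": [0.30, 0.10], "f": [-1234.5, -2345.6], "rc": [0, 0]})
owr = pd.DataFrame({"permno": [10001, 10003], "yyyy": [2020, 2020],
                    "a": [0.25, 0.40], "f": [-987.6, -876.5], "rc": [0, 0]})

both = merge_estimates(gpin, owr)
print(both[["permno", "yyyy", "a_gpin", "a_owr"]])
```

An inner join keeps only stock-years estimated under both models. Use `how="outer"` to keep every stock-year from either file.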

Estimation

Each stock-year is estimated from 10 random starting values, using the L-BFGS-B optimizer with parameter bounds.
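The multi-start approach can be sketched as follows with `scipy.optimize.minimize`. This is an illustration under stated assumptions, not the code in `est.py`: the function name `multistart_lbfgsb`, drawing starts uniformly within the bounds, keeping the run with the lowest objective, and the toy objective itself are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize


def multistart_lbfgsb(neg_loglik, bounds, n_starts=10, seed=0):
    """Minimize neg_loglik from several random starts; return the best result."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)  # assumed: starts drawn uniformly inside the bounds
        res = minimize(neg_loglik, x0, method="L-BFGS-B", bounds=bounds)
        # assumed selection rule: keep the run with the lowest objective value
        if best is None or res.fun < best.fun:
            best = res
    return best


def toy_objective(x):
    """Stand-in for a negative log-likelihood, with its minimum at (0.3, 2.0)."""
    return (x[0] - 0.3) ** 2 + (x[1] - 2.0) ** 2


best = multistart_lbfgsb(toy_objective, bounds=[(0.0, 1.0), (0.1, 10.0)])
print(best.x, best.fun)
```

Multiple starts matter for mixture likelihoods because they are typically multimodal, so a single start can stop at a local optimum.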

Requirements

  • Data access: WRDS subscription (TAQ + CRSP)
  • Python: numpy, scipy, pandas, statsmodels (managed via pixi)
  • SAS: For the data pipeline (runs on WRDS grid)
  • HPC (optional): Slurm cluster for parallel estimation

Pipeline

1. pipeline/build_taq.sas       — Extract TAQ intraday indicators, merge with CRSP
2. pipeline/build_residuals.sas — Stack years, compute CAPM betas and cross-sectional residuals
3. pipeline/convert_hdf5.py     — Convert SAS dataset to HDF5 for Python
4. est.py                       — Estimate GPIN and OWR models (parallelized)

Quick Start (HPC)

# Build data on WRDS
qsub -t 2003-2024 scripts/run_taq.sh
qsub scripts/run_residuals.sh

# Convert and transfer
python pipeline/convert_hdf5.py
scp taqdfx_all6.h5 hpc:/scratch/user/pin-code/

# Estimate on Slurm cluster
sbatch scripts/run_est_slurm.sh owr
sbatch scripts/run_est_slurm.sh gpin

Performance

Model   Per stock-year   Full run (22 years, HPC)
GPIN    ~1 sec           < 10 min
OWR     ~5 sec           < 30 min

Benchmarked on UVA HPC (standard partition, 8 cpus-per-task, 176 array tasks).

References

License

MIT
