Estimation code and annual dataset for the Probability of Informed Trading (PIN) family of models.
- GPIN (Generalized PIN): Duarte, Hu, and Young (2020). Negative binomial mixture model of buyer- and seller-initiated trades.
- OWR (Odders-White and Ready): Odders-White and Ready (2008). Multivariate normal mixture model of order imbalance and return residuals.
Annual estimates are released on Zenodo.
- Years: 2003-2024
- Stocks: All common stocks (shrcd 10/11) on NYSE, AMEX, NASDAQ (exchcd 1-4)
- Source: WRDS TAQ Intraday Indicators, CRSP, and Compustat
GPIN estimates contain the following columns:

| Column | Description |
|---|---|
| permno | CRSP permanent security identifier |
| yyyy | Estimation year |
| a | Probability of an information event |
| p | Negative binomial probability parameter |
| eta | Information intensity (additional order flow on event days) |
| r | Negative binomial shape parameter |
| d | Probability of good news (conditional on event) |
| th | Baseline buy proportion (no-event days) |
| f | Maximized log-likelihood |
| rc | Optimizer return code (0 = success) |
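
Because `rc` records whether the optimizer succeeded, a common first step is to drop stock-years that did not converge. The snippet below is a minimal sketch of that filter; the helper name `keep_converged` and the rows in the example frame are placeholders made up for illustration, not real estimates or part of this repository.

```python
import pandas as pd

# Placeholder rows for illustration only; these are NOT real estimates.
gpin = pd.DataFrame(
    {
        "permno": [10001, 10001, 10002],
        "yyyy": [2003, 2004, 2003],
        "a": [0.30, 0.25, 0.40],
        "f": [-1500.0, -1480.0, -2100.0],
        "rc": [0, 1, 0],
    }
)


def keep_converged(estimates: pd.DataFrame) -> pd.DataFrame:
    """Keep stock-years whose optimizer return code signals success (rc == 0)."""
    return estimates.loc[estimates["rc"] == 0].reset_index(drop=True)


converged = keep_converged(gpin)
print(converged[["permno", "yyyy", "a"]])
```
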

OWR estimates contain the following columns:

| Column | Description |
|---|---|
| permno | CRSP permanent security identifier |
| yyyy | Estimation year |
| a | Probability of an information event |
| su | Volatility of uninformed order imbalance |
| sz | Volatility of noise in order imbalance |
| si | Volatility of private information signal |
| spd | Volatility of day return residual |
| spo | Volatility of overnight return residual |
| f | Maximized log-likelihood |
| rc | Optimizer return code (0 = success) |
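
The GPIN and OWR outputs share the key columns `permno` and `yyyy`, and also share the names `a`, `f`, and `rc`, so joining them requires suffixes to keep the two models apart. The following is a sketch of one way to do that with pandas; the rows are placeholders made up for illustration, and the suffix names are an assumption rather than a convention of the dataset.

```python
import pandas as pd

# Placeholder rows for illustration only; these are NOT real estimates.
gpin = pd.DataFrame(
    {
        "permno": [10001, 10002],
        "yyyy": [2003, 2003],
        "a": [0.30, 0.40],
        "f": [-1500.0, -2100.0],
        "rc": [0, 0],
    }
)
owr = pd.DataFrame(
    {
        "permno": [10001, 10003],
        "yyyy": [2003, 2003],
        "a": [0.35, 0.20],
        "si": [0.02, 0.01],
        "f": [900.0, 1100.0],
        "rc": [0, 0],
    }
)

# Inner join on the stock-year key; overlapping column names get a model suffix.
# validate="one_to_one" raises if a stock-year appears more than once on either side.
merged = gpin.merge(
    owr,
    on=["permno", "yyyy"],
    how="inner",
    suffixes=("_gpin", "_owr"),
    validate="one_to_one",
)
print(merged.columns.tolist())
```
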
Each stock-year is estimated with 10 random starting values. The optimizer uses L-BFGS-B with parameter bounds.
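
The multi-start procedure described above can be sketched as follows. This is a minimal illustration, not the code in `est.py`: the Gaussian likelihood stands in for the GPIN and OWR likelihoods, and the function names `neg_loglik` and `multistart_fit`, the bounds, the uniform draw of starting values, and the rule of keeping the start with the lowest objective are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglik(params, x):
    """Negative log-likelihood of an i.i.d. normal sample (toy stand-in model)."""
    mu, sigma = params
    n = x.size
    return (
        0.5 * np.sum(((x - mu) / sigma) ** 2)
        + n * np.log(sigma)
        + 0.5 * n * np.log(2.0 * np.pi)
    )


def multistart_fit(x, bounds, n_starts=10, seed=0):
    """Run L-BFGS-B from several random starts and keep the best result.

    Keeping the lowest objective value is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    lower = np.array([b[0] for b in bounds])
    upper = np.array([b[1] for b in bounds])
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lower, upper)  # random start inside the bounds
        res = minimize(neg_loglik, x0, args=(x,), method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best


# Simulated data for the toy model only.
sample = np.random.default_rng(42).normal(loc=1.0, scale=2.0, size=500)
fit = multistart_fit(sample, bounds=[(-10.0, 10.0), (1e-3, 10.0)])
print("estimates:", fit.x)
print("maximized log-likelihood:", -fit.fun)
print("optimizer status:", fit.status)
```

In this sketch `-fit.fun` plays the role of the `f` column and `fit.status` the role of `rc`, although whether the released columns are produced this way is not stated here.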
- Data access: WRDS subscription (TAQ + CRSP)
- Python: numpy, scipy, pandas, statsmodels (managed via pixi)
- SAS: For the data pipeline (runs on WRDS grid)
- HPC (optional): Slurm cluster for parallel estimation
1. pipeline/build_taq.sas — Extract TAQ intraday indicators, merge with CRSP
2. pipeline/build_residuals.sas — Stack years, compute CAPM betas and cross-sectional residuals
3. pipeline/convert_hdf5.py — Convert SAS dataset to HDF5 for Python
4. est.py — Estimate GPIN and OWR models (parallelized)
# Build data on WRDS
qsub -t 2003-2024 scripts/run_taq.sh
qsub scripts/run_residuals.sh
# Convert and transfer
python pipeline/convert_hdf5.py
scp taqdfx_all6.h5 hpc:/scratch/user/pin-code/
# Estimate on Slurm cluster
sbatch scripts/run_est_slurm.sh owr
sbatch scripts/run_est_slurm.sh gpin

| Model | Per stock-year | Full run (22 years, HPC) |
|---|---|---|
| GPIN | ~1 sec | < 10 min |
| OWR | ~5 sec | < 30 min |
Benchmarked on UVA HPC (standard partition, 8 cpus-per-task, 176 array tasks).
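
For readers adapting the benchmark setup to another cluster, the sketch below shows one way work could be divided across Slurm array tasks. It is hypothetical and does not describe what `scripts/run_est_slurm.sh` or `est.py` actually do: the helper `tasks_for_array_index`, the integer work list, and the even split are assumptions. Only `SLURM_ARRAY_TASK_ID` is a real environment variable, set by Slurm for array jobs.

```python
import os

import numpy as np


def tasks_for_array_index(items, task_id, n_tasks):
    """Return the slice of `items` assigned to one array task (near-even split)."""
    chunks = np.array_split(np.arange(len(items)), n_tasks)
    return [items[i] for i in chunks[task_id]]


# Hypothetical work list: one integer per stock-year to estimate.
work = list(range(1000))
n_tasks = 176  # matches the array size quoted in the benchmark note

# Slurm sets SLURM_ARRAY_TASK_ID inside array jobs; default to 0 elsewhere.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0")) % n_tasks
mine = tasks_for_array_index(work, task_id, n_tasks)
print(f"task {task_id} handles {len(mine)} items")
```
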
- Duarte, J., Hu, E., and Young, L. (2020). "A Comparison of Some Structural Models of Private Information Arrival." Journal of Financial Economics, 135(3), 795-815.
- Odders-White, E. R. and Ready, M. J. (2008). "The Probability and Magnitude of Information Events." Journal of Financial Economics, 87(1), 227-248.