This dossier collects reviewer-facing evidence for StatsPAI's readiness as research software. It is intentionally factual and reproducible.
- Repository: https://github.com/brycewang-stanford/StatsPAI
- Package archive: https://doi.org/10.5281/zenodo.19933900
- PyPI: https://pypi.org/project/StatsPAI/
- License: MIT, with a plain-text
LICENSEfile in the repository. - Current release at the time of this dossier:
1.16.0, released on 2026-05-29. - Public GitHub repository creation date: 2025-07-26.
- Public repository activity signals as of 2026-06-01: 212 stars, 39 forks, 23 GitHub releases, and 1 public external user issue in addition to maintainer-created issue/PR activity.
StatsPAI exposes a unified Python interface for causal inference and applied
econometrics. As of release 1.16.0, the registry reports 1,020 public
functions across 81 submodules:
python scripts/registry_stats.py --checkThe registry and schema layer are part of the public surface. They support
programmatic discovery through sp.list_functions(), sp.describe_function(),
and sp.function_schema().
The repository includes several independent validation tracks:
- Unit and integration tests across the main estimator families.
- R parity modules under
tests/r_parity/. - Stata parity modules under
tests/stata_parity/. - Reference-parity checks under
tests/reference_parity/. - Original-paper replay fixtures under
tests/orig_parity/. - Monte Carlo coverage checks under
tests/coverage_monte_carlo/. - Snapshot tests for publication-table output under
tests/output_snapshots/. - Citation and bibliography audits under
tools/. - Reviewer-facing offline examples under
examples/.
The archived local full-suite report records:
5200 passed, 98 skipped, 13 deselected, 1 xfailed, 2 xpassed
on Python 3.9.6 for the default local suite as of 2026-05-17. The exact report
is stored in test_results_full_suite.md.
StatsPAI includes validation fixtures for common teaching and replication benchmarks, including:
- Card-style returns-to-schooling IV estimates.
- LaLonde / Dehejia-Wahba job-training benchmarks.
- Lee-style close-election regression discontinuity.
- Callaway-Sant'Anna difference-in-differences examples.
- California Proposition 99 synthetic-control examples.
Known convention differences are documented in parity reports rather than hidden. For example, bandwidth selectors, regularisation constants, small-sample standard-error conventions, and fold-split randomness are recorded in the R-parity report where they affect exact numerical matching.
sp.dml is StatsPAI's port of the Double/Debiased Machine Learning framework
(Chernozhukov et al., 2018). Because the canonical reference implementations
are the DoubleML packages for R and Python (Bach, Chernozhukov, Kurz,
Spindler & Klaassen), sp.dml is pinned against both so the numerical
claim is auditable from either ecosystem.
The fixture is a fixed seed-42 DGP (n=1000, p=10, true effect θ=0.5) at
tests/reference_parity/_fixtures/dml_data.csv. All three engines consume the
same CSV. On the Python side, sp.dml and doubleml-for-py are given
identical scikit-learn nuisance learners (LassoCV(cv=5) for regression,
LogisticRegressionCV(cv=5) for the binary propensity) and the same fold
partition under a fixed seed, so any divergence reflects a genuine
implementation difference rather than learner choice or split noise.
All four DoubleML model classes are pinned against doubleml-for-py.
The non-instrumented models (PLR, IRM) use dml_data.csv; the
instrumented models (PLIV, IIVM) use the companion dml_iv_data.csv
(n=2000, continuous instrument z_c, binary instrument z_b).
| Model | sp.dml (StatsPAI 1.16.1) |
doubleml-for-py 0.11.3 |
DoubleML R 1.0.2 (cv.glmnet) |
|---|---|---|---|
| PLR (continuous D) | +0.559022 ± 0.033103 | +0.559022 ± 0.033103 | +0.536759 ± 0.033498 |
| IRM (binary D, AIPW ATE) | −0.019107 ± 0.076561 | −0.026658 ± 0.074206 | +0.006640 ± 0.074434 |
| PLIV (continuous D, instrument) | +0.511701 ± 0.019453 | +0.511701 ± 0.019453 | — (Python-side pin) |
| IIVM (binary D, instrument, LATE) | +0.549467 ± 0.092426 | +0.561773 ± 0.091915 | — (Python-side pin) |
- PLR matches
doubleml-for-pyto machine precision. Under shared learners and folds the point estimate and standard error agree to within one float64 unit in the last place: |Δ coefficient| = 1.1 × 10⁻¹⁶ and |Δ standard error| = 1.4 × 10⁻¹⁷. This is exact numerical equivalence, not a loose tolerance — both implementations evaluate the same Neyman-orthogonal score on the same cross-fit partition. The corresponding deviation from the R reference is ~4.1% on the coefficient, attributable tocv.glmnet's penalty path differing fractionally from scikit-learn'sLassoCV; the R fixture is pinned to within 7% relative. - IRM agrees within one-tenth of a standard error.
sp.dmlanddoubleml-for-pydiffer by 0.0076 on the ATE (≈ 0.10 SE on this fixture); all three implementations are statistically indistinguishable from zero, the truth for this DGP. The residual difference comes from internal AIPW score-construction details — it is verified not to be driven by propensity trimming (matching the clip thresholds leaves it unchanged) nor by IPW normalization (togglingnormalize_ipwleaves it unchanged). The external parity test pins this at < 0.05 absolute.
Both directions are exercised by committed tests, not just asserted in prose:
python -m pip install -e ".[dev,parity]" # the parity extra adds doubleml-for-py
python -m pytest tests/external_parity/test_dml_python_parity.py -v # sp.dml vs doubleml-for-py (machine precision)
python -m pytest tests/reference_parity/test_dml_parity.py -v # sp.dml vs DoubleML R (needs local R + DoubleML)The Python-side check runs whenever doubleml-for-py is installed (via the
parity extra) and skips cleanly otherwise; the R-side check additionally
requires a local R installation with DoubleML 1.0.2 + mlr3learners 0.14.0.
Environment of record for the numbers above: StatsPAI 1.16.1, doubleml-for-py
0.11.3, scikit-learn 1.7.2, and DoubleML R 1.0.2 on R 4.5.2 with cv_glmnet.
The API mapping between sp.dml(model=...) and the DoubleML classes, plus the
full divergence discussion, is in docs/guides/sp_dml_vs_doubleml.md.
At submission time, StatsPAI is being used in working-paper workflows connected to the Stanford Rural Education Action Program and related empirical policy evaluation work. No peer-reviewed research article using StatsPAI has yet been published. The current impact claim is therefore based on credible near-term research use, reproducible validation materials, public package distribution, and reviewer-verifiable examples rather than published downstream citations.
StatsPAI is publicly distributed on PyPI and archived on Zenodo. The GitHub repository has public stars, forks, issue templates, a dedicated support discussion channel, contribution instructions, support instructions, release notes, and CI status checks. These are treated as community-readiness and public-interest signals, not as evidence of independent scholarly adoption.
The public fork list is available through GitHub at
https://github.com/brycewang-stanford/StatsPAI/forks. As of 2026-05-29, the
GitHub API reported 37 forks, all owned by normal GitHub User accounts. The
project does not infer downstream research use from those forks unless a user
opens an issue, pull request, citation, or reproducible report that documents
such use.
StatsPAI Inc. is the legal entity associated with the project. CoPaper.AI is a commercial downstream product that may call the MIT-licensed StatsPAI package. The StatsPAI package itself is permanently open source under the MIT license. This is an open-core / commercial-downstream arrangement: the research software submitted to JOSS remains open, while commercial products can build on it under the same license terms available to all users.
From a repository checkout:
python -m pip install -e ".[dev,plotting]"
python -m pytest tests/test_ols.py tests/test_did.py tests/test_registry.py -q --no-cov
python scripts/registry_stats.py --check
python scripts/schema_quality.py
python tools/audit_bib_duplicates.py --strict
python tools/audit_bib_coverage.py --strict-dangling --hide-orphans
python -m build
python -m twine check dist/*For a shorter package-level check, use the reviewer guide in
docs/joss_reviewer_guide.md.