2 changes: 1 addition & 1 deletion README.md
@@ -137,7 +137,7 @@ StatsPAI's focus is **causal inference**. The grid below summarizes method-famil

**Legend**: B = broad API coverage within this comparison table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are API-breadth labels, not validation tiers.

-**StatsPAI at a glance**: 1,020 registered functions in the live agent registry · 81 submodules · 269k LOC (core) + 97k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the API-breadth matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).
+**StatsPAI at a glance**: 1,022 registered functions in the live agent registry · 81 submodules · 269k LOC (core) + 97k LOC (tests). All four numbers are reproducible from the canonical generator (`python scripts/registry_stats.py`); the per-module table in [`docs/stats.md`](docs/stats.md) is regenerated from the same script. For the API-breadth matrix (23 method families) and cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md).

**Validation tiers matter**: `stability="stable"` means the public API is SemVer-stable; it does not by itself mean R/Stata/paper parity. Use `sp.list_functions(validation_status="certified")` for cross-language or published-reference evidence, and inspect `sp.describe_function(name)["limitations"]` before production use. See [`docs/guides/stability.md`](docs/guides/stability.md).

2 changes: 1 addition & 1 deletion README_CN.md
@@ -46,7 +46,7 @@ StatsPAI focuses on **causal inference**. The table below describes method-family-level API coverage

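**Legend**: B = broad API coverage within the scope of this table; Y = implemented entry points; P = partial, scattered, or single-algorithm support; N = no first-class entry point. These are only API-breadth labels, not validation tiers.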

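-**StatsPAI at a glance**: 1,020 registered functions in the live agent registry · 81 submodules · 269k lines of core code + 97k lines of tests. All four numbers can be recomputed on demand by the single canonical generator (`python scripts/registry_stats.py`); the per-module breakdown table in [`docs/stats.md`](docs/stats.md) is written back by the same script. For the API-breadth matrix of 23 method families and the cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md). These coverage numbers describe API breadth and do not mean that every function has R/Stata numerical validation; check `validation_status` before production use.
+**StatsPAI at a glance**: 1,022 registered functions in the live agent registry · 81 submodules · 269k lines of core code + 97k lines of tests. All four numbers can be recomputed on demand by the single canonical generator (`python scripts/registry_stats.py`); the per-module breakdown table in [`docs/stats.md`](docs/stats.md) is written back by the same script. For the API-breadth matrix of 23 method families and the cross-ecosystem line-count comparison, see [`docs/stats.md`](docs/stats.md). These coverage numbers describe API breadth and do not mean that every function has R/Stata numerical validation; check `validation_status` before production use.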

**📦 v1.16.0 (2026-05-29): correctness fixes and expanded cross-language parity**

95 changes: 95 additions & 0 deletions docs/guides/meta_analysis.md
@@ -0,0 +1,95 @@
# Meta-analysis (evidence synthesis)

> **The output half of a systematic review.** Once you have extracted an
> effect size and its standard error from each study, `sp.meta_analysis`
> pools them — fixed-effect and random-effects — and reports the
> heterogeneity statistics, a prediction interval, Egger's test for
> small-study effects, and forest / funnel plots
> [@dersimonian1986meta; @higgins2002quantifying; @egger1997bias].

This is **summary-data** meta-analysis: you pass per-study effects and
SEs (log odds ratios, mean differences, log hazard ratios, …). It does
not fit individual-participant-data models.
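
Studies sometimes report 2x2 counts instead of ready-made effect sizes. In that case you first have to turn each table into a log odds ratio and its SE. The sketch below uses the standard Woolf (delta-method) formula, SE = sqrt(1/a + 1/b + 1/c + 1/d). The helper `log_odds_ratio` is hypothetical and not a StatsPAI function. The 0.5 continuity correction for zero cells is one common convention and may not suit rare-event data.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its Woolf (delta-method) SE from a 2x2 table.

    a, b: events / non-events in the treated arm
    c, d: events / non-events in the control arm
    Adds 0.5 to every cell if any cell is zero (a common convention).
    """
    if min(a, b, c, d) == 0:
        a, b, c, d = [x + 0.5 for x in (a, b, c, d)]
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical trial: 12/100 events under treatment, 20/100 under control.
print(log_odds_ratio(12, 88, 20, 80))
```

Collect one `(log_or, se)` pair per study and pass the two lists to `sp.meta_analysis`, as in the one-call example.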

---

## 1. One call

```python
import statspai as sp

# five studies' log odds ratios and their standard errors
effects = [0.10, 0.25, -0.05, 0.30, 0.15]
ses = [0.05, 0.10, 0.08, 0.12, 0.06]

m = sp.meta_analysis(effects, ses, labels=["Trial A", "B", "C", "D", "E"])
print(m.summary())
```

The summary reports **both** models, so you can compare them before deciding which to report:

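- **Fixed-effect (inverse-variance)**: assumes a single true effect shared
by every study.
- **Random-effects (DerSimonian-Laird)**: assumes the true effect varies
across studies. Its CI is at least as wide as the fixed-effect CI (the two
coincide only when the estimated `tau^2` is zero). This is the model
StatsPAI reports by default (`method="DL"`).

Pass `method="fixed"` to make the fixed-effect model the headline
`estimate`. Both models are always computed, so `m.fixed_estimate`,
`m.fixed_se`, `m.random_estimate` and `m.random_se` are available either way.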
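
To make the two models concrete, here is a minimal NumPy sketch of the textbook inverse-variance and DerSimonian-Laird formulas, applied to the example data. It is an illustration under stated assumptions, not StatsPAI's implementation. The helper `pool_fixed_and_dl` is hypothetical, and it uses normal (Wald) critical values for both CIs.

```python
import numpy as np
from statistics import NormalDist


def pool_fixed_and_dl(effects, ses, alpha=0.05):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling.

    Returns (estimate, se, ci) for each model plus the DL tau^2.
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(ses, dtype=float) ** 2  # within-study variances
    k = y.size

    # Fixed effect: weights are inverse within-study variances.
    w = 1.0 / v
    mu_fe = np.sum(w * y) / np.sum(w)
    se_fe = np.sqrt(1.0 / np.sum(w))

    # DerSimonian-Laird method-of-moments tau^2, truncated at zero.
    q = np.sum(w * (y - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects: add tau^2 to every study's variance before weighting.
    w_re = 1.0 / (v + tau2)
    mu_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))

    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "fixed": (mu_fe, se_fe, (mu_fe - z * se_fe, mu_fe + z * se_fe)),
        "random": (mu_re, se_re, (mu_re - z * se_re, mu_re + z * se_re)),
        "tau2": tau2,
    }


res = pool_fixed_and_dl([0.10, 0.25, -0.05, 0.30, 0.15], [0.05, 0.10, 0.08, 0.12, 0.06])
print(f"fixed {res['fixed'][0]:.3f}  random {res['random'][0]:.3f}  tau2 {res['tau2']:.4f}")
```

The two models differ only in their weights. The random-effects weights add the same `tau^2` to every study's variance, which pulls the weights toward equality and widens the pooled CI.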

---

## 2. Heterogeneity — is pooling even sensible?

```python
m.q, m.q_pvalue # Cochran's Q and its chi-square p-value
m.i2 # I^2: share of total variation due to heterogeneity rather than chance
m.tau2 # between-study variance estimate
m.prediction_interval # range expected to contain a new study's true effect
```

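A high `I^2` (say, above 50%) or a small Q p-value suggests that a single
pooled number hides real between-study differences. In that case, report
the random-effects estimate **and** the prediction interval, and consider
a subgroup analysis or meta-regression. Q has low power when there are few
studies, so a non-significant Q is not evidence of homogeneity. When
`tau^2 > 0`, the prediction interval is wider than the confidence interval
for the pooled mean, because it adds the between-study variance to the
sampling variance of the pooled estimate. That width is the most honest
way to convey heterogeneity.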
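
The statistics in this section follow standard formulas. The following is a minimal NumPy/SciPy sketch, assuming DerSimonian-Laird `tau^2` and the widely used t-based prediction interval with k - 2 degrees of freedom. The helper `heterogeneity` is hypothetical. `sp.meta_analysis` may use a different critical value for `m.prediction_interval`, so treat this as a cross-check rather than a reference:

```python
import numpy as np
from scipy import stats


def heterogeneity(effects, ses, alpha=0.05):
    """Cochran's Q, I^2, H^2, DerSimonian-Laird tau^2 and a prediction interval.

    Needs at least three studies: the prediction interval uses a t
    distribution with k - 2 degrees of freedom.
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(ses, dtype=float) ** 2   # within-study variances
    k = y.size
    df = k - 1

    # Cochran's Q around the fixed-effect (inverse-variance) mean.
    w = 1.0 / v
    mu_fe = np.sum(w * y) / np.sum(w)
    q = float(np.sum(w * (y - mu_fe) ** 2))
    q_pvalue = float(stats.chi2.sf(q, df))  # upper-tail chi-square p-value

    # H^2 = Q / df and I^2 = (Q - df) / Q, truncated at zero.
    h2 = q / df
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird tau^2, then the random-effects mean and its SE.
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, float((q - df) / c))
    w_re = 1.0 / (v + tau2)
    mu_re = float(np.sum(w_re * y) / np.sum(w_re))
    se_re = float(np.sqrt(1.0 / np.sum(w_re)))

    # Prediction interval: mu_re +/- t_{k-2} * sqrt(tau^2 + SE^2).
    half = stats.t.ppf(1 - alpha / 2, k - 2) * np.sqrt(tau2 + se_re ** 2)
    return {"q": q, "df": df, "q_pvalue": q_pvalue, "h2": h2, "i2": i2,
            "tau2": tau2, "estimate": mu_re, "se": se_re,
            "prediction_interval": (mu_re - half, mu_re + half)}


h = heterogeneity([0.10, 0.25, -0.05, 0.30, 0.15], [0.05, 0.10, 0.08, 0.12, 0.06])
print({key: h[key] for key in ("q", "q_pvalue", "i2", "tau2")})
```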

---

## 3. Publication bias / small-study effects

```python
egg = m.egger_test() # {'intercept', 'se', 't', 'p_value', 'df'}
print(egg["p_value"])

m.funnel_plot() # visual asymmetry check
```

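Egger's test regresses each study's standard normal deviate (effect / SE)
on its precision (1 / SE). An intercept far from zero flags funnel-plot
asymmetry, which is often attributed to small-study effects or publication
bias. Treat it as a screen, not proof: asymmetry has many possible causes,
including genuine heterogeneity. With fewer than about 10 studies (the
example above has five), the test also has little power.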
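
The following is a minimal NumPy/SciPy sketch of the classic unweighted Egger regression. It runs ordinary least squares of `effect / SE` on `1 / SE`, then applies a t test to the intercept using k - 2 degrees of freedom. The helper `egger_test` is hypothetical. It only mirrors the keys that `m.egger_test()` returns and may differ from StatsPAI's implementation in details.

```python
import numpy as np
from scipy import stats


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses each study's standard normal deviate (effect / SE) on its
    precision (1 / SE) by ordinary least squares and t-tests the intercept.
    """
    y = np.asarray(effects, dtype=float)
    se = np.asarray(ses, dtype=float)
    snd = y / se            # standard normal deviates
    precision = 1.0 / se

    X = np.column_stack([np.ones_like(precision), precision])
    coef, *_ = np.linalg.lstsq(X, snd, rcond=None)
    resid = snd - X @ coef
    df = y.size - 2
    sigma2 = float(resid @ resid) / df          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS coefficient covariance

    intercept = float(coef[0])
    se_intercept = float(np.sqrt(cov[0, 0]))
    t_stat = intercept / se_intercept
    p_value = float(2 * stats.t.sf(abs(t_stat), df))
    return {"intercept": intercept, "se": se_intercept, "t": t_stat,
            "p_value": p_value, "df": df}


egg = egger_test([0.10, 0.25, -0.05, 0.30, 0.15], [0.05, 0.10, 0.08, 0.12, 0.06])
print(egg)
```

In this formulation, the slope plays the role of the pooled effect and the intercept carries the asymmetry signal.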

---

## 4. The forest plot

```python
m.forest_plot() # per-study CIs + pooled diamond
```

---

## Notes & limitations

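- DerSimonian-Laird is the classic random-effects estimator. With very few
studies or rare events, however, it can underestimate `tau^2` and produce
CIs that are too narrow. REML or Paule-Mandel estimates of `tau^2` and
Hartung-Knapp CIs tend to be more conservative in that setting. They are
on the roadmap but not yet available.
- Effect sizes must be supplied on an additive scale. Log-transform ratio
measures (odds, risk, or hazard ratios) and supply their SEs on the same
log scale before pooling. Then exponentiate the pooled estimate and its CI
limits for reporting.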
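
Trials frequently report an odds ratio with a 95% CI rather than a log OR and its SE. The following is a minimal sketch of the conversion and the back-transform. It assumes the reported CI is a Wald interval that is symmetric on the log scale. `log_or_and_se` and `back_transform` are hypothetical helpers, not StatsPAI functions:

```python
import math
from statistics import NormalDist


def log_or_and_se(odds_ratio, ci_lower, ci_upper, level=0.95):
    """Convert a reported ratio and its CI to (log ratio, SE on the log scale).

    Assumes a Wald-type CI that is symmetric on the log scale, so that
    SE = (log(upper) - log(lower)) / (2 * z).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_or, se


def back_transform(log_estimate, ci):
    """Exponentiate a pooled log-scale estimate and its CI limits for reporting."""
    return math.exp(log_estimate), (math.exp(ci[0]), math.exp(ci[1]))


# Example: a trial reporting OR = 1.35 (95% CI 1.10 to 1.66).
y, se = log_or_and_se(1.35, 1.10, 1.66)
print(round(y, 3), round(se, 3))
```

The same recipe works for risk ratios and hazard ratios reported with Wald CIs.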

## Where to next

- [Power & sample size for epidemiological designs](power_epi.md)
- [Mendelian randomization](mendelian_family.md) — summary-data MR reuses
the same inverse-variance machinery for genetic instruments.
2 changes: 1 addition & 1 deletion docs/reference/index.md
@@ -1,6 +1,6 @@
# API Reference — Overview

-StatsPAI exposes 1,020 registered public functions under a single
+StatsPAI exposes 1,022 registered public functions under a single
`import statspai as sp` namespace. Reference pages are grouped by
methodological area:

8 changes: 4 additions & 4 deletions docs/stats.md
@@ -12,8 +12,8 @@

| Ecosystem / Project | Method | Files | Lines of code | Primary focus |
| ------------------------------------ | ---------- | -----: | ------------: | ---------------------------------- |
-| **StatsPAI** `src/statspai/` | measured | 647 | **269,043** | validation-tiered causal inference |
-| StatsPAI tests (`tests/`) | measured | 529 | 96,514 | — |
+| **StatsPAI** `src/statspai/` | measured | 648 | **269,360** | validation-tiered causal inference |
+| StatsPAI tests (`tests/`) | measured | 530 | 96,592 | — |
| statsmodels 0.14.x | measured | 948 | **381,981** | GLM / time series / general |
| linearmodels | measured | 131 | 36,607 | panel / IV |
| **Python causal-inference subtotal** | | 1,079 | **418,588** | |
@@ -53,8 +53,8 @@ Sorted by LOC. This table is generated from the live source tree by `python scri
| `plots` | 5,176 | 6 | 6 |
| `spatial` | 5,136 | 29 | 35 |
| `core` | 4,951 | 10 | 2 |
-| `inference` | 4,941 | 15 | 24 |
| `panel` | 4,914 | 12 | 17 |
+| `inference` | 4,629 | 14 | 22 |
| `matching` | 4,052 | 9 | 23 |
| `frontier` | 4,008 | 8 | 12 |
| `workflow` | 3,890 | 5 | 1 |
@@ -120,7 +120,7 @@ Sorted by LOC. This table is generated from the live source tree by `python scri
| `censoring` | 284 | 2 | 2 |
| `causal` | 101 | 1 | 0 |
| `schemas` | 0 | 0 | 0 |
-| **Total** | **269,043** | **647** | **1020** |
+| **Total** | **269,360** | **648** | **1022** |

## 3 · Causal-inference coverage matrix (full)

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -96,6 +96,7 @@ nav:
- JOSS validation dossier: joss_validation_dossier.md
- JSS source-audit dossier: jss_source_audit_dossier.md
- Guides:
- "Meta-analysis (fixed/random-effects, heterogeneity, Egger, forest/funnel)": guides/meta_analysis.md
- "Staggered DiD (Callaway–Sant'Anna)": guides/callaway_santanna.md
- "Report card: cs_report()": guides/cs_report.md
- "Repeated cross-sections": guides/repeated_cross_sections.md
30 changes: 30 additions & 0 deletions paper.bib
@@ -10,6 +10,36 @@ @article{chernozhukov2018double
doi={10.1111/ectj.12097}
}

@article{dersimonian1986meta,
title={Meta-analysis in clinical trials},
author={DerSimonian, Rebecca and Laird, Nan},
journal={Controlled Clinical Trials},
volume={7},
number={3},
pages={177--188},
year={1986},
doi={10.1016/0197-2456(86)90046-2}
}

@article{higgins2002quantifying,
title={Quantifying heterogeneity in a meta-analysis},
author={Higgins, Julian P. T. and Thompson, Simon G.},
journal={Statistics in Medicine},
volume={21},
number={11},
pages={1539--1558},
year={2002},
doi={10.1002/sim.1186}
}

@article{egger1997bias,
title={Bias in meta-analysis detected by a simple, graphical test},
author={Egger, Matthias and Davey Smith, George and Schneider, Martin and Minder, Christoph},
journal={BMJ},
volume={315},
number={7109},
pages={629--634},
year={1997},
doi={10.1136/bmj.315.7109.629}
}

@article{wager2018estimation,
title={Estimation and inference of heterogeneous treatment effects using random forests},
author={Wager, Stefan and Athey, Susan},
167 changes: 167 additions & 0 deletions schemas/functions.json
@@ -22757,6 +22757,173 @@
"type": "object"
}
},
{
"description": "Summary-data meta-analysis with fixed- and random-effects pooling.",
"name": "meta_analysis",
"parameters": {
"properties": {
"alpha": {
"default": 0.05,
"description": "Significance level for confidence/prediction intervals.",
"type": "number"
},
"effects": {
"description": "Per-study effect sizes (e.g. log odds ratios, mean differences).",
"items": {
"type": "number"
},
"type": "array"
},
"labels": {
"description": "Study labels for the forest plot.",
"items": {
"type": "string"
},
"type": "array"
},
"method": {
"default": "DL",
"description": "Which model the headline ``estimate`` reports: DerSimonian-Laird random effects (default) or fixed-effect inverse-variance. Both are always computed and available on the result.",
"enum": [
"DL",
"fixed"
],
"type": "string"
},
"se": {
"description": "Per-study standard errors (must be positive).",
"items": {
"type": "number"
},
"type": "array"
}
},
"required": [
"effects",
"se"
],
"type": "object"
}
},
{
"description": "Result of a summary-data meta-analysis.",
"name": "MetaAnalysisResult",
"parameters": {
"properties": {
"alpha": {
"default": 0.05,
"description": "Significance level for confidence intervals and tests.",
"type": "number"
},
"ci": {
"description": "ci parameter (tuple).",
"items": {
"type": "string"
},
"type": "array"
},
"effects": {
"description": "effects parameter (np.ndarray).",
"type": "string"
},
"estimate": {
"description": "estimate parameter (float).",
"type": "number"
},
"fixed_estimate": {
"description": "fixed_estimate parameter (float).",
"type": "number"
},
"fixed_se": {
"description": "fixed_se parameter (float).",
"type": "number"
},
"h2": {
"description": "h2 parameter (float).",
"type": "number"
},
"i2": {
"description": "i2 parameter (float).",
"type": "number"
},
"labels": {
"description": "labels parameter (List[str]).",
"items": {
"type": "string"
},
"type": "array"
},
"method": {
"description": "Model behind the headline estimate (DL or fixed).",
"type": "string"
},
"p_value": {
"description": "p_value parameter (float).",
"type": "number"
},
"prediction_interval": {
"description": "prediction_interval parameter (Optional[tuple]).",
"items": {
"type": "string"
},
"type": "array"
},
"q": {
"description": "Cochran's Q heterogeneity statistic.",
"type": "number"
},
"q_df": {
"description": "q_df parameter (int).",
"type": "integer"
},
"q_pvalue": {
"description": "q_pvalue parameter (float).",
"type": "number"
},
"random_estimate": {
"description": "random_estimate parameter (float).",
"type": "number"
},
"random_se": {
"description": "random_se parameter (float).",
"type": "number"
},
"se": {
"description": "se parameter (float).",
"type": "number"
},
"se_studies": {
"description": "se_studies parameter (np.ndarray).",
"type": "string"
},
"tau2": {
"description": "tau2 parameter (float).",
"type": "number"
},
"weights": {
"description": "Per-study pooling weights (np.ndarray).",
"type": "string"
}
},
"required": [
"estimate",
"se",
"ci",
"p_value",
"method",
"fixed_estimate",
"fixed_se",
"random_estimate",
"random_se",
"tau2",
"q",
"q_df",
"q_pvalue",
"i2",
"h2",
"prediction_interval",
"weights",
"effects",
"se_studies",
"labels"
],
"type": "object"
}
},
{
"description": "Class wrapper around :func:`msm` for programmatic access.",
"name": "MarginalStructuralModel",
4 changes: 2 additions & 2 deletions schemas/index.json
@@ -1,8 +1,8 @@
{
"counts": {
"agent_cards": 362,
-"functions": 1020,
-"tools": 484
+"functions": 1022,
+"tools": 485
},
"files": [
"tools.json",