Test high-vol cohorts: maker basket thesis dies in all 3 cohorts, high-vol included

张靖恒 · claude · 张靖恒 · commit 88f338d6ca72 · 2026-05-15T15:28:07.000+08:00
Per the user's follow-up request to "test market making on high-volume markets":
extended v4 and the multi-window aggregator to test maker basket arb on
high-volume cohorts, not just the long-tail D-vs-R cohort.

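Three cohort tests are now done, all with maker_fee=0 and a 10/4-day IS/OOS
split. Long-tail and binary high-vol ran on 4 non-overlapping 14-day windows;
multi-member high-vol was run on the current window only (structural fail):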

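  Cohort                       n groups  Naive OOS mean ($/yr)  Persistent winners
  ---------------------------  --------  ---------------------  ------------------
  Long-tail D-vs-R             71        -$183                  0 of 64
  High-vol multi-member        50        STRUCTURAL FAIL        -
  High-vol binary (size=2)     6         -$22                   0 of 3

  (Persistent winner = positive OOS in >= 3 of 4 windows; the denominator
  counts only groups present in all 4 windows.)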
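The two checks behind this table, IS/OOS selection and the persistence filter, can be sketched in Python as below. This is a minimal illustration, not the code in simulate_maker_basket_v4.py or aggregate_v4_multi_window.py: it assumes a chronological split (first 10 trading days in-sample, remaining days out-of-sample), per-group daily P&L lists, and one dict of annualized OOS values per window. The function names and data shapes are hypothetical.

```python
from statistics import mean, median, stdev


def top_n_oos(group_daily, is_days=10, n=6):
    """Rank groups by in-sample daily mean; return the top n's summed OOS daily mean.

    group_daily: {group_id: [pnl_day_1, ..., pnl_day_K]} in chronological order
    (hypothetical shape). Ranking looks only at the first `is_days` days, so the
    OOS days never leak into the selection.
    """
    scored = [
        (mean(series[:is_days]), mean(series[is_days:]), gid)
        for gid, series in group_daily.items()
    ]
    scored.sort(key=lambda row: row[0], reverse=True)  # best in-sample first
    top = scored[:n]
    return sum(oos for _, oos, _ in top), [gid for _, _, gid in top]


def persistent_winners(oos_by_window, min_positive=3):
    """Groups present in every window with positive OOS in >= min_positive windows.

    oos_by_window: one {group_id: annualized_oos} dict per window.
    Returns {group_id: mean OOS across windows}.
    """
    common = set.intersection(*(set(w) for w in oos_by_window))
    winners = {}
    for gid in sorted(common):
        vals = [w[gid] for w in oos_by_window]
        if sum(v > 0 for v in vals) >= min_positive:
            winners[gid] = mean(vals)
    return winners


def stability(values):
    """Cross-window summary: sample SD and SD/|mean| (inf when the mean is 0)."""
    m, sd = mean(values), stdev(values)
    return {
        "mean": m,
        "median": median(values),
        "sd": sd,
        "sd_over_abs_mean": sd / abs(m) if m else float("inf"),
    }
```

Ranking uses only the in-sample slice, which is what makes the OOS number an honest check. Fed the binary cohort's rounded per-window naive OOS values (0, -32, 0, -55), `stability` gives a mean of -21.75, a median of -16 and an SD/|mean| of about 1.23, in line with the aggregate report.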

The high-vol multi-member result is the most informative *new* finding:
96% of (group, markup) combinations were skipped with
"spread_too_narrow_for_maker". Per-leg spreads on high-vol multi-member
markets are typically 0.001-0.008, and long-shot legs are 1 tick wide
(spread=0.001). The "all legs filled as a basket" strategy is therefore
structurally impossible: you cannot place a valid maker quote on the
long-shot legs, and a single narrow leg kills the whole basket.
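To make the structural failure concrete, here is a minimal sketch of an all-or-nothing basket quote check. It assumes a maker bid must improve the best bid by at least one tick without touching the best ask (consistent with the "first in line" queue caveat in the reports) and a 0.001 tick. `maker_bid` and `basket_maker_bids` are hypothetical names; only the skip-reason string comes from the simulator's output.

```python
TICK = 0.001  # assumption: matches the 1-tick (0.001) spreads described above


def maker_bid(best_bid, best_ask, tick=TICK):
    """Price one tick above best_bid that still rests below best_ask, else None."""
    bid_t = round(best_bid / tick)  # work in integer ticks to avoid float drift
    ask_t = round(best_ask / tick)
    if ask_t - bid_t < 2:
        return None  # 1-tick spread: bid + 1 tick would equal the ask and cross
    return (bid_t + 1) * tick


def basket_maker_bids(legs, tick=TICK):
    """legs: list of (best_bid, best_ask). One narrow leg kills the whole basket."""
    quotes = [maker_bid(bid, ask, tick) for bid, ask in legs]
    if any(q is None for q in quotes):
        return None, "spread_too_narrow_for_maker"
    return quotes, "ok"
```

Under these assumptions, a leg quoted 0.500/0.508 leaves room for a bid at 0.501, but a long-shot leg at 0.001/0.002 does not, so any basket containing it is skipped.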

Interpretation: high-vol markets are dominated by HFT market makers,
leaving no room for a slower maker arb on the basket.

High-vol binary (size=2) escapes the structural problem but produces
zero or negative OOS in all 4 windows. Only 6 such groups exist today
at vol >= $5k (mostly D/R plus a few sports markets), and 3 of them
rotate between windows, leaving just 3 present in all 4.
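A minimal sketch of the tiering described here. It assumes the $5k threshold applies to a group's total volume and that groups look like `{"markets": [{"volume": ...}, ...]}`; the real build_negrisk_cohort.py may use a different schema or per-market thresholds, and the tier labels are hypothetical.

```python
VOL_MIN = 5_000  # $ threshold quoted in this commit message


def cohort_tier(group, vol_min=VOL_MIN):
    """Classify a negRisk group as "binary-highvol", "multi-highvol" or None.

    group: {"markets": [{"volume": float}, ...]}  (hypothetical shape)
    Assumption: the threshold applies to the group's total volume.
    """
    vols = [m["volume"] for m in group["markets"]]
    if len(vols) < 2 or sum(vols) < vol_min:
        return None  # not a multi-outcome group, or too thin
    return "binary-highvol" if len(vols) == 2 else "multi-highvol"


groups = [
    {"markets": [{"volume": 3_000}, {"volume": 2_500}]},                  # binary, $5.5k
    {"markets": [{"volume": 90_000}, {"volume": 400}, {"volume": 150}]},  # multi-member
    {"markets": [{"volume": 1_000}, {"volume": 2_000}]},                  # below threshold
]
print([cohort_tier(g) for g in groups])  # ['binary-highvol', 'multi-highvol', None]
```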

Files:
  scripts/build_negrisk_cohort.py      - generic cohort builder with fixed Gamma
                                         pagination (offset += 100, not 500)
  scripts/simulate_maker_basket_v4.py  - now accepts --cohort-file and --cohort-tier
  scripts/aggregate_v4_multi_window.py - now accepts --report-tag, so multiple
                                         cohorts can be aggregated cleanly
  reports/maker-simulation-v4-*-binhv-w-*.md                   - 4 binary high-vol windows
  reports/maker-simulation-v4-*-highvol-w-today.md             - structural-fail report
  reports/maker-simulation-v4-multi-window-*-binary-highvol.md - binary high-vol aggregate
  reports/maker-simulation-v4-multi-window-*-longtail-dvr.md   - renamed from the untagged version
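The pagination fix follows the usual offset/limit rule: advance the offset by the number of rows actually returned. A minimal sketch, assuming (as the fix implies) that each Gamma page returns at most 100 rows, so stepping by 500 silently skipped 400 rows per page. `fetch_page` is a hypothetical callable standing in for the HTTP request; the fake server below only simulates the page cap.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def fetch_all(fetch_page: Callable[[int, int], List[Row]], limit: int = 100) -> List[Row]:
    """Collect every row from an offset/limit endpoint.

    fetch_page(offset, limit) returns at most `limit` rows, possibly fewer if the
    server caps page size. Stop on an empty page; step by len(page), not a constant.
    """
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        rows.extend(page)
        offset += len(page)
    return rows


data = [{"id": i} for i in range(250)]


def fake_gamma(offset: int, limit: int) -> List[Row]:
    return data[offset:offset + min(limit, 100)]  # simulated server-side cap of 100


print(len(fetch_all(fake_gamma, limit=500)))  # 250
```

A loop that instead did `offset += 500` against the same fake server would stop after the first 100 rows.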

Verdict: the maker basket arb thesis is dead in all 3 testable cohorts.
The fee correction fixed a real bug (one that could matter in other
contexts), but it did not save the thesis under rigorous multi-window testing.
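Why the fee mode matters at penny markups: a toy calculation, assuming a flat proportional fee on notional for the old `taker_rate` mode. This is a simplification, not Polymarket's actual fee schedule or v3's exact formula; `net_fill_income`, its parameters and the 1% rate are hypothetical.

```python
def net_fill_income(markup, shares, price, fee_mode="zero", taker_rate=0.0):
    """Income from one maker fill of `shares` at `price` with a per-share `markup`.

    fee_mode="zero": makers pay no fee (v4, per the Polymarket docs quoted in the reports).
    fee_mode="taker_rate": toy stand-in for the v3 bug, charging taker_rate * notional
    as if the maker were a taker. Not Polymarket's real fee formula.
    """
    gross = markup * shares
    if fee_mode == "zero":
        return gross
    if fee_mode == "taker_rate":
        return gross - taker_rate * price * shares
    raise ValueError(f"unknown fee_mode: {fee_mode!r}")


# A $0.005 markup on 100 shares at 0.50, with a hypothetical 1% rate:
print(net_fill_income(0.005, 100, 0.50))                                 # about 0.50
print(net_fill_income(0.005, 100, 0.50, "taker_rate", taker_rate=0.01))  # about 0.00
```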

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/reports/maker-simulation-v4-2026-05-15-binhv-w-14d-ago.md b/reports/maker-simulation-v4-2026-05-15-binhv-w-14d-ago.md
@@ -0,0 +1,55 @@
+# Maker Simulation v4 — Corrected fee + train/test split (2026-05-15T07:25:00.807506+00:00)
+
+**Method**: v3 plumbing + two fixes:
+  (A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
+  (B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.
+
+**Window**: 14 days (2026-04-17 -> 2026-05-01)
+**Basket size cap**: $100
+**Trades fetched**: 17232 raw -> 1487 qualifying
+**Days with trade activity**: in-sample 10, OOS 4
+**takerOnly distribution across our markets**: {True: 12}
+
+## Headline (with maker fee = 0)
+
+| Verdict | Daily $ | Annualized |
+|---|---:|---:|
+| Naive (all 6 groups), in-sample | $-0.03 | $-12 |
+| Naive (all 6 groups), OOS | $+0.00 | $+0 |
+| Whole window (no split) | $-0.02 | $-9 |
+| **Top 6 by in-sample, in-sample** | $-0.03 | $-12 |
+| **Top 6 by in-sample, OOS** ← honest verdict | $+0.00 | $+0 |
+
+If top-N OOS << top-N in-sample, the top-N looks like overfitting.
+OOS / in-sample ratio for top-6: -0.00
+
+## Top 6 groups — in-sample picked, OOS measured
+
+| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
+|---:|---|---|---:|---:|---:|---:|
+| 1 | `0x22725f09e6a3...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 2 | `0xd4ec843b5228...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 3 | `0xdc4bd1724b69...` | Will the Republicans win the 2 vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 4 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 5 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 6 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.010 | $-0.034 | $+0.000 | -0.00 |
+
+## Compared to prior versions
+
+| Version | Method | Annualized | Issue |
+|---|---|---:|---|
+| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
+| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
+| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
+| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $+0 OOS naive / $+0 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |
+
+## Caveats (still standing)
+
+- Queue priority: assumes we are first in line at our maker price level.
+- Per-leg fills assumed independent within a day.
+- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
+- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
+- Today's bestAsk/bestBid used to compute maker target — historical spread may have differed.
+
+---
+*Snapshot: 2026-05-15T07:25:00.807506+00:00*
diff --git a/reports/maker-simulation-v4-2026-05-15-binhv-w-28d-ago.md b/reports/maker-simulation-v4-2026-05-15-binhv-w-28d-ago.md
@@ -0,0 +1,55 @@
+# Maker Simulation v4 — Corrected fee + train/test split (2026-05-15T07:25:12.441018+00:00)
+
+**Method**: v3 plumbing + two fixes:
+  (A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
+  (B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.
+
+**Window**: 14 days (2026-04-03 -> 2026-04-17)
+**Basket size cap**: $100
+**Trades fetched**: 21732 raw -> 959 qualifying
+**Days with trade activity**: in-sample 10, OOS 4
+**takerOnly distribution across our markets**: {True: 12}
+
+## Headline (with maker fee = 0)
+
+| Verdict | Daily $ | Annualized |
+|---|---:|---:|
+| Naive (all 6 groups), in-sample | $-0.04 | $-15 |
+| Naive (all 6 groups), OOS | $-0.09 | $-32 |
+| Whole window (no split) | $-0.05 | $-20 |
+| **Top 6 by in-sample, in-sample** | $-0.04 | $-15 |
+| **Top 6 by in-sample, OOS** ← honest verdict | $-0.09 | $-32 |
+
+If top-N OOS << top-N in-sample, the top-N looks like overfitting.
+OOS / in-sample ratio for top-6: 2.16
+
+## Top 6 groups — in-sample picked, OOS measured
+
+| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
+|---:|---|---|---:|---:|---:|---:|
+| 1 | `0x22725f09e6a3...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 2 | `0xd4ec843b5228...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 3 | `0xdc4bd1724b69...` | Will the Republicans win the 2 vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 4 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 5 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 6 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.010 | $-0.040 | $-0.087 | +2.16 |
+
+## Compared to prior versions
+
+| Version | Method | Annualized | Issue |
+|---|---|---:|---|
+| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
+| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
+| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
+| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $-32 OOS naive / $-32 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |
+
+## Caveats (still standing)
+
+- Queue priority: assumes we are first in line at our maker price level.
+- Per-leg fills assumed independent within a day.
+- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
+- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
+- Today's bestAsk/bestBid used to compute maker target — historical spread may have differed.
+
+---
+*Snapshot: 2026-05-15T07:25:12.441018+00:00*
diff --git a/reports/maker-simulation-v4-2026-05-15-binhv-w-42d-ago.md b/reports/maker-simulation-v4-2026-05-15-binhv-w-42d-ago.md
@@ -0,0 +1,54 @@
+# Maker Simulation v4 — Corrected fee + train/test split (2026-05-15T07:25:38.929870+00:00)
+
+**Method**: v3 plumbing + two fixes:
+  (A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
+  (B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.
+
+**Window**: 14 days (2026-03-20 -> 2026-04-03)
+**Basket size cap**: $100
+**Trades fetched**: 13732 raw -> 158 qualifying
+**Days with trade activity**: in-sample 10, OOS 4
+**takerOnly distribution across our markets**: {True: 11}
+
+## Headline (with maker fee = 0)
+
+| Verdict | Daily $ | Annualized |
+|---|---:|---:|
+| Naive (all 5 groups), in-sample | $-0.00 | $-1 |
+| Naive (all 5 groups), OOS | $+0.00 | $+0 |
+| Whole window (no split) | $-0.00 | $-0 |
+| **Top 6 by in-sample, in-sample** | $-0.00 | $-1 |
+| **Top 6 by in-sample, OOS** ← honest verdict | $+0.00 | $+0 |
+
+If top-N OOS << top-N in-sample, the top-N looks like overfitting.
+OOS / in-sample ratio for top-6: -0.00
+
+## Top 6 groups — in-sample picked, OOS measured
+
+| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
+|---:|---|---|---:|---:|---:|---:|
+| 1 | `0xd4ec843b5228...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 2 | `0xdc4bd1724b69...` | Will the Republicans win the 2 vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 3 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 4 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 5 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.010 | $-0.002 | $+0.000 | -0.00 |
+
+## Compared to prior versions
+
+| Version | Method | Annualized | Issue |
+|---|---|---:|---|
+| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
+| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
+| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
+| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $+0 OOS naive / $+0 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |
+
+## Caveats (still standing)
+
+- Queue priority: assumes we are first in line at our maker price level.
+- Per-leg fills assumed independent within a day.
+- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
+- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
+- Today's bestAsk/bestBid used to compute maker target — historical spread may have differed.
+
+---
+*Snapshot: 2026-05-15T07:25:38.929870+00:00*
diff --git a/reports/maker-simulation-v4-2026-05-15-binhv-w-today.md b/reports/maker-simulation-v4-2026-05-15-binhv-w-today.md
@@ -0,0 +1,52 @@
+# Maker Simulation v4 — Corrected fee + train/test split (2026-05-15T07:24:50.874158+00:00)
+
+**Method**: v3 plumbing + two fixes:
+  (A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
+  (B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.
+
+**Window**: 14 days (2026-05-01 -> 2026-05-15)
+**Basket size cap**: $100
+**Trades fetched**: 7232 raw -> 1510 qualifying
+**Days with trade activity**: in-sample 11, OOS 4
+**takerOnly distribution across our markets**: {True: 7}
+
+## Headline (with maker fee = 0)
+
+| Verdict | Daily $ | Annualized |
+|---|---:|---:|
+| Naive (all 3 groups), in-sample | $-0.05 | $-20 |
+| Naive (all 3 groups), OOS | $-0.15 | $-55 |
+| Whole window (no split) | $-0.08 | $-29 |
+| **Top 6 by in-sample, in-sample** | $-0.05 | $-20 |
+| **Top 6 by in-sample, OOS** ← honest verdict | $-0.15 | $-55 |
+
+If top-N OOS << top-N in-sample, the top-N looks like overfitting.
+OOS / in-sample ratio for top-6: 2.75
+
+## Top 6 groups — in-sample picked, OOS measured
+
+| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
+|---:|---|---|---:|---:|---:|---:|
+| 1 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 2 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 3 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.010 | $-0.055 | $-0.150 | +2.75 |
+
+## Compared to prior versions
+
+| Version | Method | Annualized | Issue |
+|---|---|---:|---|
+| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
+| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
+| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
+| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $-55 OOS naive / $-55 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |
+
+## Caveats (still standing)
+
+- Queue priority: assumes we are first in line at our maker price level.
+- Per-leg fills assumed independent within a day.
+- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
+- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
+- Today's bestAsk/bestBid used to compute maker target — historical spread may have differed.
+
+---
+*Snapshot: 2026-05-15T07:24:50.874158+00:00*
diff --git a/reports/maker-simulation-v4-2026-05-15-highvol-w-today.md b/reports/maker-simulation-v4-2026-05-15-highvol-w-today.md
@@ -0,0 +1,67 @@
+# Maker Simulation v4 — Corrected fee + train/test split (2026-05-15T07:19:16.547712+00:00)
+
+**Method**: v3 plumbing + two fixes:
+  (A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
+  (B) 10-day in-sample / 4-day OOS split. Top 18 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.
+
+**Window**: 14 days (2026-05-01 -> 2026-05-15)
+**Basket size cap**: $100
+**Trades fetched**: 1022715 raw -> 104389 qualifying
+**Days with trade activity**: in-sample 11, OOS 4
+**takerOnly distribution across our markets**: {True: 738}
+
+## Headline (with maker fee = 0)
+
+| Verdict | Daily $ | Annualized |
+|---|---:|---:|
+| Naive (all 50 groups), in-sample | $+0.00 | $+0 |
+| Naive (all 50 groups), OOS | $+0.00 | $+0 |
+| Whole window (no split) | $+0.00 | $+0 |
+| **Top 18 by in-sample, in-sample** | $+0.00 | $+0 |
+| **Top 18 by in-sample, OOS** ← honest verdict | $+0.00 | $+0 |
+
+If top-N OOS << top-N in-sample, the top-N looks like overfitting.
+OOS / in-sample ratio for top-18: 0.00
+
+## Top 18 groups — in-sample picked, OOS measured
+
+| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
+|---:|---|---|---:|---:|---:|---:|
+| 1 | `0x7faa974ff857...` | Will the Carolina Hurricanes w vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 2 | `0x11e9a09023ac...` | Will the Oklahoma City Thunder vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 3 | `0xb5c32a9acd39...` | Will Spain win the 2026 FIFA W vs Will Eng | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 4 | `0x2c3d7e0eee6f...` | Will Gavin Newsom win the 2028 vs Will Ale | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 5 | `0xb9aa4595bbe8...` | Will JD Vance win the 2028 US  vs Will Gav | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 6 | `0xc7d902c4f18f...` | Will Donald Trump win the 2028 vs Will J.D | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 7 | `0x0aa99409c83e...` | Will Ken Paxton win the 2026 T vs Will Joh | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 8 | `0x7b95a46fc059...` | 2026 Balance of Power: D Senat vs 2026 Bal | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 9 | `0x51a3b9f29275...` | Will Nikola Jokic win the 2025 vs Will Sha | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 10 | `0x8a612f242b38...` | Will the Oklahoma City Thunder vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 11 | `0xc8f80ae8e6e9...` | Will PSG win the 2025–26 Champ vs Will Ars | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 12 | `0x9250593bd8a2...` | Will Vicky Dávila win the 1st  vs Will Lui | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 13 | `0x3e140cadcdda...` | Will Vicky Dávila win the 2026 vs Will Lui | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 14 | `0xf5df3e53ea40...` | Will Mallory McMorrow win the  vs Will Hal | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 15 | `0x966a3221e05d...` | Will Tarcisio de Freitas win t vs Will Lui | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 16 | `0xda59e733de4f...` | Will Kylian Mbappe be the 2025 vs Will Ous | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 17 | `0xcd778cf07b7b...` | Will Anthropic’s market cap be vs Will Ant | $0.005 | $+0.000 | $+0.000 | +0.00 |
+| 18 | `0x178e5d521fe4...` | Will Kylian Mbappé win the 202 vs Will Erl | $0.005 | $+0.000 | $+0.000 | +0.00 |
+
+## Compared to prior versions
+
+| Version | Method | Annualized | Issue |
+|---|---|---:|---|
+| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
+| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
+| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
+| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $+0 OOS naive / $+0 OOS top-18 | fee per docs; cherry-pick now measured out-of-sample |
+
+## Caveats (still standing)
+
+- Queue priority: assumes we are first in line at our maker price level.
+- Per-leg fills assumed independent within a day.
+- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
+- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
+- Today's bestAsk/bestBid used to compute maker target — historical spread may have differed.
+
+---
+*Snapshot: 2026-05-15T07:19:16.547712+00:00*
diff --git a/reports/maker-simulation-v4-multi-window-2026-05-15-binary-highvol.md b/reports/maker-simulation-v4-multi-window-2026-05-15-binary-highvol.md
@@ -0,0 +1,42 @@
+# Maker Simulation v4 — Multi-Window Stability (2026-05-15T07:26:39.204878+00:00)
+
+**Method**: ran v4 (maker_fee=zero, 10 IS / 4 OOS) on 4 non-overlapping 14-day windows. Total span: 2026-03-20 -> 2026-05-15 (56 days).
+
+**Why this matters**: a single 14-day window's verdict can be luck of the window. If the thesis is real, all 4 windows should give roughly consistent signs and magnitudes. If they flip sign or differ by an order of magnitude, the single-window verdict was probably a coincidence.
+
+## Per-window numbers
+
+| Window | IS days | OOS days | Naive IS / yr | Naive OOS / yr | Cherry IS / yr | Cherry OOS / yr |
+|---|---:|---:|---:|---:|---:|---:|
+| 2026-03-20 → 2026-04-03 | 10 | 4 | $-1 | $+0 | $-1 | $+0 |
+| 2026-04-03 → 2026-04-17 | 10 | 4 | $-15 | $-32 | $-15 | $-32 |
+| 2026-04-17 → 2026-05-01 | 10 | 4 | $-12 | $+0 | $-12 | $+0 |
+| 2026-05-01 → 2026-05-15 | 10 | 4 | $-20 | $-55 | $-20 | $-55 |
+
+## Cross-window stability
+
+| Metric | mean | median | min | max | SD | SD/abs(mean) |
+|---|---:|---:|---:|---:|---:|---:|
+| Naive IS / yr | $-12 | $-14 | $-20 | $-1 | $8 | 0.69 |
+| Naive OOS / yr | $-22 | $-16 | $-55 | $+0 | $27 | 1.23 |
+| Cherry IS / yr | $-12 | $-14 | $-20 | $-1 | $8 | 0.69 |
+| Cherry OOS / yr | $-22 | $-16 | $-55 | $+0 | $27 | 1.23 |
+
+**Read this**: if SD/abs(mean) > 1.0, the point estimate is mostly noise. If the sign of OOS is consistent across windows but the magnitude varies 2-3x, the signal is real but noisy.
+
+## Persistent winners (positive OOS in ≥3 of 4 windows)
+
+Found 0 groups (of 3 present in all windows). Sum of their mean OOS = **$+0/yr**.
+
+| Rank | Group | Q | +OOS windows | Mean OOS / yr | Median OOS / yr | Values per window |
+|---:|---|---|:-:|---:|---:|---|
+
+## Interpretation
+
+- A verdict from any single window is statistically weak.
+- The honest verdict is the mean OOS across all windows.
+- The set of **persistent winners** (+OOS in ≥3/4 windows) gives the most defensible cherry-pick.
+- If `naive_oos` flips sign across windows, the thesis applies only to a subset of groups, not to a naive deploy.
+
+---
+*Snapshot: 2026-05-15T07:26:39.204878+00:00*
diff --git a/reports/maker-simulation-v4-multi-window-2026-05-15-longtail-dvr.md b/reports/maker-simulation-v4-multi-window-2026-05-15-longtail-dvr.md
@@ -1,4 +1,4 @@
-# Maker Simulation v4 — Multi-Window Stability (2026-05-15T04:05:23.269350+00:00)
+# Maker Simulation v4 — Multi-Window Stability (2026-05-15T07:26:38.997558+00:00)
 
 **Method**: ran v4 (maker_fee=zero, 10 IS / 4 OOS) on 4 non-overlapping 14-day windows. Total span: 2026-03-20 -> 2026-05-15 (56 days).
 
@@ -39,4 +39,4 @@ Found 0 groups (of 64 present in all windows). Sum of their mean OOS = **$+0/yr*
 - If `naive_oos` flips sign across windows, the thesis applies only to a subset of groups, not to a naive deploy.
 
 ---
-*Snapshot: 2026-05-15T04:05:23.269350+00:00*
+*Snapshot: 2026-05-15T07:26:38.997558+00:00*
diff --git a/scripts/aggregate_v4_multi_window.py b/scripts/aggregate_v4_multi_window.py
diff --git a/scripts/build_negrisk_cohort.py b/scripts/build_negrisk_cohort.py
diff --git a/scripts/simulate_maker_basket_v4.py b/scripts/simulate_maker_basket_v4.py