
Commit 88f338d

张靖恒 and claude committed
Test high-vol cohorts: thesis dies in all 3 cohorts including high-vol
Per user's "test high-volume market making" follow-up: extended v4 + multi-window to test maker basket arb on high-volume cohorts, not just long-tail D-vs-R.

Three cohort tests now done, all with maker_fee=0 + 10/4 IS/OOS split on 4 non-overlapping 14-day windows:

  Cohort                     n groups  Naive OOS mean   Persistent winners
  -------------------------  --------  ---------------  ------------------
  Long-tail D-vs-R           71        -$183            0 of 64
  High-vol multi-member      50        STRUCTURAL FAIL  -
  High-vol binary (size=2)   6         -$22             0 of 3

The high-vol multi-member result is the most informative *new* finding: 96% of (group, markup) combinations got skipped via "spread_too_narrow_for_maker". Per-leg spreads on high-vol multi-member markets are typically 0.001-0.008; long-shot legs are 1 tick wide (spread=0.001). The "all legs filled as basket" strategy is structurally impossible: you cannot place a valid maker quote on the long-shot legs, and any narrow leg kills the whole basket. Translation: high-vol markets are dominated by HFT market makers, leaving no room for slower maker arb on the basket.

High-vol binary (size=2) escapes the structural problem but produces all-negative or zero OOS across 4 windows. Only 6 such groups exist today at vol>=$5k (mostly D/R + a few sports), and 3 of them rotate between windows.

Files:
  scripts/build_negrisk_cohort.py
    - generic cohort builder with fixed Gamma pagination (offset += 100 not 500)
  scripts/simulate_maker_basket_v4.py
    - now accepts --cohort-file and --cohort-tier
  scripts/aggregate_v4_multi_window.py
    - now accepts --report-tag so we can aggregate multiple cohorts cleanly
  reports/maker-simulation-v4-*-binhv-w-*.md
    - 4 binary high-vol windows
  reports/maker-simulation-v4-*-highvol-w-today.md
    - structural-fail report
  reports/maker-simulation-v4-multi-window-*-binary-highvol.md
    - binary HV agg
  reports/maker-simulation-v4-multi-window-*-longtail-dvr.md
    - renamed from the untagged version

Verdict: maker basket arb thesis is dead in all 3 testable cohorts. The fee correction was a real bug (and could matter in other contexts), but did not save the thesis under rigorous multi-window testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
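The pagination fix noted under Files (`offset += 100 not 500`) points at a common failure mode: when the offset step is larger than the number of rows a page actually returns, rows are silently skipped. A minimal sketch of a corrected loop follows. The page size of 100 is inferred from the commit message, and `fetch_page` and `make_fake_api` are hypothetical stand-ins, since the real Gamma call in `build_negrisk_cohort.py` is not shown here.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 100  # assumed rows per page, inferred from "offset += 100"


def fetch_all(fetch_page: Callable[[int, int], List[Dict]]) -> List[Dict]:
    """Collect every row by advancing the offset by the rows actually returned."""
    rows: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)
        if not page:
            break
        rows.extend(page)
        # Stepping by 500 here would skip 400 rows after every 100-row page.
        offset += len(page)
        if len(page) < PAGE_SIZE:
            break  # short page: assume this was the last one
    return rows


def make_fake_api(total: int) -> Callable[[int, int], List[Dict]]:
    """Hypothetical in-memory API that returns at most `limit` rows per call."""
    data = [{"id": i} for i in range(total)]
    return lambda offset, limit: data[offset:offset + limit]
```

Advancing by `len(page)` rather than a hard-coded constant keeps the loop correct even if the server caps the page size below what was requested.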
1 parent 183cb1a commit 88f338d

10 files changed

Lines changed: 550 additions & 15 deletions
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Maker Simulation v4: Corrected fee + train/test split (2026-05-15T07:25:00.807506+00:00)

**Method**: v3 plumbing + two fixes:
(A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
(B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.

**Window**: 14 days (2026-04-17 -> 2026-05-01)
**Basket size cap**: $100
**Trades fetched**: 17232 raw -> 1487 qualifying
**Days with trade activity**: in-sample 10, OOS 4
**takerOnly distribution across our markets**: {True: 12}

## Headline (with maker fee = 0)

| Verdict | Daily $ | Annualized |
|---|---:|---:|
| Naive (all 6 groups), in-sample | $-0.03 | $-12 |
| Naive (all 6 groups), OOS | $+0.00 | $+0 |
| Whole window (no split) | $-0.02 | $-9 |
| **Top 6 by in-sample, in-sample** | $-0.03 | $-12 |
| **Top 6 by in-sample, OOS** ← honest verdict | $+0.00 | $+0 |

If top-N OOS << top-N in-sample, the top-N looks like overfitting.
OOS / in-sample ratio for top-6: -0.00

## Top 6 groups: in-sample picked, OOS measured

| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
|---:|---|---|---:|---:|---:|---:|
| 1 | `0x22725f09e6a3...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 2 | `0xd4ec843b5228...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 3 | `0xdc4bd1724b69...` | Will the Republicans win the 2 vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 4 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 5 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 6 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.010 | $-0.034 | $+0.000 | -0.00 |

## Compared to prior versions

| Version | Method | Annualized | Issue |
|---|---|---:|---|
| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $+0 OOS naive / $+0 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |

## Caveats (still standing)

- Queue priority: assumes we are first in line at our maker price level.
- Per-leg fills assumed independent within a day.
- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
- Today's bestAsk/bestBid used to compute maker target; historical spread may have differed.

---
*Snapshot: 2026-05-15T07:25:00.807506+00:00*
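Method (B) in the report above can be sketched as follows: rank groups by in-sample daily P&L, keep the top N, and report what those same groups earned out of sample. The data layout (`daily_pnl` mapping a group id to 14 daily dollar values) and the chronological 10/4 cut are assumptions for illustration; the simulator's real structures are not shown in this commit.

```python
from typing import Dict, List, Tuple


def _mean(xs: List[float]) -> float:
    """Arithmetic mean, 0.0 for an empty list."""
    return sum(xs) / len(xs) if xs else 0.0


def top_n_oos(
    daily_pnl: Dict[str, List[float]],
    n_is: int = 10,
    top_n: int = 6,
) -> Tuple[List[str], float, float]:
    """Pick top_n groups by in-sample mean daily $.

    Returns (picked group ids, summed IS daily $, summed OOS daily $).
    Selection only ever looks at the first n_is days, so the OOS figure
    is not contaminated by the ranking step.
    """
    is_mean = {g: _mean(v[:n_is]) for g, v in daily_pnl.items()}
    oos_mean = {g: _mean(v[n_is:]) for g, v in daily_pnl.items()}
    picked = sorted(is_mean, key=is_mean.get, reverse=True)[:top_n]
    is_sum = sum(is_mean[g] for g in picked)
    oos_sum = sum(oos_mean[g] for g in picked)
    return picked, is_sum, oos_sum
```

For example, a group that earns $1/day in sample and loses $1/day out of sample will be picked first and then drag the OOS sum down, which is exactly the overfitting pattern the "honest verdict" row is meant to expose.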
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Maker Simulation v4: Corrected fee + train/test split (2026-05-15T07:25:12.441018+00:00)

**Method**: v3 plumbing + two fixes:
(A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
(B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.

**Window**: 14 days (2026-04-03 -> 2026-04-17)
**Basket size cap**: $100
**Trades fetched**: 21732 raw -> 959 qualifying
**Days with trade activity**: in-sample 10, OOS 4
**takerOnly distribution across our markets**: {True: 12}

## Headline (with maker fee = 0)

| Verdict | Daily $ | Annualized |
|---|---:|---:|
| Naive (all 6 groups), in-sample | $-0.04 | $-15 |
| Naive (all 6 groups), OOS | $-0.09 | $-32 |
| Whole window (no split) | $-0.05 | $-20 |
| **Top 6 by in-sample, in-sample** | $-0.04 | $-15 |
| **Top 6 by in-sample, OOS** ← honest verdict | $-0.09 | $-32 |

If top-N OOS << top-N in-sample, the top-N looks like overfitting.
OOS / in-sample ratio for top-6: 2.16

## Top 6 groups: in-sample picked, OOS measured

| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
|---:|---|---|---:|---:|---:|---:|
| 1 | `0x22725f09e6a3...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 2 | `0xd4ec843b5228...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 3 | `0xdc4bd1724b69...` | Will the Republicans win the 2 vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 4 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 5 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 6 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.010 | $-0.040 | $-0.087 | +2.16 |

## Compared to prior versions

| Version | Method | Annualized | Issue |
|---|---|---:|---|
| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $-32 OOS naive / $-32 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |

## Caveats (still standing)

- Queue priority: assumes we are first in line at our maker price level.
- Per-leg fills assumed independent within a day.
- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
- Today's bestAsk/bestBid used to compute maker target; historical spread may have differed.

---
*Snapshot: 2026-05-15T07:25:12.441018+00:00*
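The Annualized column in these reports is consistent with daily $ multiplied by 365 (for example, -0.087 × 365 ≈ -32), and the OOS/IS column reads as a plain quotient. A small sketch of both is below; the function names and the zero-denominator guard are assumptions for illustration, not taken from the simulator. One caution when reading the ratio: when both values are negative, as in rank 6 above, a ratio above 1 means the OOS loss is larger than the in-sample loss, not that an edge generalised.

```python
def annualize(daily_usd: float, days_per_year: int = 365) -> float:
    """Scale a mean daily dollar figure to a yearly one (simple multiplication)."""
    return daily_usd * days_per_year


def oos_is_ratio(is_daily: float, oos_daily: float) -> float:
    """OOS / in-sample quotient.

    Returns 0.0 when the in-sample value is zero (assumed convention,
    chosen to mirror the +0.00 shown for the all-zero rows).
    """
    if is_daily == 0:
        return 0.0
    return oos_daily / is_daily
```

A sign-aware alternative would report the two values side by side rather than a single quotient, since the quotient loses the distinction between "both positive" and "both negative".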
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Maker Simulation v4: Corrected fee + train/test split (2026-05-15T07:25:38.929870+00:00)

**Method**: v3 plumbing + two fixes:
(A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
(B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.

**Window**: 14 days (2026-03-20 -> 2026-04-03)
**Basket size cap**: $100
**Trades fetched**: 13732 raw -> 158 qualifying
**Days with trade activity**: in-sample 10, OOS 4
**takerOnly distribution across our markets**: {True: 11}

## Headline (with maker fee = 0)

| Verdict | Daily $ | Annualized |
|---|---:|---:|
| Naive (all 5 groups), in-sample | $-0.00 | $-1 |
| Naive (all 5 groups), OOS | $+0.00 | $+0 |
| Whole window (no split) | $-0.00 | $-0 |
| **Top 6 by in-sample, in-sample** | $-0.00 | $-1 |
| **Top 6 by in-sample, OOS** ← honest verdict | $+0.00 | $+0 |

If top-N OOS << top-N in-sample, the top-N looks like overfitting.
OOS / in-sample ratio for top-6: -0.00

## Top 6 groups: in-sample picked, OOS measured

| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
|---:|---|---|---:|---:|---:|---:|
| 1 | `0xd4ec843b5228...` | Will the Democratic Party cont vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 2 | `0xdc4bd1724b69...` | Will the Republicans win the 2 vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 3 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 4 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 5 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.010 | $-0.002 | $+0.000 | -0.00 |

## Compared to prior versions

| Version | Method | Annualized | Issue |
|---|---|---:|---|
| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $+0 OOS naive / $+0 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |

## Caveats (still standing)

- Queue priority: assumes we are first in line at our maker price level.
- Per-leg fills assumed independent within a day.
- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
- Today's bestAsk/bestBid used to compute maker target; historical spread may have differed.

---
*Snapshot: 2026-05-15T07:25:38.929870+00:00*
Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
# Maker Simulation v4 — Corrected fee + train/test split (2026-05-15T07:24:50.874158+00:00)
2+
3+
**Method**: v3 plumbing + two fixes:
4+
(A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
5+
(B) 10-day in-sample / 4-day OOS split. Top 6 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.
6+
7+
**Window**: 14 days (2026-05-01 -> 2026-05-15)
8+
**Basket size cap**: $100
9+
**Trades fetched**: 7232 raw -> 1510 qualifying
10+
**Days with trade activity**: in-sample 11, OOS 4
11+
**takerOnly distribution across our markets**: {True: 7}
12+
13+
## Headline (with maker fee = 0)
14+
15+
| Verdict | Daily $ | Annualized |
16+
|---|---:|---:|
17+
| Naive (all 3 groups), in-sample | $-0.05 | $-20 |
18+
| Naive (all 3 groups), OOS | $-0.15 | $-55 |
19+
| Whole window (no split) | $-0.08 | $-29 |
20+
| **Top 6 by in-sample, in-sample** | $-0.05 | $-20 |
21+
| **Top 6 by in-sample, OOS** ← honest verdict | $-0.15 | $-55 |
22+
23+
If top-N OOS << top-N in-sample, the top-N looks like overfitting.
24+
OOS / in-sample ratio for top-6: 2.75
25+
26+
## Top 6 groups — in-sample picked, OOS measured
27+
28+
| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
29+
|---:|---|---|---:|---:|---:|---:|
30+
| 1 | `0xd898209c4efc...` | Will Jeff Merkley be the Democ vs Will Jac | $0.005 | $+0.000 | $+0.000 | +0.00 |
31+
| 2 | `0xb37eb81b8e7c...` | Will Chelsea win the 2025-2026 vs Will Man | $0.005 | $+0.000 | $+0.000 | +0.00 |
32+
| 3 | `0x7e28615c2891...` | Will Arsenal win the 2025–26 E vs Will Man | $0.010 | $-0.055 | $-0.150 | +2.75 |
33+
34+
## Compared to prior versions
35+
36+
| Version | Method | Annualized | Issue |
37+
|---|---|---:|---|
38+
| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
39+
| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
40+
| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
41+
| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $-55 OOS naive / $-55 OOS top-6 | fee per docs; cherry-pick now measured out-of-sample |
42+
43+
## Caveats (still standing)
44+
45+
- Queue priority: assumes we are first in line at our maker price level.
46+
- Per-leg fills assumed independent within a day.
47+
- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
48+
- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
49+
- Today's bestAsk/bestBid used to compute maker target — historical spread may have differed.
50+
51+
---
52+
*Snapshot: 2026-05-15T07:24:50.874158+00:00*
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Maker Simulation v4: Corrected fee + train/test split (2026-05-15T07:19:16.547712+00:00)

**Method**: v3 plumbing + two fixes:
(A) maker fee mode = `zero` (v3 used taker_rate, which was wrong; Polymarket docs: "makers never pay fees")
(B) 10-day in-sample / 4-day OOS split. Top 18 groups picked by IN-SAMPLE daily $; their OOS sum reported separately.

**Window**: 14 days (2026-05-01 -> 2026-05-15)
**Basket size cap**: $100
**Trades fetched**: 1022715 raw -> 104389 qualifying
**Days with trade activity**: in-sample 11, OOS 4
**takerOnly distribution across our markets**: {True: 738}

## Headline (with maker fee = 0)

| Verdict | Daily $ | Annualized |
|---|---:|---:|
| Naive (all 50 groups), in-sample | $+0.00 | $+0 |
| Naive (all 50 groups), OOS | $+0.00 | $+0 |
| Whole window (no split) | $+0.00 | $+0 |
| **Top 18 by in-sample, in-sample** | $+0.00 | $+0 |
| **Top 18 by in-sample, OOS** ← honest verdict | $+0.00 | $+0 |

If top-N OOS << top-N in-sample, the top-N looks like overfitting.
OOS / in-sample ratio for top-18: 0.00

## Top 18 groups: in-sample picked, OOS measured

| Rank | Group | Q | Best markup | IS daily $ | OOS daily $ | OOS/IS |
|---:|---|---|---:|---:|---:|---:|
| 1 | `0x7faa974ff857...` | Will the Carolina Hurricanes w vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 2 | `0x11e9a09023ac...` | Will the Oklahoma City Thunder vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 3 | `0xb5c32a9acd39...` | Will Spain win the 2026 FIFA W vs Will Eng | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 4 | `0x2c3d7e0eee6f...` | Will Gavin Newsom win the 2028 vs Will Ale | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 5 | `0xb9aa4595bbe8...` | Will JD Vance win the 2028 US vs Will Gav | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 6 | `0xc7d902c4f18f...` | Will Donald Trump win the 2028 vs Will J.D | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 7 | `0x0aa99409c83e...` | Will Ken Paxton win the 2026 T vs Will Joh | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 8 | `0x7b95a46fc059...` | 2026 Balance of Power: D Senat vs 2026 Bal | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 9 | `0x51a3b9f29275...` | Will Nikola Jokic win the 2025 vs Will Sha | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 10 | `0x8a612f242b38...` | Will the Oklahoma City Thunder vs Will the | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 11 | `0xc8f80ae8e6e9...` | Will PSG win the 2025–26 Champ vs Will Ars | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 12 | `0x9250593bd8a2...` | Will Vicky Dávila win the 1st vs Will Lui | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 13 | `0x3e140cadcdda...` | Will Vicky Dávila win the 2026 vs Will Lui | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 14 | `0xf5df3e53ea40...` | Will Mallory McMorrow win the vs Will Hal | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 15 | `0x966a3221e05d...` | Will Tarcisio de Freitas win t vs Will Lui | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 16 | `0xda59e733de4f...` | Will Kylian Mbappe be the 2025 vs Will Ous | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 17 | `0xcd778cf07b7b...` | Will Anthropic’s market cap be vs Will Ant | $0.005 | $+0.000 | $+0.000 | +0.00 |
| 18 | `0x178e5d521fe4...` | Will Kylian Mbappé win the 202 vs Will Erl | $0.005 | $+0.000 | $+0.000 | +0.00 |

## Compared to prior versions

| Version | Method | Annualized | Issue |
|---|---|---:|---|
| v1 (mid-touch) | mid touch as fill proxy | $15,546 | mid touching != fill |
| v2 size-uncapped | sum of all SELL Yes | $918 | income computed at $100/fill regardless of trade size |
| v3 size-capped, taker fee | size cap added | -$263 naive / +$117 cherry-pick | maker fee wrongly = taker fee; no OOS check |
| **v4 this run** | size cap + maker_fee=zero + IS/OOS | $+0 OOS naive / $+0 OOS top-18 | fee per docs; cherry-pick now measured out-of-sample |

## Caveats (still standing)

- Queue priority: assumes we are first in line at our maker price level.
- Per-leg fills assumed independent within a day.
- Maker fee = 0 ignores `rebateRate` (20-25% of pool taker fees redistributed to makers). Real maker income could be modestly HIGHER. Conservative direction.
- 14 days is a short window; the in-sample / OOS split is *one* random partition, not k-fold. Repeat with different splits to test stability.
- Today's bestAsk/bestBid used to compute maker target; historical spread may have differed.

---
*Snapshot: 2026-05-15T07:19:16.547712+00:00*
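The all-zero rows in this 50-group report are what one would expect if nearly every basket is skipped before any fill is simulated; the commit message attributes this to `spread_too_narrow_for_maker` on legs that are one tick wide. A hedged sketch of such a gate follows. The rule used here (a leg is quotable only if its spread is at least two ticks, so a maker price can sit strictly inside the book) and every name in the block are assumptions for illustration, not the simulator's actual logic.

```python
from decimal import Decimal
from typing import List, Optional

TICK = Decimal("0.001")  # tick size implied by "1-tick wide (spread=0.001)"


def basket_skip_reason(
    best_bids: List[str],
    best_asks: List[str],
    min_ticks: int = 2,
) -> Optional[str]:
    """Return a skip reason if any leg is too narrow to quote, else None.

    Prices are passed as strings and parsed with Decimal so that a
    0.001 spread is compared exactly rather than via binary floats.
    A single narrow leg disqualifies the whole basket.
    """
    for bid, ask in zip(best_bids, best_asks):
        spread = Decimal(ask) - Decimal(bid)
        if spread < TICK * min_ticks:
            return "spread_too_narrow_for_maker"
    return None
```

Because the check is an all-legs condition, the skip rate rises quickly with basket size: a multi-member group needs every leg to clear the threshold, which is why the binary (size=2) cohort can pass where larger baskets do not.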
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Maker Simulation v4: Multi-Window Stability (2026-05-15T07:26:39.204878+00:00)

**Method**: ran v4 (maker_fee=zero, 10 IS / 4 OOS) on 4 non-overlapping 14-day windows. Total span: 2026-03-20 -> 2026-05-15 (56 days).

**Why this matters**: a single 14-day window's verdict can be window-luck. If the thesis is real, all 4 windows should give roughly consistent signs and magnitudes. If they bounce sign or order of magnitude, the single-window verdict was a coincidence.

## Per-window numbers

| Window | IS days | OOS days | Naive IS / yr | Naive OOS / yr | Cherry IS / yr | Cherry OOS / yr |
|---|---:|---:|---:|---:|---:|---:|
| 2026-03-20 → 2026-04-03 | 10 | 4 | $-1 | $+0 | $-1 | $+0 |
| 2026-04-03 → 2026-04-17 | 10 | 4 | $-15 | $-32 | $-15 | $-32 |
| 2026-04-17 → 2026-05-01 | 10 | 4 | $-12 | $+0 | $-12 | $+0 |
| 2026-05-01 → 2026-05-15 | 10 | 4 | $-20 | $-55 | $-20 | $-55 |

## Cross-window stability

| Metric | mean | median | min | max | SD | SD/mean |
|---|---:|---:|---:|---:|---:|---:|
| Naive IS / yr | $-12 | $-14 | $-20 | $-1 | $8 | 0.69 |
| Naive OOS / yr | $-22 | $-16 | $-55 | $+0 | $27 | 1.23 |
| Cherry IS / yr | $-12 | $-14 | $-20 | $-1 | $8 | 0.69 |
| Cherry OOS / yr | $-22 | $-16 | $-55 | $+0 | $27 | 1.23 |

**Read this**: if SD/mean > 1.0, your point estimate is mostly noise. If sign of OOS is consistent across windows but magnitude varies 2-3x, you have a real but noisy signal.
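The Naive OOS stability row can be reproduced from the four per-window values with the standard library. This sketch assumes the SD is the sample standard deviation (n-1 denominator) and that SD/mean divides by the absolute mean; both assumptions match the rounded figures in the table but are not confirmed by the aggregator's source.

```python
import statistics
from typing import Dict, List


def stability(values: List[float]) -> Dict[str, float]:
    """Summary statistics for one metric across windows."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n-1), an assumption
    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sd": sd,
        # Absolute mean keeps the ratio positive when the mean is negative.
        "sd_over_mean": sd / abs(mean) if mean != 0 else float("inf"),
    }


# Annualized Naive OOS per window, copied from the per-window table.
naive_oos = [0, -32, 0, -55]
summary = stability(naive_oos)
```

With only four windows the SD itself is a rough estimate, so the 1.0 threshold in the note above is best read as a rule of thumb rather than a significance test.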
## Persistent winners (positive OOS in ≥3 of 4 windows)

Found 0 groups (of 3 present in all windows). Sum of their mean OOS = **$+0/yr**.

| Rank | Group | Q | +OOS windows | Mean OOS / yr | Median OOS / yr | Values per window |
|---:|---|---|:-:|---:|---:|---|

## Interpretation

- The single-window verdict from any one run alone is statistically weak.
- The honest verdict is the mean OOS across all windows.
- The set of **persistent winners** (+OOS in ≥3/4 windows) gives the most defensible cherry-pick.
- If `naive_oos` flips sign across windows, the thesis applies only to a subset of groups, not to a naive deploy.

---
*Snapshot: 2026-05-15T07:26:39.204878+00:00*
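The persistent-winner rule stated in the report above (positive OOS in at least 3 of 4 windows, counted only among groups present in every window) can be sketched as below. The input shape, a list of per-window dicts mapping group id to annualized OOS dollars, is an assumption; the aggregator's real data structures are not shown in this commit.

```python
from typing import Dict, List


def persistent_winners(
    windows: List[Dict[str, float]],
    min_positive: int = 3,
) -> Dict[str, float]:
    """Map each persistent winner to its mean OOS across windows.

    A group qualifies only if it appears in every window and its OOS
    value is strictly positive in at least `min_positive` of them.
    """
    if not windows:
        return {}
    # Groups present in all windows (iterating a dict yields its keys).
    present = set(windows[0]).intersection(*windows[1:])
    winners: Dict[str, float] = {}
    for group in sorted(present):
        values = [w[group] for w in windows]
        if sum(v > 0 for v in values) >= min_positive:
            winners[group] = sum(values) / len(values)
    return winners
```

Zero counts as not positive here, so a group whose quotes never fill cannot qualify, which is consistent with the report finding 0 winners when most rows are exactly $+0.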

reports/maker-simulation-v4-multi-window-2026-05-15.md renamed to reports/maker-simulation-v4-multi-window-2026-05-15-longtail-dvr.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-# Maker Simulation v4: Multi-Window Stability (2026-05-15T04:05:23.269350+00:00)
+# Maker Simulation v4: Multi-Window Stability (2026-05-15T07:26:38.997558+00:00)

 **Method**: ran v4 (maker_fee=zero, 10 IS / 4 OOS) on 4 non-overlapping 14-day windows. Total span: 2026-03-20 -> 2026-05-15 (56 days).

@@ -39,4 +39,4 @@ Found 0 groups (of 64 present in all windows). Sum of their mean OOS = **$+0/yr*
 - If `naive_oos` flips sign across windows, the thesis applies only to a subset of groups, not to a naive deploy.

 ---
-*Snapshot: 2026-05-15T04:05:23.269350+00:00*
+*Snapshot: 2026-05-15T07:26:38.997558+00:00*
