Multi-window post-mortem: maker fee corrected, thesis still dies on 56-day stability test #10
Post-mortem of the maker-arb thesis after WW archived PR WW-shan#9.

Two methodology fixes in v4:
(A) maker fee was wrongly = taker fee in v3. Polymarket docs and live feeSchedule.takerOnly=True on 100/100 sampled markets confirm makers never pay fees. Corrected: maker_fee = 0.
(B) v3 was 100% in-sample. Added 10/4 train/test split + multi-window orchestration to detect window-luck.

Findings progression:
v3 (in-sample, taker fee): -$263/yr naive, +$117 cherry
v4 single window (today): +$195/yr naive, +$289 cherry OOS
v4 multi-window (4 x 14d = 56d): naive mean -$183 (sign flips!), cherry mean +$251 but UNSTABLE

The decisive result: across 4 non-overlapping 14-day windows covering 2026-03-20 to 2026-05-15:
- 0 of 64 groups have positive OOS in >=3/4 windows
- 44/64 groups (69%) had zero positive OOS across all 4 windows
- Even the 2 groups consistently in top-18 by in-sample (Wisconsin, Kansas) had positive OOS in only 2/4 and 1/4 windows respectively
- Naive deploy sign flips: -$1,117 in 3/20-4/03 window, +$239 in 4/03-4/17 window

Cherry-pick "wins" within each window because we pick this window's winners; but the winners rotate, so no actionable alpha.

Files:
scripts/simulate_maker_basket_v4.py - corrected fee + IS/OOS split + --end-date for time-shifting
scripts/aggregate_v4_multi_window.py - cross-window stability
reports/maker-simulation-v4-*-w-*.md - 4 per-window reports
reports/maker-simulation-v4-multi-window-2026-05-15.md - the verdict

Note: poly_strategy/maker.py production code already has fee_rate_assumption=0.0 for maker legs. The fee bug was localized to my standalone research script, not production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
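For reviewers who want to check the window arithmetic, here is a minimal sketch of how four non-overlapping 14-day windows with a 10/4 in-sample/out-of-sample split can be laid out from a single end date. The function name `build_windows`, its parameters, and the half-open date convention are assumptions for illustration; the real script's internals are not shown in this PR.

```python
from datetime import date, timedelta

def build_windows(end_date, n_windows=4, window_days=14, in_sample_days=10):
    """Return non-overlapping windows, newest first, each split into IS and OOS.

    Hypothetical helper: in-sample covers [start, is_end), out-of-sample
    covers [is_end, end). The half-open convention is an assumption.
    """
    windows = []
    for i in range(n_windows):
        w_end = end_date - timedelta(days=i * window_days)
        w_start = w_end - timedelta(days=window_days)
        is_end = w_start + timedelta(days=in_sample_days)
        windows.append({"start": w_start, "is_end": is_end, "end": w_end})
    return windows

windows = build_windows(date(2026, 5, 15))
for w in windows:
    print(w["start"], "IS until", w["is_end"], "OOS until", w["end"])
```

With an end date of 2026-05-15 the oldest window starts on 2026-03-20, which matches the 56-day span quoted in the commit message.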
Per user's "test market making on high-volume markets" follow-up: extended v4 + multi-window to test
maker basket arb on high-volume cohorts, not just long-tail D-vs-R.
Three cohort tests now done, all with maker_fee=0 + 10/4 IS/OOS split
on 4 non-overlapping 14-day windows:
Cohort                       n groups  Naive OOS mean   Persistent winners
---------------------------  --------  ---------------  ------------------
Long-tail D-vs-R             71        -$183            0 of 64
High-vol multi-member        50        STRUCTURAL FAIL  -
High-vol binary (size=2)     6         -$22             0 of 3
The high-vol multi-member result is the most informative *new* finding:
96% of (group, markup) combinations got skipped via
"spread_too_narrow_for_maker". Per-leg spreads on high-vol multi-member
markets are 0.001-0.008 typically; long-shot legs are 1-tick wide
(spread=0.001). The "all legs filled as basket" strategy is
structurally impossible — you cannot place a valid maker quote on the
long-shot legs, and any narrow leg kills the whole basket.
Translation: high-vol markets are dominated by HFT market makers
leaving no room for slower-finger maker arb on the basket.
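The "any narrow leg kills the whole basket" rule can be sketched as follows. This is an illustration only: the 2-tick minimum spread is an assumed threshold (the commit does not state the one used by simulate_maker_basket_v4.py), and the function names are hypothetical. The quotes in the example are made up to show one 1-tick long-shot leg.

```python
TICK = 0.001  # tick size implied by "1-tick wide (spread=0.001)"

def spread_in_ticks(best_bid, best_ask, tick=TICK):
    # Round to whole ticks to avoid float noise such as 0.0010000000000000009.
    return round((best_ask - best_bid) / tick)

def basket_skip_reason(legs, min_spread_ticks=2):
    """legs: list of (best_bid, best_ask) per leg.

    min_spread_ticks is an assumed threshold, not taken from the v4 script.
    Returns the skip reason if any leg is too narrow, else None.
    """
    for bid, ask in legs:
        if spread_in_ticks(bid, ask) < min_spread_ticks:
            return "spread_too_narrow_for_maker"
    return None

# Illustrative quotes: two quotable legs plus one 1-tick long-shot leg.
narrow_basket = [(0.450, 0.455), (0.300, 0.304), (0.002, 0.003)]
wide_basket = [(0.45, 0.46), (0.52, 0.53)]
print(basket_skip_reason(narrow_basket))
print(basket_skip_reason(wide_basket))
```

Because the check is per leg, a basket with many members is skipped as soon as one long-shot leg is a single tick wide, which is consistent with the 96% skip rate reported above.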
High-vol binary (size=2) escapes the structural problem but produces
all-negative or zero OOS across 4 windows. Only 6 such groups exist
today at vol>=$5k (mostly D/R + a few sports), and 3 of them rotate
between windows.
Files:
scripts/build_negrisk_cohort.py - generic cohort builder with fixed Gamma pagination (offset += 100 not 500)
scripts/simulate_maker_basket_v4.py - now accepts --cohort-file and --cohort-tier
scripts/aggregate_v4_multi_window.py - now accepts --report-tag so we can aggregate multiple cohorts cleanly
reports/maker-simulation-v4-*-binhv-w-*.md - 4 binary high-vol windows
reports/maker-simulation-v4-*-highvol-w-today.md - structural-fail report
reports/maker-simulation-v4-multi-window-*-binary-highvol.md - binary HV agg
reports/maker-simulation-v4-multi-window-*-longtail-dvr.md - renamed from the untagged version
Verdict: maker basket arb thesis is dead in all 3 testable cohorts.
The fee correction was a real bug (and could matter in other contexts),
but did not save the thesis under rigorous multi-window testing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up commit: tested two additional cohorts to address "is this thesis dead only on long-tail, or universally?" Results across all 3 cohorts now done with the same v4 + 4-window methodology:
The high-vol multi-member result is the most informative new finding: 96% of (group, markup) combinations got skipped via "spread_too_narrow_for_maker". Translation: high-vol markets are dominated by HFT market makers, leaving no room for basket-arb maker quotes. This pushes the verdict from "dead in long-tail D/R" to "dead across all 3 testable cohorts". I'm now declaring this case definitively closed unless you spot something. New files in this push:
Spent ~1 hour probing Kalshi API + cross-platform matching feasibility.
Found enough open questions that a real thesis test needs more time:
1. Kalshi public API accessible from China (no auth needed for reads).
2. /markets?status=open is dominated by 3000+ multivariate parlay
tickers (KXMV*) with zero liquidity. Need series_ticker queries
instead.
3. 17 known-active series produce 520 single-event markets; 300 with
non-empty orderbook; 27 with quotes on both YES and NO sides.
4. CRITICAL: orderbook_fp.yes_dollars and no_dollars are BIDS, not
asks. Confirmed via KXFED-27APR-T4.25 where yes_dollars max=$0.26
and no_dollars max=$0.57 (their sum < $1 = arb if they were asks).
Synthetic asks: yes_ask = 1 - best_no_bid; no_ask = 1 - best_yes_bid.
5. Listing endpoint yes_ask field is always None; quotes only in the
per-ticker orderbook endpoint (so N calls per scan).
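Point 4 can be checked with a few lines. This is a sketch under the assumption that the bid levels have already been extracted into plain lists of dollar prices; the actual response shape of the orderbook endpoint is not reproduced here, and `synthetic_asks` is a hypothetical helper name. The inputs are the two maxima quoted above for KXFED-27APR-T4.25.

```python
def synthetic_asks(yes_bids, no_bids):
    """Derive asks from Kalshi bid-only books.

    yes_bids / no_bids: lists of bid prices in dollars (assumed input shape).
    Buying YES at the ask is equivalent to hitting the best NO bid, so
    yes_ask = 1 - best_no_bid and no_ask = 1 - best_yes_bid.
    """
    best_yes_bid = max(yes_bids) if yes_bids else None
    best_no_bid = max(no_bids) if no_bids else None
    yes_ask = round(1 - best_no_bid, 4) if best_no_bid is not None else None
    no_ask = round(1 - best_yes_bid, 4) if best_yes_bid is not None else None
    return yes_ask, no_ask

# Best bids quoted in the commit message: YES $0.26, NO $0.57.
yes_ask, no_ask = synthetic_asks([0.26], [0.57])
print(yes_ask, no_ask, round(yes_ask + no_ask, 4))
```

The bids sum to $0.83, below $1, while the derived asks sum to more than $1, so the apparent arbitrage disappears once the fields are read as bids.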
The real bottleneck for thesis testing isn't data access, it's
event matching: Kalshi has hourly BTC ladders + rate-level binaries,
Polymarket has date-bounded thresholds + cut/hike decision binaries.
The "same event, different binarization" problem dominates.
Three paths discussed in the report — Path A (use WW's existing
cross_platform.py pipeline with jaccard + LLM), Path B (hand-curate
~10 pairs), or Path C (park the thesis). Awaiting user direction.
Files:
scripts/probe_cross_platform.py - direct probe with manual pair list (placeholder empty for now)
reports/cross-platform-feasibility-2026-05-15.md - audit findings + interpretation fix
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
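@WW-shan Thanks for the review. I've rebased your fix in. I agree with the issues you caught.
I plan to take the builder fee point you raised seriously: "Builder fees are not modeled. The maker-fee-zero assumption is for direct Polymarket platform fees; orders routed through a builder with I had not considered this before. If we are in fact routed through a builder and builder_maker_fee_bps is nonzero, the v4 figures (+$195/yr naive up to +$307/yr cherry-pick OOS) have to be recomputed, and the thesis may flip back to negative. My next step is to check:
Separately, I also tested Path A (cross-platform Polymarket-Kalshi) in the last two commits of this PR. Conclusion:
The cross-platform thesis is dead too. The full writeup is in What I will do next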
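Builder fee investigation done. Conclusion: your caveat is defensive and does not flip the current verdict. The 3 steps in detail:
So the v4 verdict stands in full:
All written up in Caveats that are still alive but small in magnitude:
I consider the case closed on my side. Maker basket arb is dead on all 4 cohorts we can test. Should PR #10 also be merged in archive mode (same as PR #9), or closed?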
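I see you have already merged this PR. Case closed. After 3 days and 4 cohorts of testing, the maker basket arb thesis is dead in every direction we can test. The fee bug is a real bug (worth fixing), but fixing the fee alone does not rescue the thesis; multi-window, multi-cohort, and cross-platform testing all knock it back down. Many thanks for your review: the bugs you caught (window header, Kalshi synthetic ask, builder fee caveat, UTC midnight) are all real issues. The tools stay in the repo (v4, multi-window aggregator, cohort builder, cross-platform pipeline run, targeted Kalshi puller) and can be reused if new data or a new thesis comes up. This is the end of the maker arb direction; research closed.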
@WW-shan This is a post-mortem on top of PR #9 (which you correctly archived). I want a second pair of eyes before declaring final case-closed.
TL;DR
Your archive of PR #9 was correct on outcome. But on the way to confirming it, I found:
- feeSchedule.takerOnly: True on 100/100 sampled markets (incl. all 6 of our actual D/R cohort).
- naive_oos flips sign across windows (-$1,117 to +$239), and 0 of 64 groups have positive OOS in ≥3/4 windows.
So: fee correction was a real bug, single-window v4 was a head fake, multi-window says case closed.
What changed vs your archive of PR #9
The decisive number
64 D/R groups present in all 4 windows. Distribution of (# windows where OOS daily $ > 0):
| Windows with positive OOS | Groups |
| --- | --- |
| 0/4 windows positive | 44 groups (69%) |
| 1/4 | 15 groups |
| 2/4 | 5 groups |
| 3/4 | 0 |
| 4/4 | 0 |
Even the 2 groups consistently in top-18 by in-sample fill (Wisconsin, Kansas D/R) had positive OOS in 2/4 and 1/4 windows respectively. There is no actionable subset.
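The distribution above is a per-group count of windows with positive OOS daily $. A minimal sketch of that aggregation follows, assuming the per-window results have been loaded into a dict of lists; the function names and the example values are illustrative and are not taken from aggregate_v4_multi_window.py or the reports.

```python
from collections import Counter

def positive_window_distribution(oos_by_group):
    """oos_by_group: {group: [OOS daily $ per window]}.

    Returns Counter mapping (number of positive windows) -> number of groups.
    """
    return Counter(sum(1 for x in oos if x > 0) for oos in oos_by_group.values())

def persistent_winners(oos_by_group, min_positive=3):
    """Groups with positive OOS in at least min_positive windows."""
    return sorted(
        g for g, oos in oos_by_group.items()
        if sum(1 for x in oos if x > 0) >= min_positive
    )

# Illustrative values only, not taken from the reports.
example = {
    "group_a": [-1.2, 0.4, -0.3, 0.9],
    "group_b": [-0.5, -0.1, -2.0, -0.7],
    "group_c": [0.2, -0.6, -0.4, -0.1],
}
dist = positive_window_distribution(example)
winners = persistent_winners(example)
print(dict(dist), winners)
```

An empty `persistent_winners` list at `min_positive=3` is the "0 of 64" outcome reported here.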
Why each window's cherry-pick still looks positive
Within each window, top-18 by IS daily $ → that window's OOS sum is positive (+$147 to +$482). But the top-18 rotate — only 2 groups appear in top-18 across all 4 windows. The cherry-pick "wins" each window because it picks that window's winners; we cannot pre-identify them.
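The rotation can be measured by intersecting the per-window top-k sets. The sketch below assumes each window's in-sample daily $ is available as a dict keyed by group; `top_k`, `stable_top_k`, and the toy numbers (with k=2 instead of 18) are hypothetical and only illustrate the selection effect.

```python
def top_k(is_daily, k):
    """Groups with the k largest in-sample daily $ values in one window."""
    return set(sorted(is_daily, key=is_daily.get, reverse=True)[:k])

def stable_top_k(is_daily_by_window, k=18):
    """Groups that rank in the top-k by in-sample daily $ in every window."""
    sets = [top_k(w, k) for w in is_daily_by_window]
    return set.intersection(*sets)

# Toy in-sample values for three windows; not real results.
toy_windows = [
    {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5},
    {"a": 2.5, "b": 0.1, "c": 2.0, "d": 0.3},
    {"a": 1.0, "b": 0.2, "c": 0.1, "d": 4.0},
]
stable = stable_top_k(toy_windows, k=2)
print(stable)
```

A small intersection relative to k means the selection made in one window says little about which groups to deploy in the next.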
What I'd like you to scrutinize
- feeSchedule.takerOnly is unambiguous on the page, but I haven't placed a real test order. Are you aware of any case where makers do pay fees on Polymarket CLOB?
- poly_strategy/maker.py already sets "fee_rate_assumption": 0.0 for maker legs at line 1625. Production was always correct; only my standalone research sim had the bug. Are there other research scripts that might have the same issue?
Files
- scripts/simulate_maker_basket_v4.py - --maker-fee-mode {zero,taker_rate,custom}, --end-date YYYY-MM-DD, --in-sample-days N, --window-tag X
- scripts/aggregate_v4_multi_window.py - reads N v4 JSONs, outputs stability + persistent-winner analysis
- reports/maker-simulation-v4-2026-05-15-w-{today,14d-ago,28d-ago,42d-ago}.md - per-window reports
- reports/maker-simulation-v4-multi-window-2026-05-15.md - the verdict
My ask
If you agree the methodology is right, please confirm the verdict. If you spot a flaw, please call it out — I've been wrong three times today and would rather be wrong a fourth time now than later.
If you're heads-down on the LLM research profile work and don't have time to review this, just say so and I'll proceed.