Multi-window post-mortem: maker fee corrected, thesis still dies on 56-day stability test by Soli22de · Pull Request #10 · WW-shan/poly_strategy

Soli22de · 2026-05-15T06:36:27Z

@WW-shan This is a post-mortem on top of PR #9 (which you correctly archived). I want a second pair of eyes before declaring final case-closed.

TL;DR

Your archive of PR #9 was correct on outcome. But on the way to confirming it, I found:

1. A real methodology bug in v3: the maker fee was wrongly set to the taker fee. Polymarket docs say "makers never pay fees", and feeSchedule.takerOnly is True on 100/100 sampled markets (incl. all 6 of our actual D/R cohort).
2. Fixing that bug alone flipped the single-window verdict to +$195/yr naive, +$289 cherry-pick OOS.
3. But running 4 non-overlapping 14-day windows over 56 days kills the thesis again: naive_oos flips sign across windows (-$1,117 to +$239), and 0 of 64 groups have positive OOS in ≥3/4 windows.

So: fee correction was a real bug, single-window v4 was a head fake, multi-window says case closed.

What changed vs your archive of PR #9

| Layer | v3 (your archive) | v4 single-window | v4 multi-window (4 × 14d) |
|---|---|---|---|
| Naive annualized | -$263 | +$195 | mean -$183 (range -$1,117 to +$239) |
| Cherry-pick OOS | n/a (no OOS) | +$289 | mean +$251 (range +$147 to +$482) |
| Verdict | dead | seemingly alive | dead (no persistent edge) |

The decisive number

64 D/R groups present in all 4 windows. Distribution of (# windows where OOS daily $ > 0):

| Windows with positive OOS | Groups |
|---|---|
| 0/4 | 44 groups (69%) |
| 1/4 | 15 groups |
| 2/4 | 5 groups |
| 3/4 | 0 |
| 4/4 | 0 |

Even the 2 groups consistently in top-18 by in-sample fill (Wisconsin, Kansas D/R) had positive OOS in 2/4 and 1/4 windows respectively. There is no actionable subset.
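For anyone re-deriving the table above, here is a minimal sketch of the persistence count. It assumes per-group OOS daily $ has already been collected into a dict keyed by group; the function name, the dict layout, and the sample numbers are all hypothetical and are not taken from aggregate_v4_multi_window.py.

```python
from collections import Counter

def persistence_histogram(oos_by_group, n_windows):
    """Count groups by the number of windows in which OOS daily $ was > 0."""
    hist = Counter({k: 0 for k in range(n_windows + 1)})
    for pnl in oos_by_group.values():
        if len(pnl) != n_windows:
            continue  # only groups present in every window are counted
        hist[sum(1 for x in pnl if x > 0)] += 1
    return dict(hist)

# Made-up numbers, purely illustrative (not the PR's data).
example = {
    "group_a": [-1.2, -0.4, -0.1, -2.0],
    "group_b": [0.3, -0.5, -0.2, -0.9],
    "group_c": [0.6, 0.1, -0.7, -0.3],
    "group_d": [0.2, -0.1],  # missing from two windows, so excluded
}
hist = persistence_histogram(example, n_windows=4)
# "Persistent winner" here means positive OOS in at least 3 of 4 windows.
persistent = sum(n for k, n in hist.items() if k >= 3)
```

On the real data the `persistent` count is what comes out as 0 of 64.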

Why each window's cherry-pick still looks positive

Within each window, top-18 by IS daily $ → that window's OOS sum is positive (+$147 to +$482). But the top-18 rotate — only 2 groups appear in top-18 across all 4 windows. The cherry-pick "wins" each window because it picks that window's winners; we cannot pre-identify them.
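The rotation claim can be checked with a set intersection over each window's top-k. This is a sketch under assumptions: `top_k` and `stable_members` are hypothetical helper names, and the toy in-sample numbers are made up (k=2 here instead of 18 to keep it readable).

```python
def top_k(is_pnl, k):
    """Return the k group names with the highest in-sample daily $."""
    return set(sorted(is_pnl, key=is_pnl.get, reverse=True)[:k])

def stable_members(windows, k):
    """Groups that are in the top-k of every window."""
    return set.intersection(*(top_k(w, k) for w in windows))

# Made-up in-sample daily $ per group for three windows.
w1 = {"a": 5, "b": 4, "c": 1, "d": 0}
w2 = {"a": 3, "b": 0, "c": 4, "d": 1}
w3 = {"a": 2, "b": 1, "c": 0, "d": 6}
stable = stable_members([w1, w2, w3], k=2)
```

Each window's top-2 differs, and only one group survives the intersection. A small stable set relative to k is the symptom described above: the per-window picks are mostly that window's luck.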

What I'd like you to scrutinize

1. Is the fee correction actually right? Polymarket docs + feeSchedule.takerOnly is unambiguous on the page, but I haven't placed a real test order. Are you aware of any case where makers do pay fees on Polymarket CLOB?
2. Multi-window methodology: 4 non-overlapping windows of 14 days each, 10 IS / 4 OOS per window. Is there a flaw here? Specifically:
   - Should I do rolling-window CV instead of non-overlapping?
   - Should I extend to 90+ days?
   - Should I bootstrap-sample groups instead of picking top-18?
3. The 42d-ago window's -$1,117 OOS is dominated by a few catastrophic losses. Did I make an error in those, or is it real long-tail risk?
4. My research script vs production: poly_strategy/maker.py already sets "fee_rate_assumption": 0.0 for maker legs at line 1625. Production was always correct; only my standalone research sim had the bug. Are there other research scripts that might have the same issue?
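To make the window layout concrete for the methodology question, here is a sketch of how the 4 × 14d windows with a 10/4 IS/OOS split fall on the calendar. `make_windows` and its parameters are hypothetical, not the interface of simulate_maker_basket_v4.py; ranges are treated as half-open [start, end), which is also an assumption.

```python
from datetime import date, timedelta

def make_windows(end, n_windows=4, window_days=14, is_days=10, step_days=None):
    """Return windows oldest-first as half-open [start, end) date ranges.

    step_days == window_days gives non-overlapping windows (the PR's setup);
    a smaller step_days gives overlapping, rolling windows.
    """
    step = window_days if step_days is None else step_days
    windows = []
    for i in range(n_windows):
        w_end = end - timedelta(days=i * step)
        w_start = w_end - timedelta(days=window_days)
        split = w_start + timedelta(days=is_days)  # first OOS day
        windows.append({"is": (w_start, split), "oos": (split, w_end)})
    return windows[::-1]

windows = make_windows(date(2026, 5, 15))
# Rolling variant: 7 windows stepped by 7 days cover the same 56 days.
rolling = make_windows(date(2026, 5, 15), n_windows=7, step_days=7)
```

One thing to weigh on the rolling option: overlapping windows share days, so a "positive in k of n windows" count over them is not k independent confirmations.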

Files

scripts/simulate_maker_basket_v4.py — --maker-fee-mode {zero,taker_rate,custom}, --end-date YYYY-MM-DD, --in-sample-days N, --window-tag X
scripts/aggregate_v4_multi_window.py — reads N v4 JSONs, outputs stability + persistent-winner analysis
reports/maker-simulation-v4-2026-05-15-w-{today,14d-ago,28d-ago,42d-ago}.md — per-window reports
reports/maker-simulation-v4-multi-window-2026-05-15.md — the verdict

My ask

If you agree the methodology is right, please confirm the verdict. If you spot a flaw, please call it out — I've been wrong three times today and would rather be wrong a fourth time now than later.

If you're heads-down on the LLM research profile work and don't have time to review this, just say so and I'll proceed.

Post-mortem of the maker-arb thesis after WW archived PR WW-shan#9.

Two methodology fixes in v4:
(A) Maker fee was wrongly set equal to the taker fee in v3. Polymarket docs and live feeSchedule.takerOnly=True on 100/100 sampled markets confirm makers never pay fees. Corrected: maker_fee = 0.
(B) v3 was 100% in-sample. Added 10/4 train/test split + multi-window orchestration to detect window-luck.

Findings progression:
v3 (in-sample, taker fee): -$263/yr naive, +$117 cherry
v4 single window (today): +$195/yr naive, +$289 cherry OOS
v4 multi-window (4 x 14d = 56d): naive mean -$183 (sign flips!), cherry mean +$251 but UNSTABLE

The decisive result: across 4 non-overlapping 14-day windows covering 2026-03-20 to 2026-05-15:
- 0 of 64 groups have positive OOS in >=3/4 windows
- 44/64 groups (69%) had zero positive OOS across all 4 windows
- Even the 2 groups consistently in top-18 by in-sample (Wisconsin, Kansas) had positive OOS in only 2/4 and 1/4 windows respectively
- Naive deploy sign flips: -$1,117 in 3/20-4/03 window, +$239 in 4/03-4/17 window

Cherry-pick "wins" within each window because we pick this window's winners; but the winners rotate, so no actionable alpha.

Files:
scripts/simulate_maker_basket_v4.py - corrected fee + IS/OOS split + --end-date for time-shifting
scripts/aggregate_v4_multi_window.py - cross-window stability
reports/maker-simulation-v4-*-w-*.md - 4 per-window reports
reports/maker-simulation-v4-multi-window-2026-05-15.md - the verdict

Note: poly_strategy/maker.py production code already has fee_rate_assumption=0.0 for maker legs. The fee bug was localized to my standalone research script, not production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per user's "test market making on high-volume markets" follow-up: extended v4 + multi-window to test maker basket arb on high-volume cohorts, not just long-tail D-vs-R.

Three cohort tests now done, all with maker_fee=0 + 10/4 IS/OOS split on 4 non-overlapping 14-day windows:

Cohort                       n groups  Naive OOS mean   Persistent winners
---------------------------  --------  ---------------  ------------------
Long-tail D-vs-R             71        -$183            0 of 64
High-vol multi-member        50        STRUCTURAL FAIL  -
High-vol binary (size=2)     6         -$22             0 of 3

The high-vol multi-member result is the most informative *new* finding: 96% of (group, markup) combinations got skipped via "spread_too_narrow_for_maker". Per-leg spreads on high-vol multi-member markets are 0.001-0.008 typically; long-shot legs are 1-tick wide (spread=0.001). The "all legs filled as basket" strategy is structurally impossible: you cannot place a valid maker quote on the long-shot legs, and any narrow leg kills the whole basket.

Translation: high-vol markets are dominated by HFT market makers leaving no room for slower-finger maker arb on the basket.

High-vol binary (size=2) escapes the structural problem but produces all-negative or zero OOS across 4 windows. Only 6 such groups exist today at vol>=$5k (mostly D/R + a few sports), and 3 of them rotate between windows.

Files:
scripts/build_negrisk_cohort.py - generic cohort builder with fixed Gamma pagination (offset += 100 not 500)
scripts/simulate_maker_basket_v4.py - now accepts --cohort-file and --cohort-tier
scripts/aggregate_v4_multi_window.py - now accepts --report-tag so we can aggregate multiple cohorts cleanly
reports/maker-simulation-v4-*-binhv-w-*.md - 4 binary high-vol windows
reports/maker-simulation-v4-*-highvol-w-today.md - structural-fail report
reports/maker-simulation-v4-multi-window-*-binary-highvol.md - binary HV agg
reports/maker-simulation-v4-multi-window-*-longtail-dvr.md - renamed from the untagged version

Verdict: maker basket arb thesis is dead in all 3 testable cohorts. The fee correction was a real bug (and could matter in other contexts), but did not save the thesis under rigorous multi-window testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Soli22de · 2026-05-15T07:28:38Z

Follow-up commit: tested two additional cohorts to address "is this thesis dead only on long-tail, or universally?"

Results across all 3 cohorts now done with the same v4 + 4-window methodology:

| Cohort | n groups | Naive OOS mean | Persistent winners |
|---|---|---|---|
| Long-tail D-vs-R (size=2, vol<$1k) | 71 | -$183 | 0 of 64 |
| High-vol multi-member (vol≥$10k, size≥3) | 50 | STRUCTURAL FAIL | n/a |
| High-vol binary (vol≥$5k, size=2) | 6 | -$22 | 0 of 3 |

The high-vol multi-member result is the most informative new finding: 96% of (group, markup) combinations got skipped via spread_too_narrow_for_maker. Per-leg spreads on high-vol multi-member markets are 0.001-0.008 typically; long-shot legs are 1-tick wide. Any narrow leg kills the whole basket, and you can't place a valid maker quote on long-shot legs (price 0.001-0.005 with no maker zone).

Translation: high-vol markets are dominated by HFT market makers leaving no room for basket-arb maker quotes.
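A sketch of why one narrow leg sinks the whole basket. The skip-reason string is the one reported above; everything else is an assumption for illustration, in particular the 2-tick minimum (a quote needs a price strictly inside the spread to rest as a maker order) and the function names. The simulator's actual rule may differ.

```python
TICK = 0.001
MIN_SPREAD_TICKS = 2  # assumption: need at least one price strictly inside the spread

def leg_is_quotable(bid, ask, tick=TICK, min_ticks=MIN_SPREAD_TICKS):
    """True if the spread is wide enough to rest a maker quote inside it."""
    # Round to whole ticks to avoid floating-point noise in the subtraction.
    return round((ask - bid) / tick) >= min_ticks

def basket_skip_reason(legs):
    """Return a skip reason if any leg is too narrow, else None.

    The basket needs every leg filled, so a single unquotable leg is enough.
    """
    for bid, ask in legs:
        if not leg_is_quotable(bid, ask):
            return "spread_too_narrow_for_maker"
    return None

# Made-up (bid, ask) pairs: two favourites plus a 1-tick-wide long shot.
with_long_shot = [(0.450, 0.458), (0.300, 0.304), (0.002, 0.003)]
without_long_shot = [(0.450, 0.458), (0.300, 0.304)]
```

The first basket is skipped purely because of the long-shot leg; dropping that leg makes it quotable, but then it is no longer the full basket.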

This pushes the verdict from "dead in long-tail D/R" to "dead across all 3 testable cohorts". I'm now declaring this case definitively closed unless you spot something.

New files in this push:

scripts/build_negrisk_cohort.py - generic cohort builder (fixes a Gamma pagination bug — offset should be += 100 not += 500 since the API caps each page at 100 regardless of limit)
scripts/simulate_maker_basket_v4.py - now takes --cohort-file and --cohort-tier
scripts/aggregate_v4_multi_window.py - now takes --report-tag
5 new reports (3 windows of binary HV, 1 highvol-mm structural fail, 1 binary-HV multi-window agg)
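The pagination bug is easy to reproduce offline. Below is a sketch with a fake paged endpoint that, like the behaviour described above, returns at most 100 rows no matter what `limit` is requested. `fake_fetch` and both loop functions are hypothetical stand-ins, not the Gamma client or the code in build_negrisk_cohort.py. The fixed loop advances by the number of rows actually returned, which is a slightly more defensive variant of the `offset += 100` fix.

```python
PAGE_CAP = 100  # server-side cap per page, per the behaviour described above

def fake_fetch(offset, limit, total=350):
    """Stand-in for a paged endpoint; no network involved."""
    end = min(offset + min(limit, PAGE_CAP), total)
    return list(range(offset, end))

def fetch_all_buggy(fetch, limit=500):
    """Advances by the requested limit, silently skipping rows."""
    rows, offset = [], 0
    while True:
        page = fetch(offset, limit)
        if not page:
            break
        rows.extend(page)
        offset += limit
    return rows

def fetch_all(fetch, limit=500):
    """Advances by what the server actually returned."""
    rows, offset = [], 0
    while True:
        page = fetch(offset, limit)
        if not page:
            break
        rows.extend(page)
        offset += len(page)
    return rows

buggy_rows = fetch_all_buggy(fake_fetch)
fixed_rows = fetch_all(fake_fetch)
```

With 350 fake rows the buggy loop stops after the first page, while the fixed loop returns all of them.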

Spent ~1 hour probing Kalshi API + cross-platform matching feasibility. Found enough open questions that a real thesis test needs more time:

1. Kalshi public API accessible from China (no auth needed for reads).
2. /markets?status=open is dominated by 3000+ multivariate parlay tickers (KXMV*) with zero liquidity. Need series_ticker queries instead.
3. 17 known-active series produce 520 single-event markets; 300 with non-empty orderbook; 27 with quotes on both YES and NO sides.
4. CRITICAL: orderbook_fp.yes_dollars and no_dollars are BIDS, not asks. Confirmed via KXFED-27APR-T4.25 where yes_dollars max=$0.26 and no_dollars max=$0.57 (their sum < $1 = arb if they were asks). Synthetic asks: yes_ask = 1 - best_no_bid; no_ask = 1 - best_yes_bid.
5. Listing endpoint yes_ask field is always None; quotes only in the per-ticker orderbook endpoint (so N calls per scan).

The real bottleneck for thesis testing isn't data access, it's event matching: Kalshi has hourly BTC ladders + rate-level binaries, Polymarket has date-bounded thresholds + cut/hike decision binaries. The "same event, different binarization" problem dominates.

Three paths discussed in the report: Path A (use WW's existing cross_platform.py pipeline with jaccard + LLM), Path B (hand-curate ~10 pairs), or Path C (park the thesis). Awaiting user direction.

Files:
scripts/probe_cross_platform.py - direct probe with manual pair list (placeholder empty for now)
reports/cross-platform-feasibility-2026-05-15.md - audit findings + interpretation fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
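The bid/ask point in item 4 as a worked example, using the two best-bid values quoted for KXFED-27APR-T4.25. The function name and the list-of-prices input shape are assumptions for illustration, not the Kalshi response schema; the deeper book levels are made up. Decimal is used so the subtraction is exact.

```python
from decimal import Decimal

def synthetic_asks(yes_bids, no_bids):
    """Derive asks from a bids-only book: buying YES means hitting the best NO bid."""
    best_yes_bid = max(yes_bids)
    best_no_bid = max(no_bids)
    yes_ask = Decimal(1) - best_no_bid
    no_ask = Decimal(1) - best_yes_bid
    return yes_ask, no_ask

# Best bids 0.26 / 0.57 are from the probe; the lower levels are invented.
yes_bids = [Decimal("0.26"), Decimal("0.20")]
no_bids = [Decimal("0.57"), Decimal("0.50")]
yes_ask, no_ask = synthetic_asks(yes_bids, no_bids)
ask_sum = yes_ask + no_ask
```

Read as asks, 0.26 + 0.57 = 0.83 would look like a free arb. Read correctly as bids, the synthetic asks sum to more than $1 and the apparent arb disappears.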

Soli22de · 2026-05-15T08:29:21Z

@WW-shan Thanks for the review. I've rebased your fix in.

Issues you caught that I agree with

v4 historical-window header bug: a run with --end-date 2026-04-17 wrote -> 2026-05-15 in the header (I used now instead of window_end_dt). The window_date_label() and default_window_end() you added also align everything to UTC midnight, which is cleaner than before.
The ask calculation in probe_cross_platform.py was wrong: I read Kalshi yes_dollars/no_dollars as asks when they are actually bids. The synthetic ask is 1 - max(opposite_side_bids). You fixed it directly.
--require-fees-enabled in build_negrisk_cohort.py: store_true + default=True is effectively a hardcoded True that cannot be turned off. Switching to BooleanOptionalAction is right.
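A standalone reproduction of the --require-fees-enabled issue, for the record. The flag name is the one from build_negrisk_cohort.py; the two parsers here are minimal stand-ins, not the script's actual argument setup. argparse.BooleanOptionalAction requires Python 3.9+.

```python
import argparse

# Before: store_true with default=True can only ever yield True.
buggy = argparse.ArgumentParser()
buggy.add_argument("--require-fees-enabled", action="store_true", default=True)

# After: BooleanOptionalAction also registers --no-require-fees-enabled.
fixed = argparse.ArgumentParser()
fixed.add_argument(
    "--require-fees-enabled",
    action=argparse.BooleanOptionalAction,
    default=True,
)

buggy_default = buggy.parse_args([]).require_fees_enabled
buggy_flag = buggy.parse_args(["--require-fees-enabled"]).require_fees_enabled
fixed_default = fixed.parse_args([]).require_fees_enabled
fixed_off = fixed.parse_args(["--no-require-fees-enabled"]).require_fees_enabled
```

With the old definition there is no command line that produces False; with the new one, --no-require-fees-enabled does.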

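The builder fee you raised: I plan to take it seriously

"Builder fees are not modeled. The maker-fee-zero assumption is for direct Polymarket platform fees; orders routed through a builder with builder_maker_fee_bps could pay a separate builder fee."

I had not considered this before. If we are in fact routed through a builder and builder_maker_fee_bps is nonzero, the v4 numbers that flipped to +$195/yr naive and +$307/yr cherry-pick OOS have to be recomputed. The thesis might flip back to negative.

My next steps:

Check whether builder_maker_fee_bps is 0 or nonzero on real D/R markets
If nonzero, how large
Rerun v4 and see how the verdict changes

Separately, I also tested Path A (cross-platform Polymarket-Kalshi)

It is in the last two commits of the PR. Conclusions:

Ran your full 6-step pipeline
Of 1000 candidate pairs, 0 pass option_match=True + edge>=0
All 10 genuinely same-event NHL Stanley Cup pairs have negative edge (-2.77% to -97%)
Every "positive edge" is a question-structure mismatch (Polymarket full-year inflation vs Kalshi single-month CPI; Polymarket Austria vs Kalshi -AUS, which is the IOC code for Australia)
Your deterministic _option_match caught every mismatch without needing an LLM

The cross-platform thesis is dead too.

Full writeup is in reports/cross-platform-pipeline-verdict-2026-05-15.md.

What I'll do next

Find the true builder fee value (half an hour)
If the builder fee is nonzero, rerun the 4 v4 windows
Then it really is case closed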

Soli22de · 2026-05-15T08:32:49Z

Builder fee investigation done. Conclusion: your caveat is defensive and does not flip the current verdict.

The 3 steps in detail:

The builder_maker_fee_bps field does not exist in the Gamma payload at all. I sampled 8 D/R markets (562793/4, etc.), each with 88 fields; 0 of them contain "builder". The only nonzero maker-related field is makerBaseFee: 1000.
makerBaseFee does not appear in the official docs. docs.polymarket.com/trading/fees states explicitly: "Makers are never charged fees. Only takers pay fees." The on-chain query uses getClobMarketInfo(conditionID), which returns fee data = {r: feeRate, e: exponent, to: takerOnly}, i.e. the same three fields as feeSchedule. makerBaseFee: 1000 is presumably a legacy field that is not part of the active fee model.
Production poly_strategy/maker.py:1625 sets "fee_rate_assumption": 0.0. Production code, docs, and on-chain data all agree.

So the v4 verdict stands in full:

Long-tail D/R: −$183/yr naive, 0/64 persistent winners
High-vol multi-member: 96% structural skip
High-vol binary: −$22/yr naive, 0/3 persistent winners
Cross-platform Polymarket-Kalshi: 0 candidates pass option_match + edge≥0

It is all written up in reports/builder-fee-investigation-2026-05-15.md.

Caveats that still apply but are small in magnitude:

Polygon gas per order (~$0.001-0.01)
One-time USDC bridge cost
If Polymarket adds a builder fee in the future, your caveat is worth keeping

Case closed on my side. Maker basket arb is dead on all 4 cohorts we can test. Do you want to merge PR #10 in archive mode as well (same as PR #9), or close it?

Soli22de · 2026-05-15T08:44:46Z

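I see you have already merged this PR. Case closed.

After 3 days and 4 cohorts, the maker basket arb thesis is dead in every direction we can test. The fee bug was a real bug (worth fixing), but fixing the fee alone cannot rescue the thesis; multi-window, multi-cohort, and cross-platform testing each show there is no edge.

Many thanks for your review. The bugs you caught (window header, Kalshi synthetic ask, builder fee caveat, UTC midnight) were all real problems.

The tools stay in the repo (v4, multi-window aggregator, cohort builder, cross-platform pipeline run, targeted Kalshi puller) and can be reused if new data or a new thesis comes along.

This is the end of the maker arb direction. Research closed.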

张靖恒 and others added 2 commits May 15, 2026 14:35

张靖恒 and others added 2 commits May 15, 2026 15:46

Fix maker v4 report windows and fee caveats

3771e1e

WW-shan merged commit 6a2ebff into WW-shan:main May 15, 2026
1 check passed

WW-shan merged 4 commits into WW-shan:main from Soli22de:research/2026-05-15-multiwindow-postmortem
