
Multi-window post-mortem: maker fee corrected, thesis still dies on 56-day stability test #10

Merged
WW-shan merged 4 commits into WW-shan:main from Soli22de:research/2026-05-15-multiwindow-postmortem on May 15, 2026

Conversation

@Soli22de
Collaborator

@WW-shan This is a post-mortem on top of PR #9 (which you correctly archived). I want a second pair of eyes before declaring final case-closed.

TL;DR

Your archive of PR #9 was correct on outcome. But on the way to confirming it, I found:

  1. A real methodology bug in v3: the maker fee was wrongly set to the taker fee. Polymarket docs say "makers never pay fees", and feeSchedule.takerOnly is True on 100/100 sampled markets (including all 6 of our actual D/R cohort).
  2. Fixing that bug alone flipped a single-window verdict to +$195/yr naive, +$289 cherry-pick OOS.
  3. But running 4 non-overlapping 14-day windows over 56 days kills the thesis again: naive_oos flips sign across windows (-$1,117 to +$239), and 0 of 64 groups have positive OOS in ≥3/4 windows.

So: fee correction was a real bug, single-window v4 was a head fake, multi-window says case closed.
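
To make the v3 bug concrete, here is a minimal sketch of how a maker-leg fee might be resolved from the three modes that `--maker-fee-mode` exposes. The function name, the flat `rate * notional` fee formula, and the example rates are assumptions for illustration only; the actual script and Polymarket's real taker fee formula may differ.

```python
def maker_fee(notional: float, mode: str, taker_rate: float,
              custom_rate: float = 0.0) -> float:
    """Fee in dollars charged on a maker fill of `notional` dollars.

    Assumption: fees are a flat rate times notional. This is a
    simplification for illustration, not the platform's formula.
    """
    if mode == "zero":
        # v4 default: makers pay nothing.
        return 0.0
    if mode == "taker_rate":
        # Reproduces the v3 bug: the maker leg is charged like a taker.
        return notional * taker_rate
    if mode == "custom":
        # Sensitivity testing with an arbitrary rate.
        return notional * custom_rate
    raise ValueError(f"unknown maker fee mode: {mode!r}")
```

Keeping `taker_rate` as a selectable mode makes it possible to re-run the v3 configuration and isolate how much of a result comes from the fee assumption alone.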

What changed vs your archive of PR #9

| Layer | v3 (your archive) | v4 single-window | v4 multi-window (4 × 14d) |
| --- | --- | --- | --- |
| Naive annualized | -$263 | +$195 | mean -$183 (range -$1,117 to +$239) |
| Cherry-pick OOS | n/a (no OOS) | +$289 | mean +$251 (range +$147 to +$482) |
| Verdict | dead | seemingly alive | dead (no persistent edge) |

The decisive number

64 D/R groups present in all 4 windows. Distribution of (# windows where OOS daily $ > 0):

| Windows with positive OOS | Groups |
| --- | --- |
| 0/4 | 44 (69%) |
| 1/4 | 15 |
| 2/4 | 5 |
| 3/4 | 0 |
| 4/4 | 0 |

Even the 2 groups consistently in top-18 by in-sample fill (Wisconsin, Kansas D/R) had positive OOS in 2/4 and 1/4 windows respectively. There is no actionable subset.
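
The stability count above can be sketched as follows. This is a minimal illustration, assuming each group maps to a list of per-window OOS daily dollar values; the function names and the example numbers are made up and are not taken from `aggregate_v4_multi_window.py`.

```python
from collections import Counter


def positive_window_counts(oos_by_group: dict[str, list[float]]) -> dict[str, int]:
    """For each group, count the windows whose OOS daily $ is strictly positive."""
    return {g: sum(1 for pnl in w if pnl > 0) for g, w in oos_by_group.items()}


def stability_histogram(oos_by_group: dict[str, list[float]]) -> Counter:
    """Map (# positive windows) -> number of groups with that count."""
    return Counter(positive_window_counts(oos_by_group).values())


def persistent_winners(oos_by_group: dict[str, list[float]],
                       min_positive: int = 3) -> list[str]:
    """Groups with positive OOS in at least `min_positive` windows."""
    counts = positive_window_counts(oos_by_group)
    return sorted(g for g, n in counts.items() if n >= min_positive)


# Hypothetical data: three groups, four windows each.
example = {
    "group_a": [-1.2, 0.4, -0.3, 0.1],   # positive in 2 windows
    "group_b": [-0.5, -0.1, 0.0, -2.0],  # positive in 0 windows (0.0 is not > 0)
    "group_c": [0.2, -0.9, -0.4, -0.1],  # positive in 1 window
}
```

With the default threshold of 3, none of the example groups qualify as persistent winners, which is the same shape as the 0 of 64 result.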

Why each window's cherry-pick still looks positive

Within each window, top-18 by IS daily $ → that window's OOS sum is positive (+$147 to +$482). But the top-18 rotate — only 2 groups appear in top-18 across all 4 windows. The cherry-pick "wins" each window because it picks that window's winners; we cannot pre-identify them.
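
A minimal sketch of the rotation check, assuming each window provides a mapping from group name to in-sample daily dollars. The helper names and the toy data are hypothetical; the point is only that per-window top-k sets can each look good while their intersection stays small.

```python
def top_k(is_pnl: dict[str, float], k: int) -> set[str]:
    """The k groups with the highest in-sample daily $ in one window."""
    ranked = sorted(is_pnl, key=is_pnl.get, reverse=True)
    return set(ranked[:k])


def always_selected(is_pnl_by_window: list[dict[str, float]], k: int) -> set[str]:
    """Groups that land in the top-k in every window."""
    picks = [top_k(window, k) for window in is_pnl_by_window]
    return set.intersection(*picks) if picks else set()


# Toy example: three windows, top-2 selection.
toy_windows = [
    {"a": 5.0, "b": 4.0, "c": 1.0},
    {"a": 6.0, "c": 5.0, "b": 0.0},
    {"b": 9.0, "a": 8.0, "c": 1.0},
]
```

In the toy data only `"a"` survives all three selections, even though every window has two picks.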

What I'd like you to scrutinize

  1. Is the fee correction actually right? Polymarket docs + feeSchedule.takerOnly is unambiguous on the page, but I haven't placed a real test order. Are you aware of any case where makers do pay fees on Polymarket CLOB?
  2. Multi-window methodology: 4 non-overlapping windows of 14 days each, 10 IS / 4 OOS per window. Is there a flaw here? Specifically:
    • Should I do rolling-window CV instead of non-overlapping?
    • Should I extend to 90+ days?
    • Should I bootstrap-sample groups instead of picking top-18?
  3. The 42d-ago window's -$1,117 OOS is dominated by a few catastrophic losses. Did I make an error in those, or is it real long-tail risk?
  4. My research script vs production: poly_strategy/maker.py already sets "fee_rate_assumption": 0.0 for maker legs at line 1625. Production was always correct; only my standalone research sim had the bug. Are there other research scripts that might have the same issue?
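
For reference on item 2, the window layout can be sketched like this. It is a minimal illustration, assuming half-open date ranges with the in-sample span first and the OOS span last; the function name and dictionary keys are hypothetical and not taken from the v4 script.

```python
from datetime import date, timedelta


def build_windows(end: date, n_windows: int = 4, window_days: int = 14,
                  in_sample_days: int = 10) -> list[dict[str, date]]:
    """Non-overlapping windows stepping back from `end`, newest first.

    Each window is [start, end); in-sample is [start, is_end) and
    out-of-sample is [is_end, end).
    """
    windows = []
    for i in range(n_windows):
        w_end = end - timedelta(days=i * window_days)
        w_start = w_end - timedelta(days=window_days)
        is_end = w_start + timedelta(days=in_sample_days)
        windows.append({"start": w_start, "is_end": is_end, "end": w_end})
    return windows


windows = build_windows(date(2026, 5, 15))
for w in windows:
    print(w["start"], "IS until", w["is_end"], "OOS until", w["end"])
```

With `end=2026-05-15` the oldest window starts on 2026-03-20, so the four windows cover 56 days in total. A rolling variant would step back by fewer than `window_days` per iteration, at the cost of correlated windows.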

Files

  • scripts/simulate_maker_basket_v4.py: --maker-fee-mode {zero,taker_rate,custom}, --end-date YYYY-MM-DD, --in-sample-days N, --window-tag X
  • scripts/aggregate_v4_multi_window.py — reads N v4 JSONs, outputs stability + persistent-winner analysis
  • reports/maker-simulation-v4-2026-05-15-w-{today,14d-ago,28d-ago,42d-ago}.md — per-window reports
  • reports/maker-simulation-v4-multi-window-2026-05-15.md — the verdict

My ask

If you agree the methodology is right, please confirm the verdict. If you spot a flaw, please call it out — I've been wrong three times today and would rather be wrong a fourth time now than later.

If you're heads-down on the LLM research profile work and don't have time to review this, just say so and I'll proceed.

张靖恒 and others added 2 commits May 15, 2026 14:35
Post-mortem of the maker-arb thesis after WW archived PR WW-shan#9.

Two methodology fixes in v4:
  (A) maker fee was wrongly = taker fee in v3. Polymarket docs and
      live feeSchedule.takerOnly=True on 100/100 sampled markets
      confirm makers never pay fees. Corrected: maker_fee = 0.
  (B) v3 was 100% in-sample. Added 10/4 train/test split + multi-window
      orchestration to detect window-luck.

Findings progression:
  v3 (in-sample, taker fee):       -$263/yr naive, +$117 cherry
  v4 single window (today):        +$195/yr naive, +$289 cherry OOS
  v4 multi-window (4 x 14d = 56d): naive mean -$183 (sign flips!),
                                    cherry mean +$251 but UNSTABLE

The decisive result: across 4 non-overlapping 14-day windows covering
2026-03-20 to 2026-05-15:
  - 0 of 64 groups have positive OOS in >=3/4 windows
  - 44/64 groups (69%) had zero positive OOS across all 4 windows
  - Even the 2 groups consistently in top-18 by in-sample (Wisconsin,
    Kansas) had positive OOS in only 2/4 and 1/4 windows respectively
  - Naive deploy sign flips: -$1,117 in 3/20-4/03 window, +$239 in
    4/03-4/17 window

Cherry-pick "wins" within each window because we pick this window's
winners; but the winners rotate, so no actionable alpha.

Files:
  scripts/simulate_maker_basket_v4.py     - corrected fee + IS/OOS split
                                            + --end-date for time-shifting
  scripts/aggregate_v4_multi_window.py    - cross-window stability
  reports/maker-simulation-v4-*-w-*.md    - 4 per-window reports
  reports/maker-simulation-v4-multi-window-2026-05-15.md - the verdict

Note: poly_strategy/maker.py production code already has
fee_rate_assumption=0.0 for maker legs. The fee bug was localized
to my standalone research script, not production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per user's "test market making on high-volume markets" follow-up: extended v4 + multi-window to test
maker basket arb on high-volume cohorts, not just long-tail D-vs-R.

Three cohort tests now done, all with maker_fee=0 + 10/4 IS/OOS split
on 4 non-overlapping 14-day windows:

  Cohort                       n groups  Naive OOS mean   Persistent winners
  ---------------------------  --------  ---------------  ------------------
  Long-tail D-vs-R             71        -$183            0 of 64
  High-vol multi-member        50        STRUCTURAL FAIL  -
  High-vol binary (size=2)     6         -$22             0 of 3

The high-vol multi-member result is the most informative *new* finding:
96% of (group, markup) combinations got skipped via
"spread_too_narrow_for_maker". Per-leg spreads on high-vol multi-member
markets are 0.001-0.008 typically; long-shot legs are 1-tick wide
(spread=0.001). The "all legs filled as basket" strategy is
structurally impossible — you cannot place a valid maker quote on the
long-shot legs, and any narrow leg kills the whole basket.

Translation: high-vol markets are dominated by HFT market makers
leaving no room for slower-finger maker arb on the basket.

High-vol binary (size=2) escapes the structural problem but produces
all-negative or zero OOS across 4 windows. Only 6 such groups exist
today at vol>=$5k (mostly D/R + a few sports), and 3 of them rotate
between windows.

Files:
  scripts/build_negrisk_cohort.py             - generic cohort builder
                                                with fixed Gamma pagination
                                                (offset += 100 not 500)
  scripts/simulate_maker_basket_v4.py         - now accepts --cohort-file
                                                and --cohort-tier
  scripts/aggregate_v4_multi_window.py        - now accepts --report-tag
                                                so we can aggregate
                                                multiple cohorts cleanly
  reports/maker-simulation-v4-*-binhv-w-*.md  - 4 binary high-vol windows
  reports/maker-simulation-v4-*-highvol-w-today.md - structural-fail report
  reports/maker-simulation-v4-multi-window-*-binary-highvol.md - binary HV agg
  reports/maker-simulation-v4-multi-window-*-longtail-dvr.md   - renamed
                                                                  from the
                                                                  untagged
                                                                  version

Verdict: maker basket arb thesis is dead in all 3 testable cohorts.
The fee correction was a real bug (and could matter in other contexts),
but did not save the thesis under rigorous multi-window testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Soli22de
Collaborator Author

Follow-up commit: tested two additional cohorts to address "is this thesis dead only on long-tail, or universally?"

Results across all 3 cohorts now done with the same v4 + 4-window methodology:

| Cohort | n groups | Naive OOS mean | Persistent winners |
| --- | --- | --- | --- |
| Long-tail D-vs-R (size=2, vol<$1k) | 71 | -$183 | 0 of 64 |
| High-vol multi-member (vol≥$10k, size≥3) | 50 | STRUCTURAL FAIL | n/a |
| High-vol binary (vol≥$5k, size=2) | 6 | -$22 | 0 of 3 |

The high-vol multi-member result is the most informative new finding: 96% of (group, markup) combinations got skipped via spread_too_narrow_for_maker. Per-leg spreads on high-vol multi-member markets are 0.001-0.008 typically; long-shot legs are 1-tick wide. Any narrow leg kills the whole basket, and you can't place a valid maker quote on long-shot legs (price 0.001-0.005 with no maker zone).

Translation: high-vol markets are dominated by HFT market makers leaving no room for basket-arb maker quotes.
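
The "any narrow leg kills the whole basket" rule can be sketched as below. This is an illustration under stated assumptions: the function name and the `min_spread` threshold are hypothetical, and the simulator's real rule for what counts as too narrow is not shown in this thread.

```python
from typing import Optional


def basket_skip_reason(leg_spreads: list[float], min_spread: float) -> Optional[str]:
    """Return a skip reason if any leg is too narrow to quote as a maker.

    Assumption: a leg is quotable only when its spread is at least
    `min_spread`. The basket needs every leg filled, so a single
    unquotable leg disqualifies the whole group.
    """
    if any(spread < min_spread for spread in leg_spreads):
        return "spread_too_narrow_for_maker"
    return None


# Hypothetical multi-member group: two workable legs plus a 1-tick long-shot leg.
print(basket_skip_reason([0.008, 0.004, 0.001], min_spread=0.002))
# The same group without the long-shot leg would pass this check.
print(basket_skip_reason([0.008, 0.004], min_spread=0.002))
```

Because the check is an `any`, the skip rate grows with the number of legs, which is consistent with multi-member groups failing far more often than binary ones.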

This pushes the verdict from "dead in long-tail D/R" to "dead across all 3 testable cohorts". I'm now declaring this case definitively closed unless you spot something.

New files in this push:

  • scripts/build_negrisk_cohort.py - generic cohort builder (fixes a Gamma pagination bug — offset should be += 100 not += 500 since the API caps each page at 100 regardless of limit)
  • scripts/simulate_maker_basket_v4.py - now takes --cohort-file and --cohort-tier
  • scripts/aggregate_v4_multi_window.py - now takes --report-tag
  • 5 new reports (3 windows of binary HV, 1 highvol-mm structural fail, 1 binary-HV multi-window agg)
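
The pagination fix mentioned for build_negrisk_cohort.py can be sketched as follows. This is a minimal illustration: `fetch_page` is a hypothetical callable standing in for the real HTTP request, and `fake_gamma_page` simulates a server that caps pages at 100 items regardless of the requested limit. Advancing the offset by the number of items actually returned avoids depending on the cap at all.

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int, int], list], limit: int = 500) -> list:
    """Collect every item from an offset-paginated endpoint."""
    items: list = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        items.extend(page)
        # Advance by what the server actually returned, not by `limit`.
        # With `offset += limit` and a server-side cap of 100, items
        # 100..499 of every 500-item stride would be silently skipped.
        offset += len(page)
    return items


def fake_gamma_page(offset: int, limit: int) -> list:
    """Stand-in server holding 250 items and capping each page at 100."""
    data = list(range(250))
    return data[offset:offset + min(limit, 100)]


all_items = fetch_all(fake_gamma_page)
```

Against the stand-in server this makes three requests that return data (100, 100, 50 items) and one empty request that ends the loop.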

张靖恒 and others added 2 commits May 15, 2026 15:46
Spent ~1 hour probing Kalshi API + cross-platform matching feasibility.
Found enough open questions that a real thesis test needs more time:

1. Kalshi public API accessible from China (no auth needed for reads).
2. /markets?status=open is dominated by 3000+ multivariate parlay
   tickers (KXMV*) with zero liquidity. Need series_ticker queries
   instead.
3. 17 known-active series produce 520 single-event markets; 300 with
   non-empty orderbook; 27 with quotes on both YES and NO sides.
4. CRITICAL: orderbook_fp.yes_dollars and no_dollars are BIDS, not
   asks. Confirmed via KXFED-27APR-T4.25 where yes_dollars max=$0.26
   and no_dollars max=$0.57 (their sum < $1 = arb if they were asks).
   Synthetic asks: yes_ask = 1 - best_no_bid; no_ask = 1 - best_yes_bid.
5. Listing endpoint yes_ask field is always None; quotes only in the
   per-ticker orderbook endpoint (so N calls per scan).
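
The bid-to-ask conversion in item 4 can be sketched as follows. This is a minimal illustration, assuming the orderbook has already been reduced to plain lists of bid prices in dollars; the function name is hypothetical and the real payload shape may differ.

```python
def synthetic_asks(yes_bids: list[float], no_bids: list[float]) -> tuple[float, float]:
    """Derive (yes_ask, no_ask) from the two bid ladders of a binary market.

    Buying YES at price p is the mirror of someone bidding 1 - p for NO,
    so the best ask on one side is 1 minus the best bid on the other.
    """
    if not yes_bids or not no_bids:
        raise ValueError("need at least one bid on each side")
    best_yes_bid = max(yes_bids)
    best_no_bid = max(no_bids)
    return 1 - best_no_bid, 1 - best_yes_bid


# Best bids from the KXFED-27APR-T4.25 example above.
yes_ask, no_ask = synthetic_asks([0.26], [0.57])
print(round(yes_ask, 2), round(no_ask, 2), round(yes_ask + no_ask, 2))
```

Read as bids, the example gives asks of about $0.43 and $0.74, which sum to more than $1, so the apparent arbitrage from reading the raw fields as asks disappears.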

The real bottleneck for thesis testing isn't data access, it's
event matching: Kalshi has hourly BTC ladders + rate-level binaries,
Polymarket has date-bounded thresholds + cut/hike decision binaries.
The "same event, different binarization" problem dominates.

Three paths discussed in the report — Path A (use WW's existing
cross_platform.py pipeline with jaccard + LLM), Path B (hand-curate
~10 pairs), or Path C (park the thesis). Awaiting user direction.

Files:
  scripts/probe_cross_platform.py     - direct probe with manual pair
                                        list (placeholder empty for now)
  reports/cross-platform-feasibility-2026-05-15.md - audit findings +
                                                    interpretation fix

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@WW-shan WW-shan merged commit 6a2ebff into WW-shan:main May 15, 2026
1 check passed
@Soli22de
Collaborator Author

@WW-shan Thanks for the review. I've rebased your fix in.

Issues you caught that I agree with

  1. v4 historical-window header bug: a run with --end-date 2026-04-17 printed -> 2026-05-15 in the header (I used now instead of window_end_dt). The window_date_label() and default_window_end() helpers you added also align to UTC midnight, which is cleaner than what I had.
  2. The ask calculation in probe_cross_platform.py was wrong: I read Kalshi's yes_dollars/no_dollars as asks, but they are bids. The synthetic ask is 1 - max(opposite_side_bids). You fixed it directly.
  3. build_negrisk_cohort.py's --require-fees-enabled: store_true + default=True is effectively a hardcoded True with no way to turn it off. Switching to BooleanOptionalAction is right.
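
For item 3, a minimal sketch of the flag using the standard library's `argparse.BooleanOptionalAction` (Python 3.9+). The parser construction and help text are illustrative and not copied from build_negrisk_cohort.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="cohort builder (sketch)")
    # BooleanOptionalAction registers both --require-fees-enabled and
    # --no-require-fees-enabled, so a default of True can be switched off.
    # With action="store_true" and default=True the value could never be False.
    parser.add_argument(
        "--require-fees-enabled",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="only keep markets with fees enabled (default: on)",
    )
    return parser


default_args = build_parser().parse_args([])
disabled_args = build_parser().parse_args(["--no-require-fees-enabled"])
print(default_args.require_fees_enabled, disabled_args.require_fees_enabled)
```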

Your builder-fee point, which I intend to take seriously

"Builder fees are not modeled. The maker-fee-zero assumption is for direct Polymarket platform fees; orders routed through a builder with builder_maker_fee_bps could pay a separate builder fee."

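I hadn't considered this before. If we are in fact routed through a builder and builder_maker_fee_bps is nonzero, then v4's +$195/yr naive and +$307/yr cherry-pick OOS figures have to be recomputed. The thesis might flip back to negative.

My next steps:

  • Check whether builder_maker_fee_bps is 0 or nonzero on real D/R markets
  • If nonzero, how large it is
  • Re-run v4 and see how the verdict changes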

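Separately, I also tested Path A (cross-platform Polymarket-Kalshi)

This is the last two commits in the PR. Conclusions:

  • Ran your full 6-step pipeline
  • Of 1000 candidate pairs, 0 passed option_match=True + edge>=0
  • All 10 genuinely same-event NHL Stanley Cup pairs have negative edge (-2.77% to -97%)
  • Every "positive edge" was a question-structure mismatch (Polymarket full-year inflation vs Kalshi single-month CPI; Polymarket Austria vs Kalshi -AUS, which is the IOC code for Australia)
  • Your deterministic _option_match caught every mismatch without needing an LLM

The cross-platform thesis is dead too.

Full writeup is in reports/cross-platform-pipeline-verdict-2026-05-15.md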

What I'm doing next

  1. Find the true builder fee value (half an hour)
  2. If the builder fee is nonzero, re-run v4 on all 4 windows
  3. Then it really is case closed

@Soli22de
Collaborator Author

Builder fee investigation done. Conclusion: your caveat is defensive and does not flip the current verdict.

The 3 steps:

  1. The builder_maker_fee_bps field does not exist in the Gamma payload at all. I sampled 8 D/R markets (562793/4 etc.), each with 88 fields, and 0 of them contain "builder". The only nonzero maker-related field is makerBaseFee: 1000.

  2. makerBaseFee does not appear in the official docs. docs.polymarket.com/trading/fees states explicitly: "Makers are never charged fees. Only takers pay fees." The on-chain lookup is getClobMarketInfo(conditionID), which returns fee data = {r: feeRate, e: exponent, to: takerOnly}, i.e. the same three feeSchedule fields. makerBaseFee: 1000 is most likely a legacy field outside the active fee model.

  3. Production poly_strategy/maker.py:1625 sets "fee_rate_assumption": 0.0. Production code, docs, and on-chain data all agree.
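
The field scan in step 1 can be sketched as below. This is a minimal illustration: the helper name is hypothetical, it only inspects top-level keys, and `sample_market` is a made-up three-field stand-in rather than a real Gamma payload.

```python
def fields_matching(payload: dict, needle: str) -> list[str]:
    """Top-level field names containing `needle`, case-insensitively."""
    needle = needle.lower()
    return sorted(key for key in payload if needle in key.lower())


# Made-up stand-in for one market record (real records have many more fields).
sample_market = {
    "id": "example",
    "makerBaseFee": 1000,
    "feeSchedule": {"takerOnly": True},
}

print(fields_matching(sample_market, "builder"))
print(fields_matching(sample_market, "maker"))
```

A nested payload would need a recursive walk; the flat check is enough when the question is only whether any top-level field mentions a builder.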

So every v4 verdict stands:

  • Long-tail D/R: -$183/yr naive, 0/64 persistent winners
  • High-vol multi-member: 96% structural skip
  • High-vol binary: -$22/yr naive, 0/3 persistent winners
  • Cross-platform Polymarket-Kalshi: 0 candidates pass option_match + edge≥0

Everything is written up in reports/builder-fee-investigation-2026-05-15.md

Caveats that remain but are small in magnitude:

  • Polygon gas per order (~$0.001-0.01)
  • One-time USDC bridge cost
  • If Polymarket adds a builder fee in the future, your caveat is worth keeping

The case is closed on my side. Maker basket arb is dead on all 4 cohorts we can test. Should PR #10 also be merged in archive mode (same as PR #9), or closed?

@Soli22de
Collaborator Author

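I see you've already merged this PR. Case closed.

After 3 days and 4 cohorts, the maker basket arb thesis is dead in every direction we can test. The fee bug was a real bug (worth fixing), but fixing the fee alone does not save the thesis; the multi-window, multi-cohort, and cross-platform tests all knocked it back down.

Many thanks for your review. The bugs you caught (window header, Kalshi synthetic ask, builder fee caveat, UTC midnight) were all real issues.

The tools stay in the repo (v4, multi-window aggregator, cohort builder, cross-platform pipeline run, targeted Kalshi puller) and can be reused if new data or a new thesis comes along.

The maker arb direction ends here; research closed.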


2 participants