Arb-persistence study + James Bond depth verdict (94/yr ceiling) by Soli22de · Pull Request #9 · WW-shan/poly_strategy

Soli22de · 2026-05-12T18:27:26Z

TL;DR

14-day backfill + real CLOB depth check on the long-tail explicit_other arb candidate (James Bond). Verdict: technically real, commercially dead — max ~$3.78/event × ~4 events/14d ≈ $394/yr theoretical ceiling, less than gas after monitoring/withdrawal costs.

Binary D-vs-R sub-classifier built; backfill shows 98 "events" but most are forward-fill artifacts on illiquid markets. Live snapshot loop is now running on Soli22de's machine to accumulate honest bestAsk data — final binary verdict in 1-2 weeks.

Two read-first files

reports/research-summary-2026-05-13.md — the writeup, continues from yesterday's -05-12.md
reports/james-bond-book-validation-2026-05-12.md — the depth check that killed the thesis

What's in the diff

Infrastructure (~700 lines):

scripts/snapshot_gamma.py — every-15-min Gamma collector, writes per-snapshot markets.ndjson + classified groups.ndjson with edge_after_fee precomputed
run_snapshot_loop.ps1 — persistent loop runner (Windows PowerShell)
scripts/backfill_prices_history.py — reconstructs 14 days of synthetic snapshots via CLOB /prices-history mid prices (with caveats baked into the docstring)
scripts/analyze_arb_events.py — detects contiguous edge events, computes persistence_minutes, applies pre-locked GO/KILL thresholds
scripts/verify_james_bond_book.py — real CLOB /book depth check + fill simulation at sizes [10, 30, 50, 80, 100, 150]
scripts/analyze_binary_refined.py — sub-classifies 2-member groups into dvr/yes_no/pseudo (catches the Aston Villa vs Freiburg false-binary trap)

Reports (~1200 lines, 4 files):

reports/arb-persistence-2026-05-12.md
reports/james-bond-book-validation-2026-05-12.md
reports/binary-refined-2026-05-12.md
reports/research-summary-2026-05-13.md

Plus yesterday's leftover commit (separate, 0df9dee): experiment 7 v2 refined + 4-model GLM extraction results.

Key findings

Question	Answer
Does explicit_other long-tail arb exist?	Yes, only on James Bond. 30–60hr persistence per event.
What's the realistic profit ceiling?	~$394/yr at theoretical peak, $0–200/yr after gas. Dead at scale.
Does binary D-vs-R arb exist?	Unclear. Backfill says 98 events but forward-fill on illiquid markets produces fake persistence. Need live bestAsk over 7-14 days to verify.
Is `/prices-history` backfill useful in general?	For liquid markets yes, for long-tail no. Worth knowing.

Open questions for review

Pivot direction: research-summary §5.3 lists 4 candidate next theses (high-liquidity short-half-life / cross-platform / market-making / quit). Which do you want to talk through?
Live loop length: I set it to 15-min cadence indefinitely. Stop after 14 days, or keep going?
Kill list items (research-summary §5.4) — do you agree we should stop polishing T2/T3 LLM pipelines until we have a thesis whose bottleneck is description-reading rather than orderbook depth?

Test plan

This is mostly a research PR, not a code-shipping one. To verify the infra:

python -u scripts/snapshot_gamma.py --pages 6 produces data/snapshots/<date>/<HH-MM>/{markets,groups,meta}.ndjson
python -u scripts/analyze_arb_events.py runs cleanly on at least one snapshot
python -u scripts/verify_james_bond_book.py queries CLOB and produces the depth report (depends on James Bond markets still being open)
Read the two research-summary files and the depth-validation report

🤖 Generated with Claude Code

Experiment 1+3 (Gamma distribution + structural ground truth): - Pulled n=2000 active markets from gamma-api.polymarket.com (4 pages × 500, ~3.5s total) - vol24hr P10/P50/P90 = $0 / $40 / $18,333 - Liquidity P10/P50/P90 = $787 / $10,138 / $221,690 - Spread P10/P50/P90 = 0.001 / 0.01 / 0.10 (present in raw Gamma — PR WW-shan#4 spec was wrong to exclude this) - 14-90d-to-resolution band = 693 markets (35%) — target range OK - Derived 10,122 mutex pairs from 171 neg-risk groups → T4 $0 corpus validated Implications: - Q1 thresholds in PR WW-shan#3 are way too high; data-driven values in report - Q4 T4 corpus problem disappears (10k+ pairs from structure alone) - PR WW-shan#4 spec needs amendment for spread availability + dead-tier rephrasing (liquidity, not volume, as P10 boundary) Raw NDJSON under data/experiments/ is gitignored; only script + report committed. Experiment 2 (OpenRouter calibration script, ~$0.001 total): - One-shot validation that Gemini Flash V2 strict prompt actually produces schema-conforming JSON with verbatim grounding - Requires OPENROUTER_API_KEY at runtime; not run yet - Will calibrate the $0.00009/call estimate in PR WW-shan#6 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

n=5 random Polymarket markets, Gemini 2.0 Flash via OpenRouter, V2 strict prompt with verbatim grounding. Results: - schema_ok: 5/5 (100%) - grounding_ok: 5/5 (100%) — verbatim_text substring check passes on every clause across all 5 markets - Avg 2-5 clauses extracted per market, types diverse (deadline / tiebreaker / source) - Avg ambiguity_score 0.2-0.3 - Avg latency 3.2s/call Cost calibration: - Actual: $0.000214/call (5×$0.001070) - PR WW-shan#6 estimate: $0.000090/call - Off by 2.4× — output tokens are ~397 avg (estimate was 150), because verbatim_text transcription inflates output - 2000-market T2 projected: $0.43 (vs $0.18 in PR WW-shan#6) Conclusion: V2 strict prompt + Gemini Flash combination works out-of-the-box on real Polymarket descriptions. No prompt tweaks needed before T2 scale-up. PR WW-shan#6 budget needs an update but total cost still trivial. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three improvements over experiment 2: - 4 models head-to-head via OpenRouter (Gemini Flash, DeepSeek V3, GPT-4o-mini, Llama 3.3 70B) instead of just Gemini Flash. Tests the assumption that Flash — originally chosen in dash-ocr for image OCR — is also best for pure text extraction. - n=30 stratified by description length (10 short + 10 medium + 10 long) instead of n=5 random - Full per-call NDJSON dump (raw response, parsed JSON, tokens, cost, schema/grounding checks) so we can read actual clause text afterward — experiment 2 lost this data Hard cost cap: $0.50 (rough est is $0.04 for 120 calls). Auto-metrics report shows per-model schema/grounding/clause-count. The actionable/structural/trivial qualitative judgment will be done in a separate pass that reads the NDJSON. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User-paid elysiver.h-e.top endpoint hosts GLM models including glm-5 and glm-4.6 (and glm-5.1 but quota-exhausted today). Cost = $0 to us (user-paid, quota-limited), latency 15-25s/call. Changes: - PROVIDERS dict abstracts OpenAI-compatible endpoints (openrouter, elysiver); each model entry declares its provider. - load_env_file() reads .env for OPENAI_BACKUP_API_KEY (gitignored). - call_model() now routes per-model and looks up keys per-provider. - Browser-like User-Agent header — Cloudflare on elysiver returns 1010 to urllib's default UA; verified Mozilla/5.0 UA passes the check. - New --out-name arg so partial reruns (e.g. only the new GLM models) don't overwrite earlier 4-model results. - Added glm-5 and glm-4.6 model entries (cost=0 placeholder). Smoke-tested: both GLM models complete full V2 prompt against real Polymarket descriptions, schema_ok and grounding_ok, extract substantive clauses (exclusion / numeric_threshold types). windhub.cc primary endpoint remains unusable: harder Cloudflare JS challenge that browser UA alone doesn't pass. Also commits the previously-run 4-model report file (experiment-multi-model-extraction-2026-05-12.md) which captured the OpenRouter Gemini Flash / DeepSeek V3 / GPT-4o-mini / Llama 3.3 70B baseline used for judging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Experiment 7 (refined v2): classifies neg-risk groups into explicit_other / binary / open_set tiers using catch-all keyword markers. Replaces broken size>=8 heuristic that flagged Nobel-style open candidate sets as exhaustive. 06:13 UTC snapshot found 1 strict candidate (James Bond) + 7 binary candidates. GLM-4.6 / GLM-5 results: 29/29 vs 16/30 success on elysiver-routed free endpoints. GLM-4.6 emerges as $0 alternative to DeepSeek V3 for T2 resolution-clause extraction. Research summary (Day 1) captures the day's verdicts for classmate WW review: thesis live, James Bond needs CLOB depth verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…cale Infrastructure: - snapshot_gamma.py: 15-min Gamma snapshot collector, writes per-snapshot markets.ndjson + classified groups.ndjson with edge_after_fee precomputed - run_snapshot_loop.ps1: persistent loop runner (PowerShell) - backfill_prices_history.py: reconstructs 14 days of synthetic snapshots via CLOB /prices-history (mid-price approximation, with caveats) - analyze_arb_events.py: detects contiguous edge events, computes persistence_minutes, applies pre-locked GO/KILL thresholds - verify_james_bond_book.py: real CLOB /book depth check + fill simulation - analyze_binary_refined.py: sub-classifies 2-member groups into dvr/yes_no/pseudo, runs per-subtier event detection Findings (14-day window, see research-summary-2026-05-13.md for the full writeup): - explicit_other tier: 4 events, all on James Bond group, 30-60hr persistence, mid-edge +9% to +18%. CLOB depth check shows max per-event profit ~$3.78 (at 80-unit basket), breakeven at ~120 units. Annualized ceiling ~$394 before gas/withdrawal. Commercially dead. - binary dvr tier (D vs R races): 98 events at +2% threshold, but most are forward-fill artifacts on illiquid markets (9-34 distinct prices across 1140 snapshots = ~one trade per 30hr). Live bestAsk required to verify. Conclusion: live snapshot loop continues running to accumulate honest bestAsk time series; explicit_other thesis killed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Live-only retest of the binary tier (--live-only flag on both analyzers) confirmed the 22-hour persistence seen in backfill was forward-fill artifact. Real bestAsk-based events flash for 1 snapshot (15 min) and disappear. A few D/R races (WV/TN/SC Senate/Gov) showed persistent low-edge floors of 2-5% across 14 hours of live data. Generalized verify_book script (scripts/verify_group_book.py, takes --group-id-prefix arg) and ran depth check on the most stable candidate (SC Governor D/R, +2.55% sustained edge, $4,377 min_liq): Marginal edge: +2.55% 50u basket: +$0.52 profit 200u basket: +$0.31 (breakeven approaching) 500u basket: -$21 (negative) Republican side bestAsk=0.91 has depth of only 3.9 units ($3.5 of fillable). Edge collapses on first non-trivial fill. Both thesis branches now have definitive verdicts: - explicit_other (James Bond): $3.78/event max - binary D-vs-R (SC Gov sample): $0.52/event max Both killed by the same structural fact: long-tail Polymarket books hold $5-80 of depth at best ask. The "persistent edge" is real but is the unfilled cost of nobody bothering to take $5 of order flow. Action: live snapshot loop stopped (no point accumulating data when both branches are dead). Existing 1.3GB local data kept as baseline. Pivot direction TBD with WW via PR WW-shan#9. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Soli22de · 2026-05-13T03:34:56Z

Day 3 follow-up — both thesis branches now confirmed dead at scale

Added commit 3cf624c to the PR with Day 3 findings (script + 2 reports + amended research-summary-2026-05-13.md §3.6–3.8 and §5).

TL;DR: ran analyze_binary_refined.py --live-only against ~14 hours of pure live bestAsk data. Backfill's "22hr persistence" was indeed forward-fill artifact (median live persistence: 15 min = 1 snapshot). But a few D/R races showed persistent small-edge floors (WV/TN/SC Senate/Gov, +2-5% for many hours), so I generalized the depth-check script and ran it on the most stable candidate.

SC Governor D/R depth check (was the best-looking live candidate, +2.55% sustained for 14hrs, $4,377 min_liq):

Basket size	Edge $	Edge %
1u (marginal)	+$0.026	+2.55%
50u	+$0.52	+1.04%
200u	+$0.31	+0.15% (breakeven)
500u	-$21	-4.29%
1000u	-$70	-7.01%

Killer: Republican-side bestAsk = 0.91 with only 3.9 units of depth ($3.5 fillable). A single $4 trade closes the edge.

Both thesis branches:

Branch	Best per-event profit	Verdict
explicit_other (James Bond)	$3.78	Dead at scale
binary D-vs-R (SC Gov etc)	$0.52	Dead at scale

Same structural cause: Polymarket long-tail books carry $5–80 of depth at best ask. The "persistent edges" are the price of nobody bothering with $5 of order flow.

Actions taken:

✅ Stopped live snapshot loop (no point accumulating when both branches dead). 1.3GB local data kept as baseline.
✅ Generalized depth checker → scripts/verify_group_book.py --group-id <prefix>
✅ Pushed Day 3 commit to this PR

Three questions for you (full version in research-summary §8):

Agree on stopping the loop? (already stopped, can restart cheaply)
Of the 4 candidate next-theses in §5.2, which to scope first? My weak preference is B (market-making) — same data, different lens, asks "if persistent edge is the cost-of-being-the-only-bidder, can we BE the bidder?"
Worth writing a formal docs/thesis-postmortem.md so the next person who looks at long-tail Polymarket arb doesn't repeat the path?

After user pushed back on the "thesis dead" verdict ("我不信邪"), built two maker-strategy simulators: v1 (mid-touch, simulate_maker_basket.py): - 3.15M tick points across 157 tokens over 14 days - For each (group, day, markup): did mid touch (bestAsk - markup)? - Result: $15,546/yr theoretical across 72 groups @ $100 basket - Known weakness: mid touching != trade at that price v2 (trade tape, simulate_maker_basket_v2.py): - Real Polymarket /trades tape, 48,030 raw trades over 14 days - Filtered to SELL Yes trades (the type that would hit a maker bid) - Only 1,602 / 48,030 (3.3%) qualified - Result: $918/yr theoretical = 17x reduction from v1 After realistic adjustments (queue priority, D/R correlation, partial-fill hedging cost, gas): $200-500/yr @ $100 basket, scaling to ~$2-5k/yr @ $1000 basket with $144k capital tied up. Updated research-summary §3.9: I overclaimed "thesis dead" from 2 taker-only depth snapshots. Real verdict: TAKER dead, MAKER alive at hobby scale. User's skepticism was correct; my single-perspective testing was insufficient methodology. Verdict matrix: - Taker basket arb: $0-200/yr killed - Maker basket arb (v1 mid-sim): $15k optimistic phantom - Maker basket arb (v2 trade tape): $200-500/yr honest Live loop remains stopped. Existing data + scripts kept for paper trading next phase if pursued. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Soli22de · 2026-05-13T03:56:29Z

Correction: I was wrong yesterday — MAKER thesis lives at $200-500/yr

Soli22de pushed back on the "thesis dead" verdict, which led to actually building the maker-strategy backtest I should have built before declaring the whole thing dead. Two simulators added in commit 4d1e2a6:

v1 (mid-touch, `scripts/simulate_maker_basket.py`)

3.15M mid-price tick points across 157 tokens over 14 days
Fill proxy: did mid touch (bestAsk - markup) on day d?
Result: $15,546/yr theoretical across 72 groups @ $100 basket
Caveat written in: mid touching != trade at that price

v2 (trade tape, `scripts/simulate_maker_basket_v2.py`)

Real data-api.polymarket.com/trades tape, 48,030 raw trades
Filtered to SELL-Yes trades in window: 1,602 (3.3%)
Per (group, day, markup): did any SELL-Yes trade hit at price ≤ target?
Result: $918/yr theoretical — 17x reduction from v1

Metric	v1 (mid-touch)	v2 (trade tape)
Total daily $	$42.59	$2.51
Annualized	$15,546	$918
Groups w/ positive expected	49/72	17/72
Best per-group	Kansas Gov $4.87/day	Iowa Senate $1.00/day

Realistic adjustment

After queue priority (×0.6), D/R correlation (×0.7), partial-fill hedging (-10%), Polygon gas (-20%): $200-500/yr at $100 basket, ~$2-5k/yr at $1000 basket.

Corrected verdict matrix

Strategy	Real $/yr	Status
Taker basket arb	$0-200	Dead at scale (verified)
Maker mid-sim (v1)	$15k phantom	Methodology was wrong
Maker trade-tape (v2)	$200-500 @ $100	Honest, hobby scale

What I learned

I overclaimed "thesis dead" from 2 taker-only depth snapshots. Robust testing needs at least: multiple strategy perspectives (taker/maker/hold), multiple snapshots in time, realistic fill models (trade tape > mid-touch), and multiple capital sizes.

Soli22de's "我不信邪" pushback is the only reason this PR has a defensible final verdict instead of a wrong one.

Three questions, revised

Want to take the next step — paper-trade 2-3 top v2 candidates (Iowa Senate, Georgia Senate, Illinois Senate D/R) for 1-2 weeks to validate the ~$1/day-per-group claim with real fills?
Accept hobby scale ($2-5k/yr at $1k basket size) or pivot to cross-platform / HFT-lane theses?
Should I extend the trade-tape simulator to cover non-D/R structures (initiative referenda, sports specials) — could uncover other tradeable groups we missed?

…cherry-pick WW review of PR WW-shan#9 caught 4 bugs in the v1/v2 maker simulations and verify_group_book.py: 1. Income not capped by realized trade size — formula `fill_rate * avg_edge * intended_basket` assumed every fill captured the full $100. With avg trade sizes of 3-9 units, that overstates by 5-20x. 2. Maker target could cross bestAsk — `max(t, bestBid+0.001)` for narrow spreads could produce target = bestAsk (crossing/taker order, not maker). 3. verify_group_book.py partial-fill cost wrong — `cost = avg_px * size` should be `avg_px * filled`. Inflated negative edge numbers when book ran out. 4. avg_min_leg_sell_size was logged as a caveat but never folded into the main income formula. Fixes: - scripts/simulate_maker_basket_v2.py: per-day actual fill = min(intended, min-over-legs of qualifying-trade-size at price <= target); income = sum(edge_per_unit * actual_units) / window_days; target strictly clamped below bestAsk; skip markup levels with no valid maker zone. - scripts/verify_group_book.py: compute basket cost/fee/edge at actual fillable units (not intended size); flag CAPPED rows where book runs out. Reran both: - v3 (size-capped): naive across 72 groups = -$263/yr; cherry-pick 18 positive-edge groups = +$117/yr @ $100 basket - SC Gov taker @ 200u = +$2.02 (max), capped at 1304u for larger sizes Research summary §3.10 + §3.11 supersede §3.9. TL;DR updated. Honest final verdict: long-tail D-vs-R spread is market friction, not alpha. Retail investor cannot net positive after fee + queue + gas. The thesis isn't dead — it was never alpha to begin with, only looked like alpha because of methodological errors at multiple layers. Thanks to WW for the rigorous review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Soli22de · 2026-05-13T06:35:44Z

@WW-shan 你提的 4 条全对。已修 + push commit d222906，先告诉你结果免得你浪费时间重做 v2：

修了什么

你提的 bug	修哪	怎么修
收益没按真实成交量封顶	`simulate_maker_basket_v2.py:241-280`	每日 fill = `min(intended_basket, min-over-legs of 在 price≤target 的 SELL Yes size 合计)`；总日 $ = `sum(edge_per_unit * actual_units) / window_days`
Maker target 可能越过 bestAsk	`simulate_maker_basket_v2.py:236-260`	加 `t = min(t, bestAsk - 0.001)` 强制 maker zone；spread < 0.002 的 markup 直接 skip
Taker partial-fill `cost = avg_px * size`	`verify_group_book.py:167-194`	重写：先找 `max_fillable`，再用 `actual_units = min(intended, max_fillable)` 重算 cost/fee/edge；CAPPED 行加 ⚠️
`avg_min_leg_sell_size` 没进主公式	同 #1	已合并

修完跑出来的新数字（v3）

Version	方法	年化
v1 mid-touch	错（mid 不等于 fill）	+$15,546
v2 size-uncapped	错（你抓的那条）	+$918
v3 size-capped + no-crossing	修完	−$263 naive / +$117 cherry-pick

72 组 naive 部署 = 平均亏 $263/yr。只 trade 那 18 个正期望组 = +$117/yr @ $100 basket。

更细看 markup 表：

Markup	Avg fill rate	Avg edge/unit	总日 $
$0.005	5.9%	-1.27%	-$2.09
$0.010	5.8%	-0.38%	-$1.04
$0.020	5.7%	+0.24%	-$0.88
$0.030	5.6%	+0.66%	-$0.78
$0.050	5.4%	+0.90%	-$0.75

每个 markup 总日 $ 都负 —— 即使最好的 markup ($0.05)，total = -$0.75/day。原因：avg edge/unit 在小 markup 时被 fee 吃掉，大 markup 时 fill rate 太低。

你判断中那两条对得很正

你说 "taker 一次性吃 bestAsk 大概率仍然死" —— 确认，SC Gov fixed-version max profit 是 $2.02 一次性 @ 200u，再大就 capped。
你说 "maker $200-500 不能信，需要按真实成交量封顶后重算" —— 直接命中。修完后 naive 是负的，cherry-pick 也只剩 $117。

关键诚实结论修正

之前 PR 标题/描述说"maker thesis lives at $200-500/yr"，那是错的。修完后真实判决：

长尾 D-vs-R 的 spread 不是 alpha，是市场摩擦。零售投资者做不到 net positive after fee + queue + gas。

完整演变写在 reports/research-summary-2026-05-13.md §3.10 + §3.11（覆盖 §3.9 那个中间版本）。

还想问你

这次 v3 在你模型里跑过没？如果你的中转测试有不同结果欢迎贴。
我对 cherry-pick 那 18 个正期望组的态度：不应该当成可执行 alpha，因为是 hindsight bias —— 我们没有 oracle 提前知道哪 18 个。你怎么看？
这个 PR 的结论现在是 "thesis 是 hobby 都不到"，你建议怎么处理：(a) merge with this verdict (b) close as "exhausted" (c) keep open for future ref

Post-mortem of the maker-arb thesis after WW archived PR #9. Two methodology fixes in v4: (A) maker fee was wrongly = taker fee in v3. Polymarket docs and live feeSchedule.takerOnly=True on 100/100 sampled markets confirm makers never pay fees. Corrected: maker_fee = 0. (B) v3 was 100% in-sample. Added 10/4 train/test split + multi-window orchestration to detect window-luck. Findings progression: v3 (in-sample, taker fee): -$263/yr naive, +$117 cherry v4 single window (today): +$195/yr naive, +$289 cherry OOS v4 multi-window (4 x 14d = 56d): naive mean -$183 (sign flips!), cherry mean +$251 but UNSTABLE The decisive result: across 4 non-overlapping 14-day windows covering 2026-03-20 to 2026-05-15: - 0 of 64 groups have positive OOS in >=3/4 windows - 44/64 groups (69%) had zero positive OOS across all 4 windows - Even the 2 groups consistently in top-18 by in-sample (Wisconsin, Kansas) had positive OOS in only 2/4 and 1/4 windows respectively - Naive deploy sign flips: -$1,117 in 3/20-4/03 window, +$239 in 4/03-4/17 window Cherry-pick "wins" within each window because we pick this window's winners; but the winners rotate, so no actionable alpha. Files: scripts/simulate_maker_basket_v4.py - corrected fee + IS/OOS split + --end-date for time-shifting scripts/aggregate_v4_multi_window.py - cross-window stability reports/maker-simulation-v4-*-w-*.md - 4 per-window reports reports/maker-simulation-v4-multi-window-2026-05-15.md - the verdict Note: poly_strategy/maker.py production code already has fee_rate_assumption=0.0 for maker legs. The fee bug was localized to my standalone research script, not production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

张靖恒 and others added 8 commits May 12, 2026 14:18

Fix experiment env loading and GLM model list

eadaef5

WW-shan and others added 2 commits May 13, 2026 14:28

Merge remote-tracking branch 'origin/main' into pr-9

13e9be3

WW-shan added 2 commits May 13, 2026 14:38

Fix research fill sizing assumptions

4766232

Merge PR research fixes with sizing review

6f25834

WW-shan merged commit 416a24a into WW-shan:main May 13, 2026
1 check passed

Soli22de mentioned this pull request May 15, 2026

Multi-window post-mortem: maker fee corrected, thesis still dies on 56-day stability test #10

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arb-persistence study + James Bond depth verdict (94/yr ceiling)#9

Arb-persistence study + James Bond depth verdict (94/yr ceiling)#9
WW-shan merged 13 commits into
WW-shan:mainfrom
Soli22de:experiment/2026-05-12-gamma-baseline

Soli22de commented May 12, 2026

Uh oh!

Soli22de commented May 13, 2026

Uh oh!

Soli22de commented May 13, 2026

Uh oh!

Soli22de commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Soli22de commented May 12, 2026

TL;DR

Two read-first files

What's in the diff

Key findings

Open questions for review

Test plan

Uh oh!

Soli22de commented May 13, 2026

Day 3 follow-up — both thesis branches now confirmed dead at scale

Uh oh!

Soli22de commented May 13, 2026

Correction: I was wrong yesterday — MAKER thesis lives at $200-500/yr

v1 (mid-touch, scripts/simulate_maker_basket.py)

v2 (trade tape, scripts/simulate_maker_basket_v2.py)

Realistic adjustment

Corrected verdict matrix

What I learned

Three questions, revised

Uh oh!

Soli22de commented May 13, 2026

修了什么

修完跑出来的新数字（v3）

你判断中那两条对得很正

关键诚实结论修正

还想问你

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

v1 (mid-touch, `scripts/simulate_maker_basket.py`)

v2 (trade tape, `scripts/simulate_maker_basket_v2.py`)