Longtail + resolution-reading research plan (v0.1) + open questions#1

Merged: WW-shan merged 4 commits into WW-shan:main from Soli22de:discuss/longtail-thesis on May 11, 2026

@Soli22de (Collaborator) commented May 11, 2026

Summary

Drafts the next research phase: finding mispricings in long-tail Polymarket markets via LLM-driven resolution-criteria reading and relation discovery. The plan is research-mode only — no live execution path is touched.

Two docs added under docs/plans/:

  1. 2026-05-11-longtail-resolution-thesis.md (622 lines) — full spec.

    • Why this direction now: main-market YES+NO arb half-life is <1 min post-Oct 2024 (Anatomy of Polymarket paper); 63% of short-term markets had zero 24h volume; new category-tiered Polymarket fees in 2026 make our current fees.py constant wrong.
    • Four work streams: T1 long-tail tier filter, T2 resolution-criteria reader, T3 internal same-event detector, T4 LLM-as-judge rule-eval harness.
    • Cross-cutting: fee model upgrade (Polymarket 2026 category fees).
    • Decision gates G1-G6, kill criteria, risk table.
    • All research-mode; no live execution path.
  2. 2026-05-11-longtail-thesis-open-questions.md (8 open decisions) — Gate G1 needs all 8 filled before any work stream is dispatched. Use this PR's comment thread to discuss; commit decisions back to the branch as we converge.
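A minimal sketch of what the fee-model upgrade might look like: a per-category lookup in place of a single constant, using the category rates quoted under "Key facts" in this description. The names `CATEGORY_FEE_RATES` and `fee_for_notional` are hypothetical (they are not the contents of `fees.py`), and treating the fee as a flat fraction of notional is an assumption that may not match Polymarket's actual fee formula.

```python
from decimal import Decimal

# Category rates as quoted in this PR description (2026 schedule).
# Hypothetical table name; not taken from the existing fees.py.
CATEGORY_FEE_RATES = {
    "crypto": Decimal("0.0180"),
    "politics": Decimal("0.0100"),
    "sports": Decimal("0.0075"),
    "geopolitics": Decimal("0"),
}


def fee_for_notional(category: str, notional_usd: Decimal) -> Decimal:
    """Return the fee in USD for a trade of the given notional.

    Assumption: fee = rate * notional. The real formula may depend on
    price, side, or maker/taker status; this sketch ignores all of that.
    """
    try:
        rate = CATEGORY_FEE_RATES[category.lower()]
    except KeyError:
        # Fail loudly rather than silently falling back to a stale constant.
        raise ValueError(f"no fee rate configured for category {category!r}") from None
    return rate * notional_usd
```

Raising on an unknown category is a deliberate choice here: a silent default would reproduce the stale-constant problem the plan is trying to remove.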

Also adds .deepseek/ to .gitignore (local DS scratch dir, not for sharing).

Key facts driving the plan (all cited in §10 of the main doc)

  • Polymarket 2026 fees are category-tiered (Crypto 1.80% / Politics 1.00% / Sports 0.75% / Geopolitics 0%) — fees.py needs an update.
  • YES+NO arb half-life <1 min by Oct 2024 (arxiv 2603.03136).
  • ~$39.6M of NegRisk rebalancing arb already captured by pros (2024-04 → 2025-04).
  • Lopez-Lira: LLM headline strategies decay Sharpe 6.54→1.22 in 3 years — kill criteria are explicit.
  • Paradigm 2025-12: Dune dashboards double-count Polymarket volume ~2x — don't source thresholds from Dune.
  • LLM-as-judge: balanced accuracy / Youden's J, not F1; ensemble 3 models (arxiv 2512.08121, 2512.16041).
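For the LLM-as-judge metrics named in the last bullet, the following is a minimal sketch of how balanced accuracy and Youden's J could be computed from binary judge verdicts. The function names are hypothetical, and combining the three models by simple majority vote is an assumption: the description says to ensemble 3 models but does not say how.

```python
def majority_vote(votes: list[bool]) -> bool:
    """Combine verdicts from an odd number of judge models (assumed scheme)."""
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of judges to avoid ties")
    return sum(votes) * 2 > len(votes)


def balanced_accuracy_and_j(
    y_true: list[bool], y_pred: list[bool]
) -> tuple[float, float]:
    """Return (balanced accuracy, Youden's J) for binary labels.

    balanced accuracy = (TPR + TNR) / 2
    Youden's J        = TPR + TNR - 1
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists must have equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return (tpr + tnr) / 2, tpr + tnr - 1
```

Both metrics weight the positive and negative classes equally, which is why they are less sensitive than F1 to the class imbalance one would expect in a labeled set of rule evaluations.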

What this PR is NOT

  • Not an implementation PR. No code changes outside .gitignore and docs/plans/.
  • Not a request to merge before discussion. Treat this as the start of Gate G1.

How to review

  1. Read the main thesis doc top-to-bottom (≈30 min). The §1 "why this direction" and §7 "kill criteria" are the most important sections.
  2. Go through the 8 questions in the open-questions doc. Drop comments inline on each **Decision:** line.
  3. Once all 8 have consensus, commit the decisions to this branch and we merge.

Test plan

  • Q1 — T1 tier thresholds confirmed
  • Q2 — T2 model choice confirmed
  • Q3 — T3 embedding model confirmed
  • Q4 — T4 labeling protocol confirmed
  • Q5 — code-review flow confirmed
  • Q6 — DS task-pack granularity confirmed
  • Q7 — sync cadence confirmed
  • Q8 — failure logging method confirmed
  • 4 reviewers acknowledged → Gate G1 passes
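The Gate G1 condition in the checklist above could be checked mechanically. The sketch below assumes that each decision is written on the same line as its `**Decision:**` marker in the open-questions doc and that acknowledgements are counted by hand; the function names are hypothetical.

```python
import re

# Matches a line that starts with the literal marker **Decision:** and
# captures whatever follows it on that line.
DECISION_LINE = re.compile(r"^\*\*Decision:\*\*(.*)$", re.MULTILINE)


def count_filled_decisions(markdown_text: str) -> tuple[int, int]:
    """Return (filled, total) over all **Decision:** lines in the text."""
    entries = DECISION_LINE.findall(markdown_text)
    filled = sum(1 for entry in entries if entry.strip())
    return filled, len(entries)


def gate_g1_passes(
    markdown_text: str,
    acknowledgements: int,
    required_decisions: int = 8,
    required_acks: int = 4,
) -> bool:
    """True when every expected decision is filled and enough reviewers acked."""
    filled, total = count_filled_decisions(markdown_text)
    return (
        total == required_decisions
        and filled == total
        and acknowledgements >= required_acks
    )
```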

张靖恒 and others added 3 commits May 11, 2026 22:23

Drafts the v0.1 spec for the next research phase: long-tail watchlist
tiers, resolution criteria reader, internal multi-market detector, and
LLM-as-judge evaluation harness. Gate G1 (threshold confirmation) must
pass before any work stream is dispatched. Also ignores local .deepseek/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extracts the 8 open decisions from §9 of the v0.1 plan into a focused
discussion artifact. Gate G1 passes once all 8 Decision fields are
filled and acknowledged by 4 reviewers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Brings in WW-shan's 3 commits from 2026-05-11 (maker no-fill diagnostics,
opportunity optimization ranking, opportunity chain diagnostics) so the
cross-fork PR shows only the new plan + open-questions docs.

Copilot AI left a comment


Pull request overview

Adds a research-only planning spec for the next phase of long-tail Polymarket mispricing research (LLM-driven resolution-criteria reading + intra-event relation discovery), plus an open-questions checklist to drive Gate G1 alignment. Also updates .gitignore to exclude a local DeepSeek scratch directory.

Changes:

  • Add a detailed long-tail resolution-reading research thesis/spec (v0.1).
  • Add a Gate G1 “open questions” document with eight decisions to resolve via PR discussion.
  • Ignore .deepseek/ local scratch directory.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| docs/plans/2026-05-11-longtail-resolution-thesis.md | New long-tail research thesis/spec and proposed workstreams (T1–T4) + gates/kill-criteria. |
| docs/plans/2026-05-11-longtail-thesis-open-questions.md | New Gate G1 decision checklist to converge on thresholds, models, labeling, cadence, and logging. |
| .gitignore | Add .deepseek/ to ignored paths. |

Comment on lines +1 to +5
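# 2026-05-11 Long-tail + resolution-reading research plan

This file is the specification (spec) for the next phase of research work. Until the *Open questions* section has been aligned with the team, **do not hand any section to DS / other agents for execution**. All I/O contracts, thresholds, and acceptance criteria must be settled first.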

---
Comment on lines +1 to +46
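# Open questions: long-tail + resolution-reading plan (v0.1)

**Related document**: [`2026-05-11-longtail-resolution-thesis.md`](./2026-05-11-longtail-resolution-thesis.md)

This file is a standalone discussion draft of §9 of that plan, for the team to comment on and decide. Each question is followed by a `**Decision:**` line; fill it in once discussion converges, and update the main plan to v1.0 accordingly.

How to discuss: comment on each question in this PR. **Gate G1 pass condition**: all 8 Decisions filled in, with written confirmation from 4 people.

---

## Q1. T1 long-tail tier thresholds

Main plan draft:

| Tier | 24h volume | 7d volume | Spread | Time to resolution |
|---|---|---|---|---|
| headline | ≥ $50k | ≥ $200k | ≤ 1¢ | any |
| mid | $5k-$50k | $20k-$200k | 1-3¢ | any |
| longtail | $100-$5k | $1k-$20k | 3-10¢ | 14-90 days |
| dead | < $100 | < $1k | > 10¢ | any |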
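
The draft table above can be read as a classifier. The sketch below is one possible reading, not the plan's definition: it assumes a market must satisfy every column of a row to receive that tier, that range boundaries are half-open as written in the code, and that anything else is reported as `unclassified`. The function name and the `unclassified` label are hypothetical.

```python
def classify_tier(
    vol_24h_usd: float,
    vol_7d_usd: float,
    spread_cents: float,
    days_to_resolution: float,
) -> str:
    """Map market stats to a draft tier; all columns of a row must match."""
    if vol_24h_usd >= 50_000 and vol_7d_usd >= 200_000 and spread_cents <= 1:
        return "headline"
    if (
        5_000 <= vol_24h_usd < 50_000
        and 20_000 <= vol_7d_usd < 200_000
        and 1 < spread_cents <= 3
    ):
        return "mid"
    if (
        100 <= vol_24h_usd < 5_000
        and 1_000 <= vol_7d_usd < 20_000
        and 3 < spread_cents <= 10
        and 14 <= days_to_resolution <= 90  # only this tier constrains time
    ):
        return "longtail"
    if vol_24h_usd < 100 and vol_7d_usd < 1_000 and spread_cents > 10:
        return "dead"
    # Columns disagree (e.g. longtail volume but tight spread): the draft
    # table does not say how to resolve this, so the sketch does not guess.
    return "unclassified"
```

Running a classifier like this over real data would also show how many markets fall into `unclassified`, which bears directly on whether the thresholds should be set from observed percentiles instead.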

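**Questions**:
- Are these numbers reasonable? Or should we first pull a week of actual distribution data from Gamma and set them after looking at the percentiles?
- "Time to resolution 14-90 days" is based on the least-efficient window of 30-14 days reported in the academic literature, but we widened it to 14-90 days to keep more samples. Is this range right?
- Do we need a separate tier for neg-risk sub-markets (a neg-risk group as a whole may be liquid while individual sub-markets are long-tail)?

**Decision:**

---

## Q2. T2 model choice

Main plan draft: Haiku 4.5 as the primary runner (extraction), with Sonnet 4.6 reviewing when ambiguity is high.

**Questions**:
- Do we agree with this tiering?
- Should we try DeepSeek? Rationale: cost may be lower, and the project already relies on DS to do work for us. Trade-off: how its comprehension of English financial text compares with the Claude family is unknown.
- Do we need a dual run (Haiku + DeepSeek) during prompt tuning for a head-to-head comparison before choosing the primary model?

**Decision:**

---

## Q3. T3 embedding model

Main plan draft: OpenAI `text-embedding-3-small` ($0.00002/1k tokens, budget ~$0.20 per full run).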
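
Taken together, the quoted price and budget imply a corpus of roughly 10 million tokens per full run ($0.20 / $0.00002 per 1k tokens = 10,000 thousand-token units). A small sketch of that arithmetic, with hypothetical function names and the quoted price as the default:

```python
def embedding_cost_usd(total_tokens: int, usd_per_1k_tokens: float = 0.00002) -> float:
    """Estimated cost of embedding total_tokens at a flat per-1k-token price."""
    return total_tokens / 1000 * usd_per_1k_tokens


def tokens_within_budget(budget_usd: float, usd_per_1k_tokens: float = 0.00002) -> int:
    """Approximate number of tokens a budget covers (rounded to nearest token)."""
    return round(budget_usd / usd_per_1k_tokens * 1000)
```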

Once all 8 Decisions are filled in:

1. I (Claude) sync the decisions into the main plan `2026-05-11-longtail-resolution-thesis.md`, bumping the version from v0.1 to v1.0.
@WW-shan WW-shan merged commit c7e0ec3 into WW-shan:main May 11, 2026
1 check passed
@Soli22de Soli22de deleted the discuss/longtail-thesis branch May 12, 2026 03:07
WW-shan pushed a commit that referenced this pull request May 12, 2026
Adds DS task pack #1 for feeSchedule metadata preservation and maker rebate diagnostics.
3 participants