Quick decisions for §9 (Gate G1 prep, 2-person team) #3

Merged
WW-shan merged 3 commits into WW-shan:main from Soli22de:prep/q9-quick-decisions
May 12, 2026

Conversation

@Soli22de
Collaborator

What this PR is

A condensed decision sheet for the 8 open questions in docs/plans/2026-05-11-longtail-thesis-open-questions.md. Instead of long written answers, each Q now has a 建议 (recommended) / 替代 (alternative) / Decision triplet, so we can resolve each one with a short reply.

Key adjustment

The original Gate G1 said "4 reviewers acknowledge", but the team is 2 people, so G1 becomes "both acknowledge". Q4 (labeling) is the most affected: the sample size drops from 100 rules to 50, because with 2 reviewers each labeling the same 50 rules we get 2 × 50 = 100 labels, i.e. 50 rules × 2 labels each.
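
For reference, the double-labeling arithmetic above can be written out as a small sketch. The function name and parameters are hypothetical and not part of the plan; it only assumes that every rule must receive one label from each of the two reviewers.

```python
def double_labeled_rules(reviewers: int, labels_per_reviewer: int,
                         labels_per_rule: int = 2) -> int:
    """Number of rules that can each receive `labels_per_rule` labels.

    Assumption: every rule is labeled by `labels_per_rule` different
    reviewers, so the total label budget is split evenly across rules.
    """
    total_labels = reviewers * labels_per_reviewer
    return total_labels // labels_per_rule


# 2 reviewers x 50 labels each = 100 labels = 50 double-labeled rules
print(double_labeled_rules(reviewers=2, labels_per_reviewer=50))  # 50
```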

Q1 stays open

Q1 (tier thresholds) is the only Q that genuinely needs data, not opinion. Recommendation: wait on it until DS task pack #2 (Gamma distribution reality-check) runs and gives us actual percentiles. The other 7 can be resolved by reading this PR.

How to review

Open docs/plans/2026-05-12-q9-quick-decisions.md. For each Q:

  • If the 建议 (recommended option) sounds right → reply "yes" inline, or just commit Decision: 建议 to fill the placeholder
  • If you prefer the 替代 (alternative) → reply "alt" or commit Decision: 替代
  • If you want something else → reply with your version

I'll sync decisions back to the main plan (v1.0 bump) once all 7 are filled.
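
To see which Qs still have an unfilled placeholder before the sync, something like the sketch below could help. It assumes a layout this PR does not confirm: each question starts with a Markdown heading beginning with `Q<n>`, followed somewhere by a single `Decision: ...` line, and an empty value, `...`, or `TBD` counts as unfilled. All names here are hypothetical.

```python
import re

# Assumed layout (not confirmed by the PR): "## Q4 ..." headings and
# one "Decision: <value>" line per question.
HEADING = re.compile(r"^#+\s*(Q\d+)\b")
DECISION = re.compile(r"^Decision:\s*(.*)$")
UNFILLED = {"", "...", "TBD"}


def decision_status(markdown_text: str) -> dict[str, str]:
    """Map each question id to the text of its Decision line."""
    status: dict[str, str] = {}
    current = None
    for line in markdown_text.splitlines():
        stripped = line.strip()
        heading = HEADING.match(stripped)
        if heading:
            current = heading.group(1)
            status.setdefault(current, "")
            continue
        decision = DECISION.match(stripped)
        if decision and current is not None:
            status[current] = decision.group(1).strip()
    return status


def unresolved(status: dict[str, str]) -> list[str]:
    """Question ids whose Decision is still a placeholder."""
    return [q for q, value in status.items() if value in UNFILLED]


sample = """## Q2 T2 model choice
Decision: 建议

## Q4 labeling protocol
Decision: ...
"""
print(unresolved(decision_status(sample)))  # ['Q4']
```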

Test plan

  • Q2 — T2 model choice resolved
  • Q3 — embedding model resolved
  • Q4 — labeling protocol resolved (key one given team size)
  • Q5 — review flow resolved
  • Q6 — DS task pack granularity resolved
  • Q7 — sync cadence resolved
  • Q8 — failure logging resolved
  • (later) Q1 resolved once Gamma distribution data is in

Compresses 7 open questions into "recommended default + one alternative"
form so the 2-person team can resolve via yes/swap. Q1 is left waiting
on Gamma distribution data (DS pkg WW-shan#2). Also flags that the plan's
"4-reviewer Gate G1" should be revised to "2-reviewer" given actual
team size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author (Soli22de) is the PR author and one of the 2 team members, so
this commit fills in the preference/flow questions unilaterally and
leaves the rest for WW to react to.

- Q1: deferred to Gamma distribution data (PR WW-shan#4 output)
- Q2: REVISED from original "Haiku + Sonnet" to OpenRouter Gemini
  Flash V2 main + same-model V1 prompt fallback, per PR WW-shan#6 memo
- Q3: accepted (OpenAI text-embedding-3-small)
- Q4: deferred to WW (his labeling time investment)
- Q5: accepted (peer review + Claude sanity + 24h auto-merge)
- Q6: accepted (one workstream = one DS pack, split if >300 LOC)
- Q7: accepted (Mon evening 30min + biweekly kill criteria review)
- Q8: accepted (append to main plan, no separate decisions-log)

WW should react via a fix commit if he disagrees with any of the 6
self-answered Qs; otherwise they take effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Soli22de pushed a commit to Soli22de/poly_strategy that referenced this pull request May 12, 2026
Per dash-ocr-pipeline's current production code (`structured_stage12.py`,
`retry_silent_empty_checkpoint.py`), the silent-empty fallback uses the
*same* Gemini 2.0 Flash model with a permissive V1 prompt, not a switch
to Qwen 2.5-72B. The two-stage Gemini→Qwen pairer path was retired:
single-call Gemini Flash F1 = 0.965 beat the Qwen pairer F1 = 0.950
in production validation 2026-05-06.

Updated:
- §3.2 budget: T2 V1 fallback uses Gemini Flash, not Qwen (saved ~$0.01)
- §3.2 budget: T3 LLM verification uses Gemini Flash, not Qwen
- §4 defaults: T2 fallback row corrected
- §7 Q2 implication: simplified to single-model V2/V1 prompt pair

Also notes that PR WW-shan#3 has already been updated with the corrected Q2
Decision (commit b299269) so the two docs now agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
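
A minimal sketch of the single-model V2/V1 prompt pair described in the commit above, for readers who have not seen `structured_stage12.py`. This is not the production code: the prompt strings, the `call_model` callable, and the function name are hypothetical stand-ins, and "silent-empty" is assumed to mean the model returned an empty or whitespace-only response without raising an error.

```python
from typing import Callable

# Placeholder prompts; the real V2/V1 prompt texts live in the pipeline repo.
V2_PROMPT = "strict-v2"
V1_PROMPT = "permissive-v1"


def extract_with_fallback(
    document: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the V2 prompt first; on a silent-empty result, retry the SAME
    model with the permissive V1 prompt instead of switching models.

    `call_model(prompt, document)` is an injected callable (assumption),
    so this sketch does not depend on any particular client library.
    Returns (response_text, prompt_version_used).
    """
    primary = call_model(V2_PROMPT, document)
    if primary.strip():
        return primary, "v2"
    fallback = call_model(V1_PROMPT, document)
    return fallback, "v1"
```

Injecting `call_model` keeps the retry logic testable with a fake model, which fits the mocked test requirement mentioned for the DS task packs.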
@Soli22de
Collaborator Author

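I (Soli22de) have filled in the 6 Qs I could answer myself (commit b299269).

If you disagree with any of the 6 self-answered ones, push a fix commit straight to this branch that changes the Decision: ... line (same pattern as your fix b6aba62 yesterday). If you have no objections, skip it; after 48 hours I'll treat them as accepted by default.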

Replaces the "wait for data" placeholder with actual P10/P50/P90
values from the experiment 1 run (PR WW-shan#7). Original draft thresholds
were 100× too high — real Polymarket distribution has P50 vol24hr
of only ~$40/day.

Key implementation note baked in: because P10 vol24hr = $0, the
dead/longtail boundary should use liquidity (P10 = $787) rather
than volume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
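
The implementation note in the commit above could look roughly like the sketch below. Only the two P10 values ($0 for vol24hr, $787 for liquidity) come from the experiment; the function names, the `liquidity` dictionary key, and the rule "below P10 means dead" are assumptions made for illustration.

```python
def dead_boundary(p10_volume: float, p10_liquidity: float) -> tuple[str, float]:
    """Pick the metric and threshold for the dead/longtail boundary.

    A volume threshold of 0 cannot separate any markets, so when
    P10 vol24hr is 0 we fall back to the liquidity percentile.
    """
    if p10_volume > 0:
        return "vol24hr", p10_volume
    return "liquidity", p10_liquidity


def is_dead(market: dict, boundary: tuple[str, float]) -> bool:
    """Assumed rule: a market is 'dead' if it sits below the boundary."""
    metric, threshold = boundary
    return market[metric] < threshold


# Measured values from the experiment: P10 vol24hr = $0, P10 liquidity = $787
boundary = dead_boundary(p10_volume=0.0, p10_liquidity=787.0)
print(boundary)  # ('liquidity', 787.0)
print(is_dead({"vol24hr": 0.0, "liquidity": 500.0}, boundary))   # True
print(is_dead({"vol24hr": 0.0, "liquidity": 1200.0}, boundary))  # False
```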
@Soli22de
Collaborator Author

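Q1 is now filled in with measured data (commit on this branch); see the experiment report in PR #7 for details. Decision status for all 8 Qs:

  • Q1 ✅ measured data
  • Q2 ✅ Gemini Flash V2+V1 (after correction)
  • Q3 ✅ text-embedding-3-small
  • Q4 ⏳ waiting for your confirmation. Note that PR #7 (Experiment: Gamma distribution baseline + OpenRouter calibration script) shows we already have 10k+ structured mutex pairs, so the T4 corpus no longer needs a rule_discovery run; you only need to decide how much time per week you are willing to spend on manual labeling
  • Q5 ✅ peer review + 24h auto-merge
  • Q6 ✅ one workstream = one DS pack
  • Q7 ✅ Monday evening, 30 minutes
  • Q8 ✅ append to main plan

If you disagree with any of the 7 self-answered Qs, push a fix commit; otherwise I'll treat them as accepted. Gate G1 is now only waiting on your sign-off for Q4.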

WW-shan pushed a commit that referenced this pull request May 12, 2026
Adds DS task pack #3 for the T2 resolution criteria reader, including OpenRouter chat-mode wiring, verbatim grounding, cost guardrails, and mocked test requirements.
@WW-shan WW-shan merged commit 028eb97 into WW-shan:main May 12, 2026
1 check passed

2 participants