Quick decisions for §9 (Gate G1 prep, 2-person team) #3
Merged
Conversation
Compresses 7 open questions into "recommended default + one alternative" form so the 2-person team can resolve via yes/swap. Q1 is left waiting on Gamma distribution data (DS pkg WW-shan#2). Also flags that the plan's "4-reviewer Gate G1" should be revised to "2-reviewer" given actual team size.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author (Soli22de) is the PR author and one of the 2 team members, so filling preference/flow questions unilaterally and leaving the rest for WW to react to.

- Q1: deferred to Gamma distribution data (PR WW-shan#4 output)
- Q2: REVISED from original "Haiku + Sonnet" to OpenRouter Gemini Flash V2 main + same-model V1 prompt fallback, per PR WW-shan#6 memo
- Q3: accepted (OpenAI text-embedding-3-small)
- Q4: deferred to WW (his labeling time investment)
- Q5: accepted (peer review + Claude sanity + 24h auto-merge)
- Q6: accepted (one workstream = one DS pack, split if >300 LOC)
- Q7: accepted (Mon evening 30min + biweekly kill criteria review)
- Q8: accepted (append to main plan, no separate decisions-log)

WW should react via a fix commit if he disagrees with any of the 6 self-answered Qs; otherwise they take effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Soli22de
pushed a commit
to Soli22de/poly_strategy
that referenced
this pull request
May 12, 2026
Per dash-ocr-pipeline's current production code (`structured_stage12.py`, `retry_silent_empty_checkpoint.py`), the silent-empty fallback uses the *same* Gemini 2.0 Flash model with a permissive V1 prompt, not a switch to Qwen 2.5-72B. The two-stage Gemini→Qwen pairer path was retired: single-call Gemini Flash F1 = 0.965 beat the Qwen pairer F1 = 0.950 in production validation 2026-05-06.

Updated:
- §3.2 budget: T2 V1 fallback uses Gemini Flash, not Qwen (saved ~$0.01)
- §3.2 budget: T3 LLM verification uses Gemini Flash, not Qwen
- §4 defaults: T2 fallback row corrected
- §7 Q2 implication: simplified to single-model V2/V1 prompt pair

Also notes that PR WW-shan#3 has already been updated with the corrected Q2 Decision (commit b299269) so the two docs now agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
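The single-model V2/V1 prompt pair described in this commit can be sketched roughly as follows. This is only an illustration, not the code in `structured_stage12.py` or `retry_silent_empty_checkpoint.py`: the prompt texts, the `call_model` wrapper, and the "empty string means silent-empty" check are all assumptions made for the example.

```python
from typing import Callable, Tuple

# Hypothetical prompt texts; the real V2/V1 prompts are not shown in this PR.
STRICT_V2_PROMPT = "Extract the fields as strict JSON."
PERMISSIVE_V1_PROMPT = "Extract whatever fields you can find."


def extract_with_fallback(
    document: str,
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try the V2 prompt first; on a silent-empty reply, retry with V1.

    `call_model(prompt, document)` is a hypothetical wrapper around a single
    call to one model. Returns (output, prompt_version_used).
    """
    output = call_model(STRICT_V2_PROMPT, document)
    if output.strip():
        return output, "v2"
    # Silent-empty reply: keep the same model, switch only the prompt.
    return call_model(PERMISSIVE_V1_PROMPT, document), "v1"
```

Because the fallback changes only the prompt, both paths share one model client and one pricing row, which matches the simplified §3.2 budget lines above.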
Collaborator
Author
I (Soli22de) have filled in the 6 Qs I could answer myself (commit b299269):
If you disagree with any of the 6 self-answered ones, push a fix commit directly to this branch.
4 tasks
Replaces the "wait for data" placeholder with actual P10/P50/P90 values from the experiment 1 run (PR WW-shan#7). Original draft thresholds were 100× too high: the real Polymarket distribution has a P50 vol24hr of only ~$40/day.

Key implementation note baked in: because P10 vol24hr = $0, the dead/longtail boundary should use liquidity (P10 = $787) rather than volume.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
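A small sketch of why the boundary has to move from volume to liquidity. It uses the two P10 values quoted in this commit; the function names and the "strictly below P10 means dead" rule are assumptions for illustration, not the thresholds as written in the plan.

```python
# P10 values quoted in the commit message above (experiment 1 run).
P10_LIQUIDITY_USD = 787.0
P10_VOL24HR_USD = 0.0


def is_dead(liquidity_usd: float) -> bool:
    """Liquidity-based dead/longtail boundary (assumed rule: below P10 is dead)."""
    return liquidity_usd < P10_LIQUIDITY_USD


def is_dead_by_volume(vol24hr_usd: float) -> bool:
    """Volume-based variant, shown only to illustrate why it degenerates."""
    return vol24hr_usd < P10_VOL24HR_USD
```

With a volume threshold of $0, no market with non-negative 24h volume can fall below it, so the volume rule never classifies anything as dead. The liquidity rule still separates markets on either side of $787.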
Collaborator
Author
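Q1 has been filled in with measured data (commit on this branch); see the experiment report in PR #7 for details. Decision progress on all 8 Qs:
If you disagree with any of the 7 self-answered Qs, push a fix commit to me; otherwise they are treated as accepted. Gate G1 is now waiting only on your sign-off for Q4.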
3 tasks
WW-shan
pushed a commit
that referenced
this pull request
May 12, 2026
Adds DS task pack #3 for the T2 resolution criteria reader, including OpenRouter chat-mode wiring, verbatim grounding, cost guardrails, and mocked test requirements.
What this PR is
A condensed decision sheet for the 8 open questions in `docs/plans/2026-05-11-longtail-thesis-open-questions.md`. Instead of writing long answers, each Q now has a 建议 (recommended) / 替代 (alternative) / Decision triplet so we can resolve via short replies.
Key adjustment
Original Gate G1 said "4 reviewers acknowledge". Team is 2 people. G1 becomes "both acknowledge". Q4 (labeling) is the most affected — sample size dropped from 100 rules to 50 because of how double-labeling math works with 2 reviewers (2 × 50 = 100 labels = 50 rules × 2).
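The double-labeling arithmetic can be checked with a few lines. This is a sketch under the assumption that each reviewer labels 50 rules and every rule is labeled twice; the function name and parameters are made up for the example.

```python
def rules_covered(labels_per_reviewer: int, reviewers: int, labels_per_rule: int = 2) -> int:
    """Number of rules that can be double-labeled with the given labeling budget."""
    total_labels = labels_per_reviewer * reviewers
    return total_labels // labels_per_rule


# 2 reviewers x 50 labels each = 100 labels = 50 rules x 2 labels per rule.
two_person_sample = rules_covered(labels_per_reviewer=50, reviewers=2)  # 50
```

Under the same per-reviewer load, 4 reviewers would give `rules_covered(50, 4) == 100`, which is consistent with the original 100-rule sample, though the original plan's per-reviewer load is assumed here rather than quoted.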
Q1 stays open
Q1 (tier thresholds) is the only Q that genuinely needs data, not opinion. Recommendation: wait on it until DS task pack #2 (Gamma distribution reality-check) runs and gives us actual percentiles. The other 7 can be resolved by reading this PR.
How to review
Open `docs/plans/2026-05-12-q9-quick-decisions.md`. For each Q:
- Write `Decision: 建议` to fill the placeholder (accepts the recommendation)
- Write `Decision: 替代` (swaps to the alternative)

I'll sync decisions back to the main plan (v1.0 bump) once all 7 are filled.
Test plan