
Commit 8d598a5

张靖恒 and claude committed
Fix Qwen fallback recommendation in dash-ocr patterns memo
Per dash-ocr-pipeline's current production code (`structured_stage12.py`, `retry_silent_empty_checkpoint.py`), the silent-empty fallback uses the *same* Gemini 2.0 Flash model with a permissive V1 prompt, not a switch to Qwen 2.5-72B. The two-stage Gemini→Qwen pairer path was retired: single-call Gemini Flash (F1 = 0.965) beat the Qwen pairer (F1 = 0.950) in production validation on 2026-05-06.

Updated:
- §3.2 budget: T2 V1 fallback uses Gemini Flash, not Qwen (saves ~$0.01)
- §3.2 budget: T3 LLM verification uses Gemini Flash, not Qwen
- §4 defaults: T2 fallback row corrected
- §7 Q2 implication: simplified to a single-model V2/V1 prompt pair

Also notes that PR WW-shan#3 has already been updated with the corrected Q2 Decision (commit b299269), so the two docs now agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ba487af commit 8d598a5

1 file changed

Lines changed: 9 additions & 11 deletions


docs/references/dash-ocr-production-patterns.md

@@ -183,15 +183,15 @@ Three design choices that are load-bearing; flag before changing:
 | Step | Calls | Unit price | Subtotal |
 |---|---|---|---|
 | T2 V2 primary extraction (Gemini Flash) | 2000 | $0.00009 | **$0.18** |
-| T2 V1 fallback (10% silent-empty, retry on Qwen 2.5-72B) | 200 | $0.00015 | **$0.03** |
-| T2 prompt tuning head-to-head (4 models × 20 markets) | 80 | avg. $0.0003 | **$0.02** |
-| T3 embedding (OpenAI text-embedding-3-small, still the most cost-effective) | ~10M tokens | $0.00002/1k tokens | **$0.20** |
-| T3 LLM verification of candidates (Qwen 2.5-72B) | 500 | $0.00015 | **$0.08** |
+| T2 V1 fallback (10% silent-empty, **same Gemini Flash + V1 permissive prompt**) | 200 | $0.00009 | **$0.02** |
+| T2 prompt tuning head-to-head (3 models × 20 markets, tuning phase only) | 60 | avg. $0.0003 | **$0.02** |
+| T3 embedding (OpenAI text-embedding-3-small) | ~10M tokens | $0.00002/1k tokens | **$0.20** |
+| T3 LLM verification of candidates (Gemini 2.0 Flash) | 500 | $0.00009 | **$0.05** |
 | T4 corpus (structurally derived, no LLM calls) | 0 | n/a | **$0** |
 | T4 judge ensemble (3 models × 100 cases) | 300 | avg. $0.00015 | **$0.05** |
-| **Total per full run** | | | **~$0.56** |
+| **Total per full run** | | | **~$0.52** |

-**vs. the original $17 estimate:** 30× cheaper.
+**vs. the original $17 estimate:** 33× cheaper.

 Running it twice a month ≈ $1.20. Cost sensitivity is effectively zero.

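Each subtotal in the budget table is calls × unit price, rounded to the cent. The sketch below re-checks both the updated and the pre-commit tables. The row keys are hypothetical shorthand; the embedding row is modeled as ~10M tokens ÷ 1k = 10,000 billing units at $0.00002; and `Decimal` with half-up rounding is an assumption chosen so that values like $0.045 round to $0.05, as the table does.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
ORIGINAL_ESTIMATE = Decimal("17")

# (calls, unit price in USD) per row of the updated §3.2 table.
# Keys are hypothetical shorthand for the table rows.
NEW_ROWS = {
    "t2_v2_primary": (2000, "0.00009"),   # Gemini Flash, V2 strict prompt
    "t2_v1_fallback": (200, "0.00009"),   # same Gemini Flash, V1 permissive prompt
    "t2_tuning": (60, "0.0003"),          # 3 models x 20 markets
    "t3_embedding": (10_000, "0.00002"),  # ~10M tokens / 1k tokens per unit
    "t3_verify": (500, "0.00009"),        # Gemini 2.0 Flash
    "t4_corpus": (0, "0"),                # no LLM calls
    "t4_judge": (300, "0.00015"),         # 3 models x 100 cases
}

# Pre-commit rows that differ: Qwen pricing and 4-model tuning.
OLD_ROWS = {
    **NEW_ROWS,
    "t2_v1_fallback": (200, "0.00015"),
    "t2_tuning": (80, "0.0003"),
    "t3_verify": (500, "0.00015"),
}


def budget(rows):
    """Return (exact total, total of cent-rounded subtotals) in USD."""
    exact = Decimal(0)
    rounded = Decimal(0)
    for calls, price in rows.values():
        subtotal = calls * Decimal(price)
        exact += subtotal
        rounded += subtotal.quantize(CENT, rounding=ROUND_HALF_UP)
    return exact, rounded


def times_cheaper(total):
    """Ratio of the original $17 estimate to `total`, rounded to an integer."""
    return (ORIGINAL_ESTIMATE / total).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for label, rows in (("old", OLD_ROWS), ("new", NEW_ROWS)):
        exact, rounded = budget(rows)
        print(label, exact, rounded, times_cheaper(rounded))
```

With these inputs, the updated table comes to about $0.506 before rounding and $0.52 as a sum of rounded subtotals (33× below $17). The pre-commit table gives $0.56 (30×).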
@@ -203,7 +203,7 @@ Three design choices that are load-bearing; flag before changing:
 |---|---|---|
 | API gateway | OpenRouter | §0 |
 | T2 primary model | `google/gemini-2.0-flash-001` | §3.1 price/performance |
-| T2 fallback model | `qwen/qwen-2.5-72b-instruct` | dash-ocr head-to-head winner |
+| T2 silent-empty fallback | **`google/gemini-2.0-flash-001` + V1 permissive prompt** | dash-ocr-pipeline `retry_silent_empty_checkpoint.py` pattern: same model with a different prompt, rather than a different model. The pipeline has retired Qwen (single-call Gemini Flash F1 0.965 > Qwen pairer 0.950) |
 | Concurrent workers | 1 (start single-threaded, add more as needed) | §2.1 (deferred) |
 | MAX_COST_USD default | $5 | Well above the $0.56 full run; leaves headroom |
 | HTTP retries | 3, exponential backoff (2/4/8 s) on 429/5xx | Same as dash-ocr |
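The last three defaults (a $5 `MAX_COST_USD` cap and 3 retries with 2/4/8 s backoff on 429/5xx) can be sketched together. This is a minimal illustration, not dash-ocr's code. It assumes an injected `send()` callable that returns `(status_code, body)` and a caller-supplied per-call cost estimate. `CostGuard`, `BudgetExceeded` and `send_with_retry` are hypothetical names. `sleep` is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, Tuple

RETRY_STATUSES = {429} | set(range(500, 600))  # rate limit + any 5xx
BACKOFF_SECONDS = (2, 4, 8)                    # 3 retries after the first attempt
MAX_COST_USD = 5.0


class BudgetExceeded(RuntimeError):
    """Raised when the next call would push spending past the cap."""


class CostGuard:
    def __init__(self, max_cost_usd: float = MAX_COST_USD) -> None:
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        projected = self.spent_usd + cost_usd
        if projected > self.max_cost_usd:
            raise BudgetExceeded(f"would spend ${projected:.4f} > ${self.max_cost_usd}")
        self.spent_usd = projected


def send_with_retry(
    send: Callable[[], Tuple[int, str]],
    guard: CostGuard,
    cost_per_call_usd: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send(); on 429/5xx, back off 2/4/8 s and retry up to 3 times.

    Conservatively charges the guard for every attempt, including failed ones;
    real billing may only count successful calls.
    Returns the last response, which may still be a 429/5xx once retries run out.
    """
    for attempt in range(len(BACKOFF_SECONDS) + 1):
        guard.charge(cost_per_call_usd)
        status, body = send()
        if status not in RETRY_STATUSES:
            return status, body
        if attempt < len(BACKOFF_SECONDS):
            sleep(BACKOFF_SECONDS[attempt])
    return status, body
```

Injecting `send` and `sleep` keeps the sketch independent of any HTTP client and makes the backoff schedule testable. Checking the cap before each call means a run stops before it exceeds `MAX_COST_USD`, rather than after.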
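@@ -241,12 +241,10 @@ Three design choices that are load-bearing; flag before changing:

 This file **in effect answers Q2 and Q3 from §9**:

-- **Q2 (T2 model choice)**: default to Gemini Flash via OpenRouter, with Qwen 2.5-72B as the fallback. Do not default to Haiku.
+- **Q2 (T2 model choice)**: default to `google/gemini-2.0-flash-001` via OpenRouter (V2 strict prompt). On silent-empty, fall back to the same model with the V1 permissive prompt, **not** to Qwen: the latest pipeline has retired it, and single-call Gemini Flash measured better than the multi-model combination. **Do not** default to Haiku.
 - **Q3 (T3 embedding)**: OpenAI `text-embedding-3-small` is still the most cost-effective (deploying an open-source model costs more); decision unchanged.

-**However, this file does not update PR #3 (the §9 decision draft).** Once the teammate has read the original and this memo, they can choose to:
-- Accept the new proposal → push a fix commit directly to PR #3 that changes the Q2 Decision
-- Not accept it → comment on the PR with the reasons, and we discuss
+**Already synced to PR #3**: after this file was drafted, the author pushed a commit to PR #3 that fills in the Q2 Decision with this section's decision and sets the Q3 Decision to adopt the original recommendation. See PR #3 (`prep/q9-quick-decisions`), commit b299269.

 ---
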
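The Q2 decision above reduces the T2 path to one model with two prompts. The sketch below shows that control flow under stated assumptions. It uses a hypothetical injected `call_model(model, prompt, text)` client that returns a list of extracted records, and it treats "silent-empty" as a successful call that returns zero records. The prompt constants and `extract_with_fallback` are illustrative placeholders, not the contents of `retry_silent_empty_checkpoint.py`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record/client types; the model client is injected so the sketch
# does not depend on any particular HTTP library.
Record = Dict[str, object]
ModelClient = Callable[[str, str, str], List[Record]]

MODEL = "google/gemini-2.0-flash-001"
# Placeholder prompt texts; the real V2/V1 prompts live in dash-ocr-pipeline.
V2_STRICT_PROMPT = "<V2 strict extraction prompt>"
V1_PERMISSIVE_PROMPT = "<V1 permissive extraction prompt>"


def extract_with_fallback(call_model: ModelClient, text: str) -> Tuple[List[Record], str]:
    """Run the V2 strict prompt; on a silent-empty result, retry the same model with V1.

    Returns the extracted records and the prompt version that produced them.
    """
    records = call_model(MODEL, V2_STRICT_PROMPT, text)
    if records:
        return records, "v2"
    # Silent-empty: the call succeeded but extracted nothing. Keep the model
    # and switch only the prompt; no second model (such as Qwen) is involved.
    return call_model(MODEL, V1_PERMISSIVE_PROMPT, text), "v1"


if __name__ == "__main__":
    def fake_client(model: str, prompt: str, text: str) -> List[Record]:
        # Simulate V2 coming back empty and V1 finding one record.
        return [] if prompt == V2_STRICT_PROMPT else [{"market": "demo"}]

    records, version = extract_with_fallback(fake_client, "sample page text")
    print(version, records)  # v1 [{'market': 'demo'}]
```

Keeping the model fixed means the fallback costs the same per call as the primary extraction. This is why the V1 fallback row in the budget table uses the Gemini Flash unit price.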