You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Calibrate cost estimates from experiment 2 measurements
n=5 real Polymarket descriptions, Gemini 2.0 Flash V2 strict prompt:
- 5/5 schema_ok, 5/5 grounding_ok
- avg input 552 tok, avg output 397 tok
- actual: $0.000214/call (vs $0.000090 prior estimate, 2.4× higher)
- output is bigger than predicted because verbatim_text transcript
inflates it ~2.6× over the "short JSON" assumption
Total run revised: $0.52 → $0.87. Still trivial.
Also drops the head-to-head budget line — V2 strict prompt with
verbatim grounding nailed 5/5 on first try, so no prompt-tuning
phase needed before scale-up. Prior $0.02 line removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0 commit comments