feat: expose cumulative token usage on success + budget enforcement #2392
mimran-khan wants to merge 2 commits
Previously, total_usage (accumulated token counts across retry attempts) was only available when retries FAILED via InstructorRetryException. On successful extraction, this data was computed but discarded.

Changes:
- Attach _total_usage to successful parsed responses (like _raw_response)
- Add completion:usage hook that fires after each attempt with cumulative token counts, enabling observability integration
- Add token_budget parameter to create() that raises TokenBudgetExceeded if cumulative tokens across all attempts exceed the configured limit
- TokenBudgetExceeded exception includes usage data, budget, and attempt count

This prevents runaway retry costs in production and gives users visibility into how many tokens their retries actually consume when things work.

Refs 567-labs#2391
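The mechanism in this commit message can be pictured without the library. What follows is a minimal, self-contained sketch, not the code in this PR: Usage, Parsed, run_with_budget, and the constructor shape of TokenBudgetExceeded are assumptions made for illustration. Only the names token_budget, _total_usage, and TokenBudgetExceeded come from the description above. Each simulated attempt reports its token counts, the loop keeps a running total, raises once the total exceeds the budget, and attaches the total to the result on success.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical token counter; real provider usage objects differ."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def add(self, other: "Usage") -> None:
        self.prompt_tokens += other.prompt_tokens
        self.completion_tokens += other.completion_tokens


class TokenBudgetExceeded(Exception):
    """Stand-in for the PR's exception; the constructor shape is assumed."""

    def __init__(self, total_usage: Usage, token_budget: int, n_attempts: int):
        super().__init__(
            f"used {total_usage.total_tokens} tokens in {n_attempts} attempt(s); "
            f"budget is {token_budget}"
        )
        self.total_usage = total_usage
        self.token_budget = token_budget
        self.n_attempts = n_attempts


class Parsed:
    """Stand-in for a parsed model that can carry extra attributes."""

    def __init__(self, value: str):
        self.value = value


def run_with_budget(attempts, token_budget=None):
    """Run simulated attempts until one succeeds.

    `attempts` is a list of (usage, result_or_None) pairs; None means the
    attempt failed validation and would be retried.
    """
    total = Usage()
    for n, (usage, result) in enumerate(attempts, start=1):
        total.add(usage)
        # This sketch checks the budget after every attempt, successful or not.
        if token_budget is not None and total.total_tokens > token_budget:
            raise TokenBudgetExceeded(total, token_budget, n)
        if result is not None:
            result._total_usage = total  # attach the running total on success
            return result
    raise RuntimeError("all attempts failed")


attempts = [
    (Usage(100, 20), None),          # first attempt fails validation
    (Usage(130, 25), Parsed("ok")),  # retry succeeds
]

ok = run_with_budget(attempts, token_budget=500)
print(ok._total_usage.total_tokens)  # 275

try:
    run_with_budget(attempts, token_budget=200)
except TokenBudgetExceeded as exc:
    print(exc.n_attempts, exc.total_usage.total_tokens)  # 2 275
```

The description does not say whether the real implementation checks the budget before or after a successful attempt; checking after every attempt is a choice made for this sketch.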
Reviewed for merge readiness. Directionally this is the right PR to keep for #2391 / the #2056 retry-cost thread, and I closed the older conflicted #2296 in favor of this one. Before merge, I would like this refreshed with CI and one follow-up check: make sure cumulative usage is exposed consistently for non-BaseModel successful response shapes too, especially
Addresses reviewer feedback:
- _total_usage is now attached to ListResponse results (list[Model]), not just single BaseModel responses
- Added 5 async tests mirroring the sync coverage (usage attachment, hook firing, budget enforcement, under-budget success, list response)
- Added 3 ListResponse-specific tests (sync) for budget enforcement and usage attachment on list results

AdapterBase (primitive return types like str/int) cannot carry attributes, so _total_usage is only available on BaseModel and ListResponse shapes.
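The limitation on primitive return types follows from how Python objects store attributes. As a small illustration in plain Python (not the library's types; UsageList is a hypothetical stand-in for a list-like response class), instances of the built-in str, int, and list types reject new attributes, while an instance of a list subclass defined in Python accepts them:

```python
class UsageList(list):
    """Hypothetical list subclass; its instances get a __dict__."""


def can_carry(obj) -> bool:
    """Return True if an arbitrary attribute can be set on obj."""
    try:
        obj._total_usage = {"total_tokens": 0}
    except AttributeError:
        return False
    return True


print(can_carry("text"))          # False
print(can_carry(42))              # False
print(can_carry([1, 2]))          # False
print(can_carry(UsageList([1])))  # True
```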
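Thanks for the review and for closing #2296 in favor of this. Pushed an update addressing both points:

ListResponse coverage: _total_usage is now attached to ListResponse results (list[Model]), not just single BaseModel responses.

Async coverage: Added 5 async tests matching the sync suite: usage attachment, hook firing, budget exceeded, under-budget success, and list response attachment.

Note on AdapterBase: For primitive return types (like str/int), AdapterBase cannot carry attributes, so _total_usage is only available on BaseModel and ListResponse shapes.

14 tests total now, all passing.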
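To give a rough picture of what an async hook-firing check could look like, here is a self-contained sketch using only asyncio. It is not taken from the PR's test suite: retry_async, the handler signature, and the fake attempt data are assumptions. Only the idea that a usage hook fires after each attempt with the cumulative count comes from the discussion above.

```python
import asyncio


async def retry_async(attempts, on_usage):
    """Hypothetical async retry loop.

    `attempts` is a list of (tokens_used, succeeded) pairs; `on_usage` is
    called after every attempt with the cumulative token count so far.
    """
    total = 0
    for tokens, succeeded in attempts:
        await asyncio.sleep(0)  # stand-in for awaiting the API call
        total += tokens
        on_usage(total)
        if succeeded:
            return total
    raise RuntimeError("all attempts failed")


def run_demo():
    seen = []  # records every value the hook receives
    total = asyncio.run(
        retry_async([(120, False), (155, True)], on_usage=seen.append)
    )
    return total, seen


total, seen = run_demo()
print(total, seen)  # 275 [120, 275]
```

The handler here receives a bare integer to keep the sketch short; a real handler would more plausibly receive a usage object.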
Fixes #2391. Related to #2056.
Problem
The retry system tracks cumulative token usage across all attempts, but this data is only accessible when retries fail (via InstructorRetryException.total_usage). On success, it's computed and thrown away. Users have no visibility into how expensive a successful extraction actually was, and no way to cap runaway costs from complex schemas that trigger many retries.
This came up in the #2056 discussion where someone mentioned losing "hundreds of dollars a day" from retries they couldn't observe or control.
What this PR does
Three things, all backward-compatible:
1. _total_usage on successful responses
After a successful extraction, the parsed model now has _total_usage attached (same pattern as _raw_response).
2. completion:usage hook
New hook that fires after each API attempt with the running total. Enables integration with metrics/observability without touching core logic.
3. token_budget parameter
Optional parameter that raises TokenBudgetExceeded if cumulative tokens exceed the limit.
Files changed
- instructor/v2/core/hooks.py - added COMPLETION_USAGE hook name + CompletionUsageHandler protocol + emit_completion_usage() method
- instructor/v2/core/errors.py - added TokenBudgetExceeded exception
- instructor/v2/core/retry.py - emit usage hook, check budget, attach _total_usage on success, let TokenBudgetExceeded escape the retry wrapper
- instructor/v2/core/patch.py - pass token_budget from create() down to retry logic
- tests/test_token_budget.py - 6 tests covering all new behavior
Backward compatibility
All changes are additive. Existing code is unaffected:
- token_budget defaults to None (no enforcement)
- _total_usage is only attached, never required
Checklist before requesting a review