feat: expose cumulative token usage on success + budget enforcement by mimran-khan · Pull Request #2392 · 567-labs/instructor

mimran-khan · 2026-06-24T19:36:29Z

Fixes #2391. Related to #2056.

Problem

The retry system tracks cumulative token usage across all attempts, but this data is only accessible when retries fail (via InstructorRetryException.total_usage). On success, it's computed and thrown away. Users have no visibility into how expensive a successful extraction actually was, and no way to cap runaway costs from complex schemas that trigger many retries.

This came up in the #2056 discussion where someone mentioned losing "hundreds of dollars a day" from retries they couldn't observe or control.

What this PR does

Three things, all backward-compatible:

1. _total_usage on successful responses

After a successful extraction, the parsed model now has _total_usage attached (same pattern as _raw_response):

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[...],
    max_retries=5,
)
print(f"Total tokens across all retries: {user._total_usage.total_tokens}")
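
To illustrate the idea of accumulating usage across attempts and attaching it to the successful result, here is a minimal sketch. It is not the PR's implementation: `Usage`, `Parsed`, and `run_with_retries` are hypothetical stand-ins, and the attempts are simulated rather than real API calls.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical stand-in for a provider usage object."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class Parsed:
    """Hypothetical stand-in for a parsed response model."""

    def __init__(self, name: str) -> None:
        self.name = name


def run_with_retries(attempts):
    """Sum usage over every attempt and attach the total to the first success.

    `attempts` is a list of (usage, result_or_None) pairs simulating API calls;
    a result of None represents an attempt that failed validation.
    """
    total = Usage()
    for usage, result in attempts:
        total.prompt_tokens += usage.prompt_tokens
        total.completion_tokens += usage.completion_tokens
        if result is not None:
            # Attach the running total, in the spirit of _raw_response.
            result._total_usage = total
            return result
    raise RuntimeError("all attempts failed")


attempts = [
    (Usage(100, 20), None),            # first attempt fails validation
    (Usage(130, 25), Parsed("Ada")),   # second attempt succeeds
]
user = run_with_retries(attempts)
```

In this simulation the attached total covers both the failed and the successful attempt, which is the number a caller needs to see the true cost of the extraction.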

2. completion:usage hook

New hook that fires after each API attempt with the running total. Enables integration with metrics/observability without touching core logic:

def on_usage(usage, *, attempt_number=0):
    metrics.gauge("instructor.tokens.cumulative", usage.total_tokens)

client.on("completion:usage", on_usage)
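
Because the hook reports a running total, a handler that wants per-attempt cost has to take differences. The sketch below shows one way to do that, assuming only the handler signature from the example above. `UsageTracker` is a hypothetical helper, and the emissions are simulated with `SimpleNamespace` objects rather than coming from a real client; the attempt numbering used here is also an assumption.

```python
from types import SimpleNamespace


class UsageTracker:
    """Hypothetical handler that turns cumulative totals into per-attempt deltas."""

    def __init__(self) -> None:
        self.last_total = 0
        self.per_attempt = []  # list of (attempt_number, tokens_for_that_attempt)

    def __call__(self, usage, *, attempt_number=0):
        # The hook passes the cumulative total, so subtract the previous total.
        delta = usage.total_tokens - self.last_total
        self.per_attempt.append((attempt_number, delta))
        self.last_total = usage.total_tokens


tracker = UsageTracker()

# Simulated emissions: cumulative totals after attempts 1, 2 and 3.
for n, cumulative in enumerate([120, 275, 400], start=1):
    tracker(SimpleNamespace(total_tokens=cumulative), attempt_number=n)
```

With a real client, an instance like this would presumably be registered the same way as `on_usage` above, via `client.on("completion:usage", tracker)`.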

3. token_budget parameter

Optional parameter that raises TokenBudgetExceeded if cumulative tokens exceed the limit:

from instructor.v2.core.errors import TokenBudgetExceeded

try:
    user = client.chat.completions.create(
        response_model=ComplexSchema,
        max_retries=10,
        token_budget=5000,  # hard cap across all attempts
        # ... other arguments elided
    )
except TokenBudgetExceeded as e:
    print(f"Aborted: used {e.total_usage.total_tokens} tokens in {e.n_attempts} attempts")
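
The budget check can be pictured as a guard inside the retry loop. The following is a sketch under stated assumptions, not the code in retry.py: `BudgetExceeded` and `retry_with_budget` are hypothetical names, attempts are simulated by a list of token costs, and checking the budget before testing for success is a choice made here that the PR description does not confirm.

```python
class BudgetExceeded(Exception):
    """Hypothetical exception carrying the usage, budget and attempt count."""

    def __init__(self, total_tokens: int, budget: int, n_attempts: int) -> None:
        super().__init__(
            f"used {total_tokens} tokens (budget {budget}) in {n_attempts} attempts"
        )
        self.total_tokens = total_tokens
        self.budget = budget
        self.n_attempts = n_attempts


def retry_with_budget(attempt_costs, succeeds_on, token_budget=None):
    """Simulate a retry loop; return the cumulative tokens used on success.

    attempt_costs: tokens consumed by each simulated attempt.
    succeeds_on:   1-based index of the attempt that validates.
    token_budget:  None disables enforcement (the backward-compatible default).
    """
    total = 0
    for n, cost in enumerate(attempt_costs, start=1):
        total += cost
        # Assumption: the budget is checked as soon as usage is accumulated.
        if token_budget is not None and total > token_budget:
            raise BudgetExceeded(total, token_budget, n)
        if n == succeeds_on:
            return total
    raise RuntimeError("retries exhausted")


try:
    retry_with_budget([2000, 2000, 2000], succeeds_on=3, token_budget=5000)
    aborted = None
except BudgetExceeded as exc:
    aborted = exc
```

The third attempt pushes the simulated total past the cap, so the loop aborts instead of continuing to spend tokens.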

Files changed

instructor/v2/core/hooks.py - added COMPLETION_USAGE hook name + CompletionUsageHandler protocol + emit_completion_usage() method
instructor/v2/core/errors.py - added TokenBudgetExceeded exception
instructor/v2/core/retry.py - emit usage hook, check budget, attach _total_usage on success, let TokenBudgetExceeded escape the retry wrapper
instructor/v2/core/patch.py - pass token_budget from create() down to retry logic
tests/test_token_budget.py - 6 tests covering all new behavior

Backward compatibility

All changes are additive. Existing code is unaffected:

token_budget defaults to None (no enforcement)
_total_usage is only attached, never required
The new hook only fires if handlers are registered
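
Since _total_usage is attached rather than declared, calling code that must work with responses from both older and newer versions can read it defensively. A small sketch, where `total_tokens_or_none` is a hypothetical helper and the responses are simulated objects:

```python
from types import SimpleNamespace


def total_tokens_or_none(response):
    """Return the cumulative token count if the response carries one, else None."""
    usage = getattr(response, "_total_usage", None)
    return getattr(usage, "total_tokens", None)


# Simulated response that carries usage, and one that does not.
with_usage = SimpleNamespace(_total_usage=SimpleNamespace(total_tokens=42))
without_usage = SimpleNamespace()
```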

Checklist before requesting a review

I have performed a self-review of my code
If it is a core feature, I have added thorough tests.
If it is a core feature, I have added documentation.

Previously, total_usage (accumulated token counts across retry attempts) was only available when retries FAILED via InstructorRetryException. On successful extraction, this data was computed but discarded.

Changes:
- Attach _total_usage to successful parsed responses (like _raw_response)
- Add completion:usage hook that fires after each attempt with cumulative token counts, enabling observability integration
- Add token_budget parameter to create() that raises TokenBudgetExceeded if cumulative tokens across all attempts exceed the configured limit
- TokenBudgetExceeded exception includes usage data, budget, and attempt count

This prevents runaway retry costs in production and gives users visibility into how many tokens their retries actually consume when things work.

Refs 567-labs#2391

jxnl · 2026-06-29T00:33:19Z

Reviewed for merge readiness. Directionally this is the right PR to keep for #2391 / the #2056 retry-cost thread, and I closed the older conflicted #2296 in favor of this one. Before merge, I would like this refreshed with CI and one follow-up check: make sure cumulative usage is exposed consistently for non-BaseModel successful response shapes too, especially list[Model] / ListResponse, or explicitly document that _total_usage is only attached to BaseModel responses. The hook and token_budget path should also have async coverage matching the sync tests.

Addresses reviewer feedback:
- _total_usage is now attached to ListResponse results (list[Model]), not just single BaseModel responses
- Added 5 async tests mirroring the sync coverage (usage attachment, hook firing, budget enforcement, under-budget success, list response)
- Added 3 ListResponse-specific tests (sync) for budget enforcement and usage attachment on list results

AdapterBase (primitive return types like str/int) cannot carry attributes, so _total_usage is only available on BaseModel and ListResponse shapes.

mimran-khan · 2026-06-29T14:23:05Z

Thanks for the review and for closing #2296 in favor of this. Pushed an update addressing both points:

ListResponse coverage: _total_usage is now attached to ListResponse results (the list[Model] path). The _finalize_parsed_response helper handles both the case where the parser returns a plain list (converted to ListResponse) and where it returns a ListResponse directly. Added 3 dedicated sync tests for this path.

Async coverage: Added 5 async tests matching the sync suite - usage attachment, hook firing, budget exceeded, under-budget success, and list response attachment. All use asyncio.run() with AsyncMock to exercise retry_async_v2 directly.
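
The testing pattern described here, driving an async retry loop with `asyncio.run()` and an `AsyncMock`, can be sketched as follows. This does not call retry_async_v2; `retry_async_sketch` is a hypothetical stand-in with a made-up response shape, used only to show how the mock feeds one failing and one succeeding attempt.

```python
import asyncio
from unittest.mock import AsyncMock


async def retry_async_sketch(call, max_retries):
    """Hypothetical async retry loop that accumulates tokens across attempts."""
    total = 0
    for _ in range(max_retries):
        response = await call()
        total += response["tokens"]
        if response["ok"]:
            return {"value": response["value"], "total_tokens": total}
    raise RuntimeError("retries exhausted")


# Each await of the mock returns the next item: a failure, then a success.
call = AsyncMock(
    side_effect=[
        {"ok": False, "tokens": 100, "value": None},
        {"ok": True, "tokens": 150, "value": "Ada"},
    ]
)

result = asyncio.run(retry_async_sketch(call, max_retries=3))
```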

Note on AdapterBase: For primitive return types (like str, int via AdapterBase), Python doesn't allow attaching attributes to built-in types, so _total_usage is only available on BaseModel and ListResponse shapes. The hook and token_budget still work regardless of response shape since they operate before finalization.
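
The limitation on built-in types is easy to demonstrate in plain Python. In the sketch below, `ListSketch` is a hypothetical list subclass used for illustration; treating ListResponse as a similar subclass is an assumption, not something confirmed in this thread.

```python
class ListSketch(list):
    """Hypothetical list subclass; its instances get a __dict__ for attributes."""


def try_attach(obj, usage) -> bool:
    """Try to set _total_usage on obj; report whether Python allowed it."""
    try:
        obj._total_usage = usage
        return True
    except AttributeError:
        # Instances of built-in types such as str, int and list have no __dict__.
        return False


results = {
    "str": try_attach("hello", 123),
    "int": try_attach(42, 123),
    "list": try_attach([1, 2], 123),
    "list subclass": try_attach(ListSketch([1, 2]), 123),
}
```

Only the subclass instance accepts the attribute, which matches the note above that primitive results cannot carry _total_usage while wrapper shapes can.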

14 tests total now, all passing.

jxnl mentioned this pull request Jun 29, 2026

feat(core): add token_budget parameter to limit retry token usage #2296

Closed

5 tasks

jxnl mentioned this pull request Jun 29, 2026

[codex] consolidate docs and audio validation fixes #2400

Draft

mimran-khan wants to merge 2 commits into 567-labs:main from mimran-khan:feat/token-usage-budget
