feat: v0.21.0 — token budget tracking and context window early warning#37

Merged
Siddhant-K-code merged 1 commit into main from feat/v0.21.0-token-budget on Apr 11, 2026

@Siddhant-K-code
Owner

Closes #27

What

Adds token_budget.py to track cumulative input token usage against the model's context window limit and warn before exhaustion.

Context limits

Includes built-in limits for Anthropic (claude-*), OpenAI (gpt-4o, o1, o3), Google (gemini-1.5/2.0), and Meta (llama-3.1) models. Prefix matching handles versioned model names such as claude-sonnet-4-20251022.
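
A minimal sketch of how prefix matching over a limits table could work. The function name `lookup_context_limit`, the longest-prefix rule, and the table contents are assumptions for illustration; only the 200,000-token Claude figure comes from the sample output in this PR, and the real CONTEXT_LIMITS dict may be keyed differently.

```python
# Hypothetical limits table. The keys and the gpt-4o value are illustrative
# placeholders, not the entries shipped in token_budget.py.
CONTEXT_LIMITS = {
    "claude-": 200_000,
    "gpt-4o": 128_000,
}


def lookup_context_limit(model, limits=CONTEXT_LIMITS, default=None):
    """Return the limit for the longest key that is a prefix of `model`."""
    matches = [prefix for prefix in limits if model.startswith(prefix)]
    if not matches:
        return default
    # Prefer the most specific (longest) matching prefix.
    return limits[max(matches, key=len)]


limit = lookup_context_limit("claude-sonnet-4-20251022")
unknown = lookup_context_limit("some-unlisted-model")
```

Choosing the longest matching prefix keeps a more specific entry from being shadowed by a shorter one if both are present in the table.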

analyse_token_budget()

Per-request accumulation table:

Session: a1b2c3d4e5f6  Model: claude-sonnet-4  Context limit: 200,000 tokens
────────────────────────────────────────────────────────────────────────
  Req   Offset   Input tok  Output tok   Cumulative   % limit
────────────────────────────────────────────────────────────────────────
    1    +0:00      12,400       1,200       12,400     6.2%
    2    +1:34      18,600       2,100       31,000    15.5%
    3    +3:12     145,000       3,400      176,000    88.0%  ← warning
────────────────────────────────────────────────────────────────────────
Current: 176,000 tokens  ( 88.0% of limit)
Est. remaining: ~0 request(s)
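
The Cumulative and % limit columns above are consistent with a running sum of input tokens divided by the context limit. The sketch below reproduces those two columns from the sample figures. `BudgetRow` and `accumulate` are hypothetical names, not the actual analyse_token_budget() internals.

```python
from dataclasses import dataclass


@dataclass
class BudgetRow:
    request: int
    input_tokens: int
    cumulative: int
    pct_of_limit: float


def accumulate(input_tokens_per_request, context_limit):
    """Build one row per request with a running input-token total."""
    rows = []
    cumulative = 0
    for number, tokens in enumerate(input_tokens_per_request, start=1):
        cumulative += tokens
        pct = 100.0 * cumulative / context_limit
        rows.append(BudgetRow(number, tokens, cumulative, pct))
    return rows


# Input token counts taken from the sample table in this PR.
rows = accumulate([12_400, 18_600, 145_000], context_limit=200_000)
for row in rows:
    print(f"{row.request:>5} {row.input_tokens:>11,} "
          f"{row.cumulative:>12,} {row.pct_of_limit:>8.1f}%")
```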

TokenBudgetWatcher

A stateful watcher integrated into watch.py. It fires once, when cumulative input tokens first cross the threshold. The threshold is configurable via max_context_pct in .agent-watch.json or the --max-context-pct CLI flag.
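
The fire-once behaviour can be sketched as follows. `FireOnceBudgetWatcher` and its `observe` method are illustrative stand-ins, not the real TokenBudgetWatcher interface, and the sketch assumes max_context_pct is expressed as a fraction (0.9 for 90%), which the PR does not state explicitly.

```python
class FireOnceBudgetWatcher:
    """Illustrative stand-in for TokenBudgetWatcher; not the real class."""

    def __init__(self, context_limit, max_context_pct=0.9):
        self.context_limit = context_limit
        # Assumption: the threshold is a fraction of the limit, not a percent.
        self.max_context_pct = max_context_pct
        self.cumulative_input = 0
        self.fired = False

    def observe(self, input_tokens):
        """Record one request; return a warning string on the first crossing.

        Returns None for every call before the threshold is reached and for
        every call after the warning has already fired.
        """
        self.cumulative_input += input_tokens
        fraction = self.cumulative_input / self.context_limit
        if self.fired or fraction < self.max_context_pct:
            return None
        self.fired = True
        return f"context at {fraction:.1%} of {self.context_limit:,} tokens"


watcher = FireOnceBudgetWatcher(context_limit=200_000)
events = [watcher.observe(n) for n in (12_400, 18_600, 145_000, 10_000, 5_000)]
```

With the default threshold, the first three requests stay below 90%, the fourth crosses it and produces the warning, and the fifth returns None because the watcher has already fired.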

CLI

agent-strace token-budget [session-id] [--warning-threshold 0.9]
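
For readers unfamiliar with the command shape, here is a rough argparse sketch of a subcommand with an optional positional session id and a float threshold flag. It is an assumption about how such a command could be declared, not the actual agent-strace parser, and the 0.9 default for --warning-threshold is borrowed from the example invocation rather than confirmed.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the command line shown above."""
    parser = argparse.ArgumentParser(prog="agent-strace")
    subcommands = parser.add_subparsers(dest="command", required=True)
    token_budget = subcommands.add_parser("token-budget")
    # The square brackets in the usage line suggest the session id is optional.
    token_budget.add_argument("session_id", nargs="?", default=None)
    # Assumed default; the PR only shows 0.9 in the example invocation.
    token_budget.add_argument("--warning-threshold", type=float, default=0.9)
    return parser


args = build_parser().parse_args(
    ["token-budget", "a1b2c3d4e5f6", "--warning-threshold", "0.8"]
)
no_session = build_parser().parse_args(["token-budget"])
```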

Tests

tests/test_token_budget.py — 13 tests.

Adds token_budget.py with:
- CONTEXT_LIMITS dict covering Anthropic, OpenAI, Google, Meta models
- analyse_token_budget() — per-request accumulation table with % of limit
- TokenBudgetWatcher — stateful watcher used by watch mode, fires once
  when cumulative input tokens cross a configurable threshold (default 90%)

watch.py integrates TokenBudgetWatcher; threshold configurable via
max_context_pct in .agent-watch.json or --max-context-pct CLI flag.

CLI: agent-strace token-budget [session-id] [--warning-threshold 0.9]

Closes #27

Co-authored-by: Ona <no-reply@ona.com>
@Siddhant-K-code Siddhant-K-code merged commit 5ac2eda into main Apr 11, 2026
4 checks passed
@Siddhant-K-code Siddhant-K-code deleted the feat/v0.21.0-token-budget branch April 11, 2026 17:37

Development

Successfully merging this pull request may close these issues.

v0.21.0: Token budget tracking and early warning before context window exhaustion
