Per-turn metrics (model, tokens, cost, duration, phase), aggregate metrics per thread/phase, provider pricing tables (configurable), real-time cost display in frontend, budget cap per thread/assistant with pause-on-exceed, historical cost data for analytics.
Users and operators need visibility into the cost of every AI interaction. Without cost tracking, teams cannot forecast budgets, identify expensive workflows, or enforce spending limits. This feature adds full-stack cost observability -- from token-level capture to real-time UI display -- plus budget enforcement that integrates with the existing HITL (human-in-the-loop) interrupt system.
-
TurnMetrics -- Per-turn record capturing:
turn_id,thread_id,assistant_id,user_idmodel(string, e.g.gpt-4o,claude-sonnet-4-20250514)input_tokens,output_tokens,total_tokensinput_cost,output_cost,total_cost(Decimal, USD)duration_ms(int, wall-clock time of LLM call)phase(enum:planning,execution,reflection,tool_call)created_attimestamp
-
ThreadCostSummary -- Aggregate view:
thread_idtotal_cost,total_tokens,total_duration_msturn_count- Per-phase breakdown (dict keyed by phase enum)
- Per-model breakdown (dict keyed by model string)
-
RunBudget -- Budget configuration:
thread_idorassistant_id(nullable, at least one required)max_cost_usd(Decimal)max_tokens(int, optional)action_on_exceed(enum:pause,warn,stop)is_active(bool)
- Dictionary of model ->
{ input_per_1k, output_per_1k, currency }pricing. - Covers OpenAI, Anthropic, Google, and open-source model families.
- Admin-configurable via API (CRUD endpoints) with DB-backed overrides.
- Fallback to hardcoded defaults when no DB override exists.
- Wraps LLM invocations to capture:
- Token counts from response metadata (
usagefield) - Wall-clock duration via
time.perf_counter() - Model identifier from the request config
- Token counts from response metadata (
- Computes cost = tokens * pricing rate
- Persists
TurnMetricsrecord asynchronously (fire-and-forget DB write) - Emits metrics event for real-time streaming to frontend
- Before each LLM call, check cumulative thread cost against
RunBudget. - If
max_cost_usdexceeded:pause-- Trigger HITL interrupt, let user approve continuation.warn-- Log warning and continue, send UI notification.stop-- Halt execution, return budget-exceeded error.
- Budget checks are lightweight (cached running total, refreshed on each turn).
| Method | Path | Description |
|---|---|---|
| GET | /v0/threads/{id}/metrics |
Thread cost summary |
| GET | /v0/threads/{id}/metrics/turns |
Per-turn breakdown |
| GET | /v0/pricing |
List all pricing entries |
| POST | /v0/pricing |
Create/update pricing entry |
| DELETE | /v0/pricing/{model} |
Delete pricing override |
| GET | /v0/budgets |
List budgets for user |
| POST | /v0/budgets |
Create/update budget |
| DELETE | /v0/budgets/{id} |
Delete budget |
- Running cost badge -- Small pill in the chat header showing cumulative thread cost (e.g.
$0.12), updated in real-time via SSE/streaming events. - Cost breakdown panel -- Expandable side panel or modal showing per-turn and per-phase cost tables with charts.
- Budget configuration -- Settings UI to set
max_cost_usdper thread or per assistant, choose action-on-exceed behavior. - Warning indicators -- Yellow/red badge color when approaching/exceeding budget thresholds (75%/100%).
- New
turn_metricstable (Alembic migration):- Columns matching
TurnMetricsschema - Indexes on
thread_id,created_at,model - Foreign key to threads table
- Columns matching
- New
run_budgetstable:- Columns matching
RunBudgetschema - Unique constraint on (
thread_id,assistant_id)
- Columns matching
- Optional
pricing_overridestable for admin-managed pricing.
- Unit tests: Pricing calculation, budget threshold logic, metrics aggregation.
- Integration tests: End-to-end LLM call with metrics capture, budget enforcement with HITL interrupt, API endpoint responses.
- Frontend tests: Cost badge rendering, budget config form validation.
- Multi-currency support (USD only for now)
- Organization-level billing aggregation
- Automated payment/invoicing integration
- Cost prediction/forecasting ML models