feat: Per-run cost tracking, phase-level metering, and budget caps#907
ryaneggz wants to merge 3 commits into `development`
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
…ddleware Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
**ryaneggz** left a comment:
Code Review — feat/cost-tracking
Reviewed files:

- `backend/src/constants/pricing.py`
- `backend/src/constants/phases.py`
- `backend/src/schemas/entities/metrics.py`
- `backend/src/utils/middleware.py` (`cost_tracking_middleware` + registration)
Critical / High Priority Issues
[HIGH] Deprecated datetime.utcnow in TurnMetrics schema
Location: backend/src/schemas/entities/metrics.py:21
```python
# Current (deprecated in Python 3.12, scheduled for removal)
timestamp: datetime = Field(default_factory=datetime.utcnow)

# Fix: use timezone-aware UTC
from datetime import datetime, timezone

timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```

`datetime.utcnow()` returns a naive datetime with no timezone info and raises `DeprecationWarning` as of Python 3.12. The fix must produce a timezone-aware object; Pydantic serialization and downstream consumers (DB, API responses) will behave correctly with the aware form. No other file in `backend/src/schemas/` uses `utcnow`.
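The naive/aware difference is directly observable in a standalone check (no Pydantic needed):

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # naive: tzinfo is None (DeprecationWarning on 3.12+)
aware = datetime.now(timezone.utc)  # aware: tzinfo is timezone.utc

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC
```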
[HIGH] Model name extraction in cost_tracking_middleware is unreliable
Location: backend/src/utils/middleware.py:159-166
```python
# Current
model = getattr(request, "model", "unknown")
if hasattr(request, "state") and "model" in request.state:
    model = request.state["model"]
```

Two problems:

- `request.model` is set by the `dynamic_model_selection` middleware (line 235) as the return value of `init_chat_model(model)` — a `BaseChatModel` object, not a string. Passing that object to `estimate_cost` will miss every pricing-table lookup and silently return `0.0`.
- `request.state["model"]` is the LangGraph agent state dict, which carries message history, not the selected model string. The actual model identifier lives in the AI message's `response_metadata["model_name"]` (visible in the mock constant at `backend/src/constants/mock.py`). The correct extraction pattern, consistent with how `cache_metrics_middleware` already handles it, is:

```python
model_name = (
    ai_msg.response_metadata.get("model_name", "unknown")
    if isinstance(ai_msg.response_metadata, dict)
    else "unknown"
)
cost = estimate_cost(model_name, input_tokens, output_tokens)
```

Without this fix, cost will be `0.0` for every call where dynamic model selection is active (which is the common path), making the middleware produce systematically wrong data.
[MEDIUM] Middleware ordering: cost_tracking placed before retry_model
Location: backend/src/utils/middleware.py:381-387
```python
stack = [
    compaction_middleware,
    add_ai_message_metadata,
    cache_metrics_middleware,
    cost_tracking_middleware,  # <-- here
    retry_model,               # <-- retry wraps everything below it
    ...
]
```

In this middleware stack, each entry wraps the chain below it. `retry_model` currently sits after `cost_tracking_middleware`, meaning retried calls bypass cost accounting entirely — only the final successful attempt is measured. If the intent is to capture cost per LLM attempt (including retries), `cost_tracking_middleware` should be placed after `retry_model`. If the intent is to capture only successful calls, the current position is correct, but the docstring and plan should say so explicitly. The existing `cache_metrics_middleware` shares this position, but for caching it doesn't matter; for billing-relevant cost data it does.
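The wrapping semantics can be shown with a toy stack. `make_stack`, `meter`, and `retry` below are illustrative stand-ins, not the project's real middleware API; the point is that a meter placed outside the retry layer records one event regardless of how many attempts the retry layer makes:

```python
# Toy model of the middleware stack: each entry wraps everything below it.
def make_stack(middlewares, handler):
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler

metered_calls = []

def meter(next_fn):
    def run():
        metered_calls.append(1)  # one metering event per wrapped invocation
        return next_fn()
    return run

def retry(next_fn):
    def run():
        for _ in range(3):
            try:
                return next_fn()
            except RuntimeError:
                continue
        raise RuntimeError("retries exhausted")
    return run

attempts = {"n": 0}

def model_call():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient failure")
    return "ok"

# meter placed outside retry, mirroring the current stack order:
result = make_stack([meter, retry], model_call)()
print(result, len(metered_calls), attempts["n"])  # ok 1 2
```

Two model attempts are made, but only one metering event fires, which is exactly the behaviour described above.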
Recommended order if per-attempt capture is desired:
```python
stack = [
    compaction_middleware,
    add_ai_message_metadata,
    cache_metrics_middleware,
    retry_model,
    cost_tracking_middleware,  # after retry, captures each individual attempt
    ...
]
```

[MEDIUM] No persistence — middleware only logs, does not write TurnMetrics
Location: backend/src/utils/middleware.py (cost_tracking_middleware body)
The spec and plan both require a TurnMetrics record to be written to the database per call ("Persist TurnMetrics asynchronously (fire-and-forget DB write)"). The current implementation only emits a logger.info line. The TurnMetrics schema exists in backend/src/schemas/entities/metrics.py but is never instantiated or persisted. Phase 1 acceptance criterion states "Every LLM call produces a TurnMetrics record in the database," which this does not satisfy.
This is not a blocker for merging if this is treated as Phase 1 groundwork only, but the PR description should be explicit that persistence is deferred to a follow-up.
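If the follow-up does add persistence, the "fire-and-forget" pattern the spec names might look like the sketch below. `write_turn_metrics` and the in-memory `fake_db` are stand-ins for the real session/model layer, not the project's actual API:

```python
import asyncio

# Sketch of a fire-and-forget write: schedule the DB call as a background
# task so the middleware returns without awaiting it.
fake_db: list[dict] = []

async def write_turn_metrics(record: dict) -> None:
    await asyncio.sleep(0)  # stands in for the actual DB round-trip
    fake_db.append(record)

async def cost_tracking_body() -> str:
    record = {"model": "anthropic:claude-sonnet-4", "total_cost": 0.0105}
    # Keep a reference and attach a done-callback so failures are surfaced
    # rather than silently dropped when the task is garbage-collected.
    task = asyncio.create_task(write_turn_metrics(record))
    task.add_done_callback(lambda t: t.exception() and print(t.exception()))
    return "model response"

async def main() -> tuple[str, int]:
    result = await cost_tracking_body()
    await asyncio.sleep(0.01)  # demo only: let the background write land
    return result, len(fake_db)

print(asyncio.run(main()))  # ('model response', 1)
```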
Code Quality Issues
[MEDIUM] TurnMetrics.agent_phase uses a plain string instead of the AgentPhase enum
Location: backend/src/schemas/entities/metrics.py:13 and backend/src/constants/phases.py
The branch introduces the `AgentPhase` enum in `constants/phases.py`, but `TurnMetrics` declares `agent_phase: str = "solo"`. This bypasses the enum entirely, making it possible to persist invalid phase values. Use the enum:

```python
from src.constants.phases import AgentPhase

agent_phase: AgentPhase = AgentPhase.SOLO
```

[MEDIUM] Pricing table missing models that are live in constants/llm.py
Location: backend/src/constants/pricing.py
The following model IDs are registered in `ChatModels` / `llm.py` but absent from `PRICING_TABLE`, meaning any call using them returns `0.0` cost silently:

- `google_genai:gemini-flash-lite-latest`
- `google_genai:gemini-3-flash-preview`
- `groq:openai/gpt-oss-120b`
At minimum, a TODO comment should mark these as pending pricing. Alternatively, log a warning when estimate_cost is called with an unrecognised model instead of silently returning zero:
```python
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING_TABLE.get(model)
    if pricing is None:
        logger.warning(f"estimate_cost: unknown model '{model}', returning 0.0")
        return 0.0
    ...
```

[MEDIUM] TurnMetrics and ThreadCostSummary use float for monetary values
Location: backend/src/schemas/entities/metrics.py:19,25
The spec calls for `Decimal` for `input_cost`, `output_cost`, `total_cost`. Using `float` introduces IEEE 754 rounding errors that accumulate when summing many turns. This matters more for `ThreadCostSummary.total_cost_usd` (aggregation) than for a single turn. For a billing-adjacent feature this warrants `Decimal`, or at least `Annotated[float, Field(ge=0.0)]` with explicit rounding discipline documented.
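A quick, stdlib-only demonstration of the accumulation concern, using a hypothetical tiny per-turn cost:

```python
from decimal import Decimal

# Summing a small per-turn cost many times: float drifts, Decimal stays exact.
turn_cost_f = 0.000003              # hypothetical per-turn cost in USD, as float
turn_cost_d = Decimal("0.000003")   # same cost as an exact Decimal

float_total = sum(turn_cost_f for _ in range(100_000))
decimal_total = sum(turn_cost_d for _ in range(100_000))

print(decimal_total)           # 0.300000, exact
print(abs(float_total - 0.3))  # small but typically nonzero float drift
```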
[LOW] cost_tracking_middleware duplicates token-extraction logic already in cache_metrics_middleware
Location: backend/src/utils/middleware.py:131-178
Both middlewares independently walk `response.messages` in reverse and call `getattr(ai_msg, "usage_metadata", None)`. This is 15 lines of identical code. Extract to a shared helper:

```python
def _extract_usage(response: ModelResponse) -> tuple[AIMessage | None, object | None]:
    """Return (ai_msg, usage_metadata) or (None, None) if absent."""
    for msg in reversed(getattr(response, "messages", []) or []):
        if isinstance(msg, AIMessage):
            return msg, getattr(msg, "usage_metadata", None)
    return None, None
```

[LOW] estimate_cost rounding precision is not documented
Location: backend/src/constants/pricing.py:43-50
The function is well-typed and has a docstring, which is good. Minor: rounding to 6 decimal places (`round(..., 6)`) is reasonable but should be noted in the docstring so callers know not to expect full float precision.
[LOW] RunBudget schema is missing id, thread_id, assistant_id fields
Location: backend/src/schemas/entities/metrics.py:37-40
The spec defines RunBudget as having thread_id or assistant_id (at least one required) plus an id for CRUD operations. The current schema contains only max_cost_usd, max_duration_minutes, and action_on_exceed. This is an incomplete implementation, but acceptable if persistence is deferred.
Process / Commit Hygiene
[HIGH] Commits are not GPG-signed
Location: All commits on this branch (590eb526, 8e752fd0, etc.)
git log --format="%H %GS %G?" shows N (no signature) for all branch-specific commits. Project guidelines in CLAUDE.md require git commit -s (DCO sign-off). The feature commit 590eb526 does include Signed-off-by: ryaneggz in the message body, which satisfies DCO, but the %G? = N indicates no GPG key signature. If GPG signing is enforced by CI/branch protection, these commits will be rejected. Please verify the repo's signing requirement and re-sign if needed.
There is no evidence of --no-verify being used (the pre-commit hooks appear to have run based on the clean commit message format), which is good.
Tests
No unit or integration tests are included for:
- `estimate_cost` (pricing math, zero tokens, unknown model, negative token guard)
- `cost_tracking_middleware` (mock `ModelResponse` with and without `usage_metadata`)
- `TurnMetrics` schema validation
The plan's Phase 6 calls for >= 80% coverage on new modules. These tests should be in backend/tests/unit/ before this is marked ready for merge.
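A starting point for the `estimate_cost` tests, as a pytest-style sketch. The inlined `PRICING_TABLE` entry and per-million-token rates are stand-ins so the example runs on its own; the real tests would import from `backend/src/constants/pricing.py`:

```python
# Stand-in for the real module: one pricing entry in USD per million tokens.
PRICING_TABLE = {"anthropic:claude-sonnet-4": {"input": 3.00, "output": 15.00}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING_TABLE.get(model)
    if pricing is None:
        return 0.0  # real implementation should also log a warning
    cost = (input_tokens / 1_000_000) * pricing["input"] \
         + (output_tokens / 1_000_000) * pricing["output"]
    return round(cost, 6)

def test_known_model():
    # 1000 in @ $3/M + 500 out @ $15/M = 0.003 + 0.0075 = 0.0105
    assert estimate_cost("anthropic:claude-sonnet-4", 1000, 500) == 0.0105

def test_unknown_model_returns_zero():
    assert estimate_cost("unknown:model", 1000, 500) == 0.0

def test_zero_tokens():
    assert estimate_cost("anthropic:claude-sonnet-4", 0, 0) == 0.0
```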
Summary
Must fix before merge:
1. Replace `datetime.utcnow` with `datetime.now(timezone.utc)` in `TurnMetrics`.
2. Fix model name extraction in `cost_tracking_middleware` to use `ai_msg.response_metadata["model_name"]` — the current logic will silently produce `$0.00` cost on every call.
Should fix before merge:
3. Add a `logger.warning` for unknown models in `estimate_cost` so missing pricing entries surface instead of silently returning zero.
4. Use the `AgentPhase` enum on `TurnMetrics.agent_phase` instead of a bare string.
5. Clarify middleware ordering intent relative to `retry_model` in a comment.
6. Add at least basic unit tests for `estimate_cost` and schema validation.
Acceptable to defer:
- DB persistence (document in PR that this is a logging-only Phase 1 stub)
- `RunBudget` field completeness (document as stub)
- `float` vs `Decimal` for monetary fields (document known limitation)
- Deduplication of the token-extraction helper
The pricing table model coverage and math look correct for the models that are present. The middleware structure and decorator pattern correctly follow the cache_metrics_middleware precedent. Good foundation — the critical bug is the model name extraction.
…middleware ordering, pricing gaps Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
Summary
Closes #905
Backend implementation for per-run cost tracking with pricing tables, metrics schemas, and cost capture middleware.
Files Changed
- `backend/src/constants/phases.py` — new: shared `AgentPhase` enum
- `backend/src/constants/pricing.py` — new: `PRICING_TABLE` (23+ models), `estimate_cost()` function
- `backend/src/schemas/entities/metrics.py` — new: `TurnMetrics`, `ThreadCostSummary`, `RunBudget` schemas
- `backend/src/utils/middleware.py` — modified: added `cost_tracking_middleware`, registered after `retry_model`

Spec & Plan
- `.claude/specs/cost-tracking.md`
- `.claude/plans/cost-tracking.md`

Human Review Checklist
- [ ] `PRICING_TABLE` prices are accurate for each model (cross-reference provider pricing pages)
- [ ] All models in the `ChatModels` enum have corresponding pricing entries
- [ ] `estimate_cost()` logs a debug warning for unknown models (not a silent `$0.00`)
- [ ] `TurnMetrics.timestamp` uses `datetime.now(timezone.utc)` (not deprecated `datetime.utcnow()`)
- [ ] `TurnMetrics.agent_phase` uses the `AgentPhase.SOLO` enum (not a hardcoded string)
- [ ] `cost_tracking_middleware` extracts the model name from `ai_msg.response_metadata["model_name"]` (not `request.model`)
- [ ] `cost_tracking_middleware` is AFTER `retry_model` in `init_default_middleware()`
- [ ] Defensive handling mirrors `cache_metrics_middleware` (token extraction, null checks)
- [ ] `cd backend && make format && make lint` — should pass
- [ ] `cd backend && make test` — should pass (requires Postgres)

Test Plan
- [ ] `estimate_cost("anthropic:claude-sonnet-4", 1000, 500)` returns correct USD value
- [ ] `estimate_cost("unknown:model", 1000, 500)` returns `0.0` and logs debug
- [ ] `TurnMetrics` schema validates with all required fields
- [ ] `RunBudget` schema validates with `action_on_exceed` enum values
- [ ] Middleware registration order verified (after `retry_model`)

🤖 Generated with Claude Code