feat: Per-run cost tracking, phase-level metering, and budget caps #907

Draft
ryaneggz wants to merge 3 commits into development from feat/cost-tracking
Conversation


ryaneggz (Collaborator) commented Mar 25, 2026

Summary

Closes #905

Backend implementation for per-run cost tracking with pricing tables, metrics schemas, and cost capture middleware.

Files Changed

  • backend/src/constants/phases.py — new: shared AgentPhase enum
  • backend/src/constants/pricing.py — new: PRICING_TABLE (23+ models), estimate_cost() function
  • backend/src/schemas/entities/metrics.py — new: TurnMetrics, ThreadCostSummary, RunBudget schemas
  • backend/src/utils/middleware.py — modified: added cost_tracking_middleware, registered after retry_model

Spec & Plan

Human Review Checklist

  • Verify PRICING_TABLE prices are accurate for each model (cross-reference provider pricing pages)
  • Verify all models in ChatModels enum have corresponding pricing entries
  • Verify estimate_cost() logs debug warning for unknown models (not silent $0.00)
  • Verify TurnMetrics.timestamp uses datetime.now(timezone.utc) (not deprecated datetime.utcnow())
  • Verify TurnMetrics.agent_phase uses AgentPhase.SOLO enum (not hardcoded string)
  • Verify cost_tracking_middleware extracts model name from ai_msg.response_metadata["model_name"] (not request.model)
  • Verify middleware ordering: cost_tracking_middleware is AFTER retry_model in init_default_middleware()
  • Verify middleware follows same pattern as existing cache_metrics_middleware (token extraction, null checks)
  • Run cd backend && make format && make lint — should pass
  • Run cd backend && make test — should pass (requires Postgres)

Test Plan

  • estimate_cost("anthropic:claude-sonnet-4", 1000, 500) returns correct USD value
  • estimate_cost("unknown:model", 1000, 500) returns 0.0 and logs debug
  • TurnMetrics schema validates with all required fields
  • RunBudget schema validates with action_on_exceed enum values
  • Cost tracking middleware captures tokens and cost on successful LLM calls
  • Middleware does not meter failed/retried calls (positioned after retry_model)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
ryaneggz added the enhancement (New feature or request) label on Mar 25, 2026

coderabbitai Bot commented Mar 25, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e337e971-d8b2-421e-b64a-ca3d9b9fde2d


…ddleware

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ryaneggz <kre8mymedia@gmail.com>

@ryaneggz ryaneggz left a comment


Code Review — feat/cost-tracking

Reviewed files:

  • backend/src/constants/pricing.py
  • backend/src/constants/phases.py
  • backend/src/schemas/entities/metrics.py
  • backend/src/utils/middleware.py (cost_tracking_middleware + registration)

Critical / High Priority Issues

[HIGH] Deprecated datetime.utcnow in TurnMetrics schema

Location: backend/src/schemas/entities/metrics.py:21

# Current (deprecated in Python 3.12, scheduled for removal)
timestamp: datetime = Field(default_factory=datetime.utcnow)

# Fix: use timezone-aware UTC
from datetime import datetime, timezone
timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

datetime.utcnow() returns a naive datetime with no timezone info and raises DeprecationWarning as of Python 3.12. The fix must produce a timezone-aware object; Pydantic serialization and downstream consumers (DB, API responses) will behave correctly with the aware form. No other file in backend/src/schemas/ uses utcnow.
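The difference is easy to verify in isolation with the standard library alone; the aware form carries `tzinfo` and serializes with an explicit offset:

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # no tzinfo; DeprecationWarning on 3.12+
aware = datetime.now(timezone.utc)  # carries tzinfo=timezone.utc

print(naive.tzinfo)       # None
print(aware.tzinfo)       # UTC
print(aware.isoformat())  # ISO string ending in "+00:00"
```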


[HIGH] Model name extraction in cost_tracking_middleware is unreliable

Location: backend/src/utils/middleware.py:159-166

# Current
model = getattr(request, "model", "unknown")
if hasattr(request, "state") and "model" in request.state:
    model = request.state["model"]

Two problems:

  1. request.model is set by dynamic_model_selection middleware (line 235) as the return value of init_chat_model(model) — a BaseChatModel object, not a string. Passing that object to estimate_cost will miss every pricing table lookup and silently return 0.0.

  2. request.state["model"] is the LangGraph agent state dict which carries message history, not the selected model string. The actual model identifier lives in the AI message's response_metadata["model_name"] (visible in the mock constant at backend/src/constants/mock.py). The correct extraction pattern, consistent with how cache_metrics_middleware already handles it, is:

model_name = (
    ai_msg.response_metadata.get("model_name", "unknown")
    if isinstance(ai_msg.response_metadata, dict)
    else "unknown"
)
cost = estimate_cost(model_name, input_tokens, output_tokens)

Without this fix, cost will be 0.0 for every call where dynamic model selection is active (which is the common path), making the middleware produce systematically wrong data.
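For reference, here is the extraction pattern as a standalone, dependency-free sketch. `FakeAIMessage` is only a stand-in for LangChain's `AIMessage`; the `response_metadata["model_name"]` key is the one observed in `backend/src/constants/mock.py`:

```python
from dataclasses import dataclass, field

@dataclass
class FakeAIMessage:
    # Stand-in for LangChain's AIMessage, which exposes response_metadata
    response_metadata: dict = field(default_factory=dict)

def extract_model_name(ai_msg) -> str:
    """Pull the provider-reported model id, falling back to 'unknown'."""
    meta = getattr(ai_msg, "response_metadata", None)
    return meta.get("model_name", "unknown") if isinstance(meta, dict) else "unknown"

print(extract_model_name(FakeAIMessage({"model_name": "claude-sonnet-4"})))  # claude-sonnet-4
print(extract_model_name(FakeAIMessage()))                                   # unknown
```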


[MEDIUM] Middleware ordering: cost_tracking placed before retry_model

Location: backend/src/utils/middleware.py:381-387

stack = [
    compaction_middleware,
    add_ai_message_metadata,
    cache_metrics_middleware,
    cost_tracking_middleware,   # <-- here
    retry_model,                # <-- retry wraps everything below it
    ...
]

In this middleware stack, each entry wraps the chain below it. retry_model currently sits after cost_tracking_middleware, meaning retried calls bypass cost accounting entirely — only the final successful attempt is measured. If the intent is to capture cost per LLM attempt (including retries), cost_tracking_middleware should be placed after retry_model. If the intent is to capture only successful calls, the current position is correct but the docstring and plan should say so explicitly. The existing cache_metrics_middleware shares this position, but for caching it doesn't matter; for billing-relevant cost data it does.

Recommended order if per-attempt capture is desired:

stack = [
    compaction_middleware,
    add_ai_message_metadata,
    cache_metrics_middleware,
    retry_model,
    cost_tracking_middleware,   # after retry, captures each individual attempt
    ...
]

[MEDIUM] No persistence — middleware only logs, does not write TurnMetrics

Location: backend/src/utils/middleware.py (cost_tracking_middleware body)

The spec and plan both require a TurnMetrics record to be written to the database per call ("Persist TurnMetrics asynchronously (fire-and-forget DB write)"). The current implementation only emits a logger.info line. The TurnMetrics schema exists in backend/src/schemas/entities/metrics.py but is never instantiated or persisted. Phase 1 acceptance criterion states "Every LLM call produces a TurnMetrics record in the database," which this does not satisfy.

This is not a blocker for merging if this is treated as Phase 1 groundwork only, but the PR description should be explicit that persistence is deferred to a follow-up.


Code Quality Issues

[MEDIUM] TurnMetrics.agent_phase uses a plain string instead of the AgentPhase enum

Location: backend/src/schemas/entities/metrics.py:13 and backend/src/constants/phases.py

The branch introduces AgentPhase enum in constants/phases.py but TurnMetrics declares agent_phase: str = "solo". This bypasses the enum entirely, making it possible to persist invalid phase values. Use the enum:

from src.constants.phases import AgentPhase

agent_phase: AgentPhase = AgentPhase.SOLO
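A str-backed enum keeps DB/JSON serialization as plain strings while rejecting invalid phase values at validation time. In this sketch only the SOLO member is confirmed by the branch; any additional members in the real `backend/src/constants/phases.py` are unknown here:

```python
from enum import Enum

class AgentPhase(str, Enum):
    # Sketch: only SOLO is confirmed in this branch; the real enum may
    # define more members.
    SOLO = "solo"

print(AgentPhase.SOLO.value)                  # solo
print(AgentPhase("solo") is AgentPhase.SOLO)  # True
print(AgentPhase.SOLO == "solo")              # True: str subclass compares equal
```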

[MEDIUM] Pricing table missing models that are live in constants/llm.py

Location: backend/src/constants/pricing.py

The following model IDs are registered in ChatModels / llm.py but absent from PRICING_TABLE, meaning any call using them returns 0.0 cost silently:

  • google_genai:gemini-flash-lite-latest
  • google_genai:gemini-3-flash-preview
  • groq:openai/gpt-oss-120b

At minimum, a TODO comment should mark these as pending pricing. Alternatively, log a warning when estimate_cost is called with an unrecognised model instead of silently returning zero:

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING_TABLE.get(model)
    if pricing is None:
        logger.warning(f"estimate_cost: unknown model '{model}', returning 0.0")
        return 0.0
    ...
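Filled out, the function might look like the sketch below. The per-1M-token prices are placeholders for illustration only, not the real PRICING_TABLE values:

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative entry only -- placeholder prices, not the real table
PRICING_TABLE = {
    "anthropic:claude-sonnet-4": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for a call; unknown models log a warning and
    return 0.0. Result is rounded to 6 decimal places."""
    pricing = PRICING_TABLE.get(model)
    if pricing is None:
        logger.warning("estimate_cost: unknown model '%s', returning 0.0", model)
        return 0.0
    cost = (
        input_tokens / 1_000_000 * pricing["input"]
        + output_tokens / 1_000_000 * pricing["output"]
    )
    return round(cost, 6)

print(estimate_cost("anthropic:claude-sonnet-4", 1000, 500))  # 0.0105
print(estimate_cost("unknown:model", 1000, 500))              # 0.0
```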

[MEDIUM] TurnMetrics and ThreadCostSummary use float for monetary values

Location: backend/src/schemas/entities/metrics.py:19,25

The spec calls for Decimal for input_cost, output_cost, total_cost. Using float introduces IEEE 754 rounding errors that accumulate when summing many turns. This matters more for ThreadCostSummary.total_cost_usd (aggregation) than for a single turn. For a billing-adjacent feature this warrants Decimal or at least Annotated[float, Field(ge=0.0)] with explicit rounding discipline documented.
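The rounding issue is easy to demonstrate with the standard library; the 0.000123 per-turn figure below is an arbitrary illustrative value:

```python
from decimal import Decimal

# Classic binary-float artifact: 0.1 + 0.2 is not exactly 0.3
print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

# Aggregating many per-turn costs: Decimal stays exact
total = sum(Decimal("0.000123") for _ in range(10_000))
print(total)  # 1.230000
```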


[LOW] cost_tracking_middleware duplicates token-extraction logic already in cache_metrics_middleware

Location: backend/src/utils/middleware.py:131-178

Both middlewares independently walk response.messages in reverse and call getattr(ai_msg, "usage_metadata", None). This is 15 lines of identical code. Extract to a shared helper:

def _extract_usage(response: ModelResponse) -> tuple[AIMessage | None, object | None]:
    """Return (ai_msg, usage_metadata) or (None, None) if absent."""
    for msg in reversed(getattr(response, "messages", []) or []):
        if isinstance(msg, AIMessage):
            return msg, getattr(msg, "usage_metadata", None)
    return None, None

[LOW] estimate_cost rounding precision is not documented

Location: backend/src/constants/pricing.py:43-50

The function is well-typed and has a docstring, which is good. Minor: rounding to 6 decimal places (round(..., 6)) is reasonable but should be noted in the docstring so callers know not to expect full float precision.


[LOW] RunBudget schema is missing id, thread_id, assistant_id fields

Location: backend/src/schemas/entities/metrics.py:37-40

The spec defines RunBudget as having thread_id or assistant_id (at least one required) plus an id for CRUD operations. The current schema contains only max_cost_usd, max_duration_minutes, and action_on_exceed. This is an incomplete implementation, but acceptable if persistence is deferred.


Process / Commit Hygiene

[HIGH] Commits are not GPG-signed

Location: All commits on this branch (590eb526, 8e752fd0, etc.)

git log --format="%H %GS %G?" shows N (no signature) for all branch-specific commits. Project guidelines in CLAUDE.md require git commit -s (DCO sign-off). The feature commit 590eb526 does include Signed-off-by: ryaneggz in the message body, which satisfies DCO, but the %G? = N indicates no GPG key signature. If GPG signing is enforced by CI/branch protection, these commits will be rejected. Please verify the repo's signing requirement and re-sign if needed.

There is no evidence of --no-verify being used (the pre-commit hooks appear to have run based on the clean commit message format), which is good.


Tests

No unit or integration tests are included for:

  • estimate_cost (pricing math, zero tokens, unknown model, negative token guard)
  • cost_tracking_middleware (mock ModelResponse with and without usage_metadata)
  • TurnMetrics schema validation

The plan's Phase 6 calls for >= 80% coverage on new modules. These tests should be in backend/tests/unit/ before this is marked ready for merge.
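A starting point for those tests, written pytest-style but sketched against a local stub so it runs standalone; the real tests would instead import estimate_cost from src.constants.pricing:

```python
import math

def estimate_cost_stub(model: str, input_tokens: int, output_tokens: int) -> float:
    # Stand-in so this sketch is self-contained; placeholder prices only.
    table = {"anthropic:claude-sonnet-4": (3.00, 15.00)}  # (input, output) per 1M tokens
    if model not in table:
        return 0.0
    inp, out = table[model]
    return round(input_tokens / 1e6 * inp + output_tokens / 1e6 * out, 6)

def test_known_model_returns_positive_cost():
    assert estimate_cost_stub("anthropic:claude-sonnet-4", 1000, 500) > 0.0

def test_unknown_model_returns_zero():
    assert estimate_cost_stub("unknown:model", 1000, 500) == 0.0

def test_zero_tokens_cost_zero():
    assert math.isclose(estimate_cost_stub("anthropic:claude-sonnet-4", 0, 0), 0.0)
```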


Summary

Must fix before merge:

  1. Replace datetime.utcnow with datetime.now(timezone.utc) in TurnMetrics.
  2. Fix model name extraction in cost_tracking_middleware to use ai_msg.response_metadata["model_name"] — the current logic will silently produce $0.00 cost on every call.

Should fix before merge:

  3. Add a logger.warning for unknown models in estimate_cost so missing pricing entries surface instead of silently returning zero.
  4. Use the AgentPhase enum on TurnMetrics.agent_phase instead of a bare string.
  5. Clarify middleware ordering intent relative to retry_model in a comment.
  6. Add at least basic unit tests for estimate_cost and schema validation.

Acceptable to defer:

  • DB persistence (document in PR that this is a logging-only Phase 1 stub)
  • RunBudget field completeness (document as stub)
  • float vs Decimal for monetary fields (document known limitation)
  • Deduplication of token-extraction helper

The pricing table model coverage and math look correct for the models that are present. The middleware structure and decorator pattern correctly follow the cache_metrics_middleware precedent. Good foundation — the critical bug is the model name extraction.

…middleware ordering, pricing gaps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
