feat: Per-run cost tracking, phase-level metering, and budget caps#907
ryaneggz wants to merge 3 commits into `development`
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
…ddleware Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
**ryaneggz** left a comment:
Code Review — feat/cost-tracking
Reviewed files:

- `backend/src/constants/pricing.py`
- `backend/src/constants/phases.py`
- `backend/src/schemas/entities/metrics.py`
- `backend/src/utils/middleware.py` (`cost_tracking_middleware` + registration)
Critical / High Priority Issues
[HIGH] Deprecated datetime.utcnow in TurnMetrics schema
Location: backend/src/schemas/entities/metrics.py:21
```python
# Current (deprecated in Python 3.12, scheduled for removal)
timestamp: datetime = Field(default_factory=datetime.utcnow)

# Fix: use timezone-aware UTC
from datetime import datetime, timezone

timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```

`datetime.utcnow()` returns a naive datetime with no timezone info and raises `DeprecationWarning` as of Python 3.12. The fix must produce a timezone-aware object; Pydantic serialization and downstream consumers (DB, API responses) will behave correctly with the aware form. No other file in `backend/src/schemas/` uses `utcnow`.
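The naive/aware difference is directly observable in a standalone check (no Pydantic needed):

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # naive: tzinfo is None (DeprecationWarning on 3.12+)
aware = datetime.now(timezone.utc)  # aware: tzinfo is timezone.utc

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC
```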
[HIGH] Model name extraction in cost_tracking_middleware is unreliable
Location: backend/src/utils/middleware.py:159-166
```python
# Current
model = getattr(request, "model", "unknown")
if hasattr(request, "state") and "model" in request.state:
    model = request.state["model"]
```

Two problems:

- `request.model` is set by the `dynamic_model_selection` middleware (line 235) as the return value of `init_chat_model(model)` — a `BaseChatModel` object, not a string. Passing that object to `estimate_cost` will miss every pricing-table lookup and silently return `0.0`.
- `request.state["model"]` is the LangGraph agent state dict, which carries message history, not the selected model string. The actual model identifier lives in the AI message's `response_metadata["model_name"]` (visible in the mock constant at `backend/src/constants/mock.py`). The correct extraction pattern, consistent with how `cache_metrics_middleware` already handles it, is:

```python
model_name = (
    ai_msg.response_metadata.get("model_name", "unknown")
    if isinstance(ai_msg.response_metadata, dict)
    else "unknown"
)
cost = estimate_cost(model_name, input_tokens, output_tokens)
```

Without this fix, cost will be `0.0` for every call where dynamic model selection is active (which is the common path), making the middleware produce systematically wrong data.
[MEDIUM] Middleware ordering: cost_tracking placed before retry_model
Location: backend/src/utils/middleware.py:381-387
```python
stack = [
    compaction_middleware,
    add_ai_message_metadata,
    cache_metrics_middleware,
    cost_tracking_middleware,  # <-- here
    retry_model,               # <-- retry wraps everything below it
    ...
]
```

In this middleware stack, each entry wraps the chain below it. `retry_model` currently sits after `cost_tracking_middleware`, meaning retried calls bypass cost accounting entirely — only the final successful attempt is measured. If the intent is to capture cost per LLM attempt (including retries), `cost_tracking_middleware` should be placed after `retry_model`. If the intent is to capture only successful calls, the current position is correct, but the docstring and plan should say so explicitly. The existing `cache_metrics_middleware` shares this position, but for caching it doesn't matter; for billing-relevant cost data it does.
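The wrapping semantics can be shown with a toy stack. `make_stack`, `meter`, and `retry` below are illustrative stand-ins, not the project's real middleware API; the point is that a meter placed outside the retry layer records one event regardless of how many attempts the retry layer makes:

```python
# Toy model of the middleware stack: each entry wraps everything below it.
def make_stack(middlewares, handler):
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler

metered_calls = []

def meter(next_fn):
    def run():
        metered_calls.append(1)  # one metering event per wrapped invocation
        return next_fn()
    return run

def retry(next_fn):
    def run():
        for _ in range(3):
            try:
                return next_fn()
            except RuntimeError:
                continue
        raise RuntimeError("retries exhausted")
    return run

attempts = {"n": 0}

def model_call():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient failure")
    return "ok"

# meter placed outside retry, mirroring the current stack order:
result = make_stack([meter, retry], model_call)()
print(result, len(metered_calls), attempts["n"])  # ok 1 2
```

Two model attempts are made, but only one metering event fires, which is exactly the behaviour described above.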
Recommended order if per-attempt capture is desired:
```python
stack = [
    compaction_middleware,
    add_ai_message_metadata,
    cache_metrics_middleware,
    retry_model,
    cost_tracking_middleware,  # after retry, captures each individual attempt
    ...
]
```

[MEDIUM] No persistence — middleware only logs, does not write TurnMetrics
Location: backend/src/utils/middleware.py (cost_tracking_middleware body)
The spec and plan both require a TurnMetrics record to be written to the database per call ("Persist TurnMetrics asynchronously (fire-and-forget DB write)"). The current implementation only emits a logger.info line. The TurnMetrics schema exists in backend/src/schemas/entities/metrics.py but is never instantiated or persisted. Phase 1 acceptance criterion states "Every LLM call produces a TurnMetrics record in the database," which this does not satisfy.
This is not a blocker for merging if this is treated as Phase 1 groundwork only, but the PR description should be explicit that persistence is deferred to a follow-up.
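If the follow-up does add persistence, the "fire-and-forget" pattern the spec names might look like the sketch below. `write_turn_metrics` and the in-memory `fake_db` are stand-ins for the real session/model layer, not the project's actual API:

```python
import asyncio

# Sketch of a fire-and-forget write: schedule the DB call as a background
# task so the middleware returns without awaiting it.
fake_db: list[dict] = []

async def write_turn_metrics(record: dict) -> None:
    await asyncio.sleep(0)  # stands in for the actual DB round-trip
    fake_db.append(record)

async def cost_tracking_body() -> str:
    record = {"model": "anthropic:claude-sonnet-4", "total_cost": 0.0105}
    # Keep a reference and attach a done-callback so failures are surfaced
    # rather than silently dropped when the task is garbage-collected.
    task = asyncio.create_task(write_turn_metrics(record))
    task.add_done_callback(lambda t: t.exception() and print(t.exception()))
    return "model response"

async def main() -> tuple[str, int]:
    result = await cost_tracking_body()
    await asyncio.sleep(0.01)  # demo only: let the background write land
    return result, len(fake_db)

print(asyncio.run(main()))  # ('model response', 1)
```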
Code Quality Issues
[MEDIUM] TurnMetrics.agent_phase uses a plain string instead of the AgentPhase enum
Location: backend/src/schemas/entities/metrics.py:13 and backend/src/constants/phases.py
The branch introduces the `AgentPhase` enum in `constants/phases.py`, but `TurnMetrics` declares `agent_phase: str = "solo"`. This bypasses the enum entirely, making it possible to persist invalid phase values. Use the enum:

```python
from src.constants.phases import AgentPhase

agent_phase: AgentPhase = AgentPhase.SOLO
```

[MEDIUM] Pricing table missing models that are live in constants/llm.py
Location: backend/src/constants/pricing.py
The following model IDs are registered in `ChatModels` / `llm.py` but absent from `PRICING_TABLE`, meaning any call using them returns `0.0` cost silently:

- `google_genai:gemini-flash-lite-latest`
- `google_genai:gemini-3-flash-preview`
- `groq:openai/gpt-oss-120b`
At minimum, a TODO comment should mark these as pending pricing. Alternatively, log a warning when estimate_cost is called with an unrecognised model instead of silently returning zero:
```python
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING_TABLE.get(model)
    if pricing is None:
        logger.warning(f"estimate_cost: unknown model '{model}', returning 0.0")
        return 0.0
    ...
```

[MEDIUM] TurnMetrics and ThreadCostSummary use float for monetary values
Location: backend/src/schemas/entities/metrics.py:19,25
The spec calls for `Decimal` for `input_cost`, `output_cost`, `total_cost`. Using `float` introduces IEEE 754 rounding errors that accumulate when summing many turns. This matters more for `ThreadCostSummary.total_cost_usd` (aggregation) than for a single turn. For a billing-adjacent feature this warrants `Decimal`, or at least `Annotated[float, Field(ge=0.0)]` with explicit rounding discipline documented.
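A quick, stdlib-only demonstration of the accumulation concern, using a hypothetical tiny per-turn cost:

```python
from decimal import Decimal

# Summing a small per-turn cost many times: float drifts, Decimal stays exact.
turn_cost_f = 0.000003              # hypothetical per-turn cost in USD, as float
turn_cost_d = Decimal("0.000003")   # same cost as an exact Decimal

float_total = sum(turn_cost_f for _ in range(100_000))
decimal_total = sum(turn_cost_d for _ in range(100_000))

print(decimal_total)           # 0.300000, exact
print(abs(float_total - 0.3))  # small but typically nonzero float drift
```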
[LOW] cost_tracking_middleware duplicates token-extraction logic already in cache_metrics_middleware
Location: backend/src/utils/middleware.py:131-178
Both middlewares independently walk `response.messages` in reverse and call `getattr(ai_msg, "usage_metadata", None)`. This is 15 lines of identical code. Extract to a shared helper:

```python
def _extract_usage(response: ModelResponse) -> tuple[AIMessage | None, object | None]:
    """Return (ai_msg, usage_metadata) or (None, None) if absent."""
    for msg in reversed(getattr(response, "messages", []) or []):
        if isinstance(msg, AIMessage):
            return msg, getattr(msg, "usage_metadata", None)
    return None, None
```

[LOW] estimate_cost rounding precision is not documented
Location: backend/src/constants/pricing.py:43-50
The function is well-typed and has a docstring, which is good. Minor: rounding to 6 decimal places (`round(..., 6)`) is reasonable but should be noted in the docstring so callers know not to expect full float precision.
[LOW] RunBudget schema is missing id, thread_id, assistant_id fields
Location: backend/src/schemas/entities/metrics.py:37-40
The spec defines RunBudget as having thread_id or assistant_id (at least one required) plus an id for CRUD operations. The current schema contains only max_cost_usd, max_duration_minutes, and action_on_exceed. This is an incomplete implementation, but acceptable if persistence is deferred.
Process / Commit Hygiene
[HIGH] Commits are not GPG-signed
Location: All commits on this branch (590eb526, 8e752fd0, etc.)
git log --format="%H %GS %G?" shows N (no signature) for all branch-specific commits. Project guidelines in CLAUDE.md require git commit -s (DCO sign-off). The feature commit 590eb526 does include Signed-off-by: ryaneggz in the message body, which satisfies DCO, but the %G? = N indicates no GPG key signature. If GPG signing is enforced by CI/branch protection, these commits will be rejected. Please verify the repo's signing requirement and re-sign if needed.
There is no evidence of --no-verify being used (the pre-commit hooks appear to have run based on the clean commit message format), which is good.
Tests
No unit or integration tests are included for:
- `estimate_cost` (pricing math, zero tokens, unknown model, negative token guard)
- `cost_tracking_middleware` (mock `ModelResponse` with and without `usage_metadata`)
- `TurnMetrics` schema validation
The plan's Phase 6 calls for >= 80% coverage on new modules. These tests should be in backend/tests/unit/ before this is marked ready for merge.
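A starting point for the `estimate_cost` tests, as a pytest-style sketch. The inlined `PRICING_TABLE` entry and per-million-token rates are stand-ins so the example runs on its own; the real tests would import from `backend/src/constants/pricing.py`:

```python
# Stand-in for the real module: one pricing entry in USD per million tokens.
PRICING_TABLE = {"anthropic:claude-sonnet-4": {"input": 3.00, "output": 15.00}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING_TABLE.get(model)
    if pricing is None:
        return 0.0  # real implementation should also log a warning
    cost = (input_tokens / 1_000_000) * pricing["input"] \
         + (output_tokens / 1_000_000) * pricing["output"]
    return round(cost, 6)

def test_known_model():
    # 1000 in @ $3/M + 500 out @ $15/M = 0.003 + 0.0075 = 0.0105
    assert estimate_cost("anthropic:claude-sonnet-4", 1000, 500) == 0.0105

def test_unknown_model_returns_zero():
    assert estimate_cost("unknown:model", 1000, 500) == 0.0

def test_zero_tokens():
    assert estimate_cost("anthropic:claude-sonnet-4", 0, 0) == 0.0
```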
Summary
Must fix before merge:
1. Replace `datetime.utcnow` with `datetime.now(timezone.utc)` in `TurnMetrics`.
2. Fix model name extraction in `cost_tracking_middleware` to use `ai_msg.response_metadata["model_name"]` — the current logic will silently produce `$0.00` cost on every call.
Should fix before merge:
3. Add a `logger.warning` for unknown models in `estimate_cost` so missing pricing entries surface instead of silently returning zero.
4. Use the `AgentPhase` enum on `TurnMetrics.agent_phase` instead of a bare string.
5. Clarify middleware ordering intent relative to `retry_model` in a comment.
6. Add at least basic unit tests for `estimate_cost` and schema validation.
Acceptable to defer:
- DB persistence (document in PR that this is a logging-only Phase 1 stub)
- `RunBudget` field completeness (document as stub)
- `float` vs `Decimal` for monetary fields (document known limitation)
- Deduplication of the token-extraction helper
The pricing table model coverage and math look correct for the models that are present. The middleware structure and decorator pattern correctly follow the cache_metrics_middleware precedent. Good foundation — the critical bug is the model name extraction.
…middleware ordering, pricing gaps Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ryaneggz <kre8mymedia@gmail.com>
Summary
Closes #905
Backend implementation for per-run cost tracking with pricing tables, metrics schemas, and cost capture middleware.
Files Changed
- `backend/src/constants/phases.py` — new: shared `AgentPhase` enum
- `backend/src/constants/pricing.py` — new: `PRICING_TABLE` (23+ models), `estimate_cost()` function
- `backend/src/schemas/entities/metrics.py` — new: `TurnMetrics`, `ThreadCostSummary`, `RunBudget` schemas
- `backend/src/utils/middleware.py` — modified: added `cost_tracking_middleware`, registered after `retry_model`

Spec & Plan
- `.claude/specs/cost-tracking.md`
- `.claude/plans/cost-tracking.md`

Human Review Checklist
- [ ] `PRICING_TABLE` prices are accurate for each model (cross-reference provider pricing pages)
- [ ] All models in the `ChatModels` enum have corresponding pricing entries
- [ ] `estimate_cost()` logs a debug warning for unknown models (not a silent `$0.00`)
- [ ] `TurnMetrics.timestamp` uses `datetime.now(timezone.utc)` (not deprecated `datetime.utcnow()`)
- [ ] `TurnMetrics.agent_phase` uses the `AgentPhase.SOLO` enum (not a hardcoded string)
- [ ] `cost_tracking_middleware` extracts the model name from `ai_msg.response_metadata["model_name"]` (not `request.model`)
- [ ] `cost_tracking_middleware` is AFTER `retry_model` in `init_default_middleware()`
- [ ] Defensive handling mirrors `cache_metrics_middleware` (token extraction, null checks)
- [ ] `cd backend && make format && make lint` — should pass
- [ ] `cd backend && make test` — should pass (requires Postgres)

Test Plan
- [ ] `estimate_cost("anthropic:claude-sonnet-4", 1000, 500)` returns correct USD value
- [ ] `estimate_cost("unknown:model", 1000, 500)` returns `0.0` and logs debug
- [ ] `TurnMetrics` schema validates with all required fields
- [ ] `RunBudget` schema validates with `action_on_exceed` enum values
- [ ] Middleware registration order verified (after `retry_model`)

🤖 Generated with Claude Code