Skip to content

Comments

feat: calculate API costs for agent steps#1697

Merged
yottahmd merged 5 commits intomainfrom
agent-step-cost
Feb 22, 2026
Merged

feat: calculate API costs for agent steps#1697
yottahmd merged 5 commits intomainfrom
agent-step-cost

Conversation

@yottahmd
Copy link
Collaborator

@yottahmd yottahmd commented Feb 22, 2026

Summary by CodeRabbit

  • New Features

    • Per-message cost tracking for LLM API calls with USD pricing metadata
    • Enhanced agent step message persistence and inheritance from dependency steps
    • Agent steps now support full message context management alongside chat steps
  • Tests

    • Added comprehensive test coverage for cost metadata, message conversion, and context inheritance

@coderabbitai
Copy link

coderabbitai bot commented Feb 22, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds per-message LLM cost tracking via new Cost field in LLMMessageMetadata. Extends agentstep Executor with ChatMessageHandler interface implementation for message persistence and context management. Introduces message conversion helpers, comprehensive test coverage, and runner modifications to support chat message inheritance for agent steps.

Changes

Cohort / File(s) Summary
Cost Field Addition
internal/core/exec/messages.go, internal/core/exec/messages_test.go
Adds Cost float64 field to LLMMessageMetadata struct with JSON serialization tests covering round-trip accuracy and omitempty behavior for zero values.
Message Conversion Helpers
internal/runtime/builtin/agentstep/convert.go, internal/runtime/builtin/agentstep/convert_test.go
Introduces internal conversion functions to transform agent.Message types to exec.LLMMessage, with ToolResults expansion, metadata propagation (Provider, Model, Usage, Cost), and system message filtering for history contexts. Comprehensive unit tests validate all message types and metadata fields.
Executor Enhancement
internal/runtime/builtin/agentstep/executor.go, internal/runtime/builtin/agentstep/executor_test.go
Extends Executor with contextMessages and savedMessages fields, implements ChatMessageHandler interface via SetContext() and GetMessages() methods, and integrates message conversion into Run lifecycle to initialize and persist chat history with cost metadata.
Testing Infrastructure
internal/runtime/builtin/agentstep/testing.go, internal/runtime/runner_helper_test.go
Adds MockExecutor type implementing Executor and ChatMessageHandler for test scaffolding, and introduces agentStep() helper function in runner tests with mock executor registration.
Runner Integration
internal/runtime/runner.go, internal/runtime/runner_test.go
Removes LLM-type guards in setupChatMessages and saveChatMessages to enable chat message inheritance for agent steps with dependencies. Adds test cases validating agent step message persistence, context inheritance from dependencies, and cost metadata preservation.
Documentation
rfcs/021-llm-cost-tracking.md, rfcs/022-agentstep-cost-persistence.md
RFC documents outlining per-session cost persistence, monthly cost aggregation via new API endpoint, UI pages for cost summaries, and detailed specification of agentstep message conversion and persistence mechanisms.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Executor Client
    participant Executor as agentstep.Executor
    participant Converter as Message Converter
    participant Agent as Agent Loop
    
    Client->>Executor: SetContext(contextMessages)
    activate Executor
    Note over Executor: Store contextMessages
    deactivate Executor
    
    Client->>Executor: Run()
    activate Executor
    Note over Executor: Initialize savedMessages from contextMessages
    Executor->>Converter: contextToLLMHistory(contextMessages)
    activate Converter
    Converter-->>Executor: llm.Message[] (filtered history)
    deactivate Converter
    
    Executor->>Agent: Loop with History
    activate Agent
    Agent->>Agent: Execute agent step
    Agent->>Executor: RecordMessage(agentMsg)
    deactivate Agent
    
    activate Executor
    Executor->>Converter: convertMessage(agentMsg)
    activate Converter
    Converter-->>Executor: exec.LLMMessage[] (with metadata, Cost)
    deactivate Converter
    Note over Executor: Append to savedMessages
    deactivate Executor
    
    deactivate Executor
    
    Client->>Executor: GetMessages()
    activate Executor
    Executor-->>Client: savedMessages (with Cost metadata)
    deactivate Executor
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • feat: agent step type #1681 — Directly extends agentstep executor with message handling state and interface implementation that this PR builds upon.
  • feat: chat assistant #1615 — Adds per-message LLM cost tracking and agent/agentstep message conversion mechanisms that align with the Cost field and conversion logic introduced here.
  • feat: agent model selection #1651 — Implements per-message cost aggregation and cost propagation across agent APIs and UI, consuming the Cost field and persistence mechanisms defined in this PR.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: calculate API costs for agent steps' directly describes the main feature being added—cost calculation/tracking for agent step executors—which aligns with the RFC and core changes extending LLMMessageMetadata with a Cost field and implementing ChatMessageHandler on Executor.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch agent-step-cost

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/runtime/runner.go (1)

511-538: ⚠️ Potential issue | 🟠 Major

setupChatMessages now issues handler reads for every step with dependencies, not just LLM-capable ones.

The removed isSupportedByLLM guard was the right gate to prevent unnecessary ReadStepMessages calls for shell/script/sub-DAG steps. Without it, every step that has dependencies triggers one ReadStepMessages call per dependency — wasted I/O against whatever persistence backend backs ChatMessagesHandler.

Correctness is preserved (non-LLM executors don't implement ChatMessageHandler, so SetChatMessages / GetChatMessages are no-ops), but performance can degrade at scale. The intended change (agent step support) only requires extending the guard to cover both executor types.

🛡️ Proposed fix — restore a two-type guard
 func (r *Runner) setupChatMessages(ctx context.Context, node *Node) {
 	if r.messagesHandler == nil {
 		return
 	}

 	step := node.Step()
+	executorType := step.ExecutorConfig.Type
+	if executorType != core.ExecutorTypeChat && executorType != core.ExecutorTypeAgent {
+		return
+	}
 	if len(step.Depends) == 0 {
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/runner.go` around lines 511 - 538, The function
setupChatMessages currently calls messagesHandler.ReadStepMessages for every
dependency; restore the previous type guard so we only read messages for
dependencies whose executor supports LLM/chat (i.e., use the isSupportedByLLM
check or an equivalent that also returns true for agent-capable executors)
before invoking messagesHandler.ReadStepMessages; specifically, inside
setupChatMessages, for each dep from node.Step().Depends call the support-check
(the same predicate used previously, e.g., isSupportedByLLM(dep) or its extended
variant) and skip ReadStepMessages when false, leaving the subsequent
DeduplicateSystemMessages and node.SetChatMessages behavior unchanged.
🧹 Nitpick comments (4)
internal/runtime/builtin/agentstep/executor_test.go (1)

21-25: Nit: ctx is misleading as a variable name for a []exec.LLMMessage slice.

ctx conventionally refers to context.Context. Renaming to msgs would avoid confusion.

♻️ Proposed rename
-	ctx := []exec.LLMMessage{
+	msgs := []exec.LLMMessage{
 		{Role: exec.RoleSystem, Content: "be helpful"},
 		{Role: exec.RoleUser, Content: "hello"},
 	}
-	e.SetContext(ctx)
-	assert.Equal(t, ctx, e.contextMessages)
+	e.SetContext(msgs)
+	assert.Equal(t, msgs, e.contextMessages)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/agentstep/executor_test.go` around lines 21 - 25,
Rename the misleading local variable ctx (type []exec.LLMMessage) to msgs to
avoid confusing it with context.Context; update its declaration and the call
e.SetContext(ctx) to use msgs and e.SetContext(msgs), and ensure any other
references in the test that use ctx are updated accordingly (symbols: ctx ->
msgs, type exec.LLMMessage, method e.SetContext).
internal/core/exec/messages_test.go (1)

95-96: Redundant assert.InDelta after full-struct assert.Equal.

assert.Equal(t, meta, decoded) on line 95 already validates decoded.Cost exactly (the struct is compared field-by-field). The assert.InDelta on line 96 is a no-op guard that adds noise.

♻️ Proposed simplification
  assert.Equal(t, meta, decoded)
- assert.InDelta(t, 0.0042, decoded.Cost, 1e-9)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/exec/messages_test.go` around lines 95 - 96, The test is
redundantly asserting both assert.Equal(t, meta, decoded) and assert.InDelta(t,
0.0042, decoded.Cost, 1e-9); remove the no-op assert.InDelta call (the struct
equality already checks decoded.Cost) or, if you intended to allow a tolerance
for Cost, replace the struct equality with field-by-field checks; update the
test by deleting the assert.InDelta line referencing decoded.Cost and keep the
assert.Equal(t, meta, decoded) (or alternatively remove assert.Equal and assert
the float tolerance on decoded.Cost if variability is required).
internal/runtime/builtin/agentstep/convert.go (1)

16-23: User message Content is silently dropped when ToolResults is non-empty.

If an agent produces a MessageTypeUser with both Content and ToolResults, the content is discarded. This is likely intentional (tool-result messages don't carry prose), but adding a comment would prevent a future maintainer from treating it as a bug:

💡 Suggested clarifying comment
 case agent.MessageTypeUser:
+	// When ToolResults are present they are the message payload; Content is ignored.
 	if len(msg.ToolResults) > 0 {
 		return convertToolResultMessages(msg)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/agentstep/convert.go` around lines 16 - 23, The
branch handling agent.MessageTypeUser currently returns
convertToolResultMessages(msg) when msg.ToolResults is non-empty, silently
dropping msg.Content; add a short clarifying comment above that case (or inline)
stating that when ToolResults are present the message is treated as a
tool-result message and any human-readable Content is intentionally
ignored/unused, referencing MessageTypeUser, msg.ToolResults, msg.Content, and
convertToolResultMessages to make the intent explicit for future maintainers
while leaving the behavior unchanged.
internal/runtime/builtin/agentstep/testing.go (1)

1-69: testing.go compiles into production binaries — consider a build constraint.

Because this file doesn't end in _test.go, MockExecutor, MockExecutorType, and RegisterMockExecutors are included in every production build. Including test-only files in the production build can increase binary size and introduce unnecessary dependencies. A build tag keeps them out:

♻️ Add a build constraint to exclude from non-test builds
+//go:build dagutest
+
 package agentstep

Then pass -tags dagutest in CI test runs (go test -tags dagutest ./...). If the project already has a global test tag convention, use that instead.

The current placement is intentional (cross-package test helper), so this is non-blocking if the project prefers keeping test helpers in-package without build tags.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/runtime/builtin/agentstep/testing.go` around lines 1 - 69, The file
defines test-only symbols (MockExecutor, MockExecutorType,
RegisterMockExecutors) that are being compiled into production; add a build
constraint to exclude this file from non-test builds (e.g., add a top-of-file
build tag like //go:build dagutest and the legacy // +build dagutest) so it only
compiles when tests opt-in, and update CI/test invocations to pass -tags
dagutest (or your existing test tag) when running go test; ensure the tag
appears before the package declaration and that you document the tag usage for
test runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/runtime/builtin/agentstep/convert_test.go`:
- Around line 83-89: Add an assertion verifying the second tool call's arguments
are intact: check m.ToolCalls[1].Function.Arguments equals the expected JSON
string (e.g., {"path":"/tmp/test"}) to ensure no truncation; locate the
assertions around m.ToolCalls[1] in convert_test.go (use the existing assertions
on m.ToolCalls[1].ID and .Function.Name as anchors) and add the Arguments
assertion immediately after them.
- Around line 163-165: Add an assertion that the error message has no metadata:
in the test where you assert result length, role and content (the slice result
and result[0]), also require.Nil(t, result[0].Metadata) (or assert.Nil) to
ensure MessageTypeError produces nil Metadata and catch regressions that attach
cost/provider metadata to error messages.

In `@internal/runtime/builtin/agentstep/testing.go`:
- Line 1: The new source file testing.go in package agentstep is missing the
required GPL v3 license header; run the repository tooling (make addlicense) to
prepend the standard GPL v3 header to testing.go (and any other new files), or
manually add the exact GPL v3 header used across the project to the top of
testing.go so it matches other files in package agentstep.

In `@internal/runtime/runner_test.go`:
- Around line 3067-3078: The test "HandlerNotCalledForAgentStepWithNoMessages"
should explicitly assert that the mock messages handler had zero write calls to
make intent clear and prevent regressions: after using newMockMessagesHandler(),
setupRunner(withMessagesHandler(handler)) and calling plan.assertRun(t,
core.Failed), add an assertion that handler.writeCalls == 0 (or the test
framework's equivalent check) to confirm saveChatMessages was not invoked
(saveChatMessages is guarded by NodeSucceeded) — update the test function
containing agentStep("agent_fail") accordingly.

In `@rfcs/021-llm-cost-tracking.md`:
- Around line 138-147: The permission table and handler disagree: the handler
currently uses requireManagerOrAbove but the table lists
Operator/Viewer/Developer as "Own costs only"; either tighten the table or relax
the middleware. Pick one fix: (A) If non-managers should access their own costs,
replace requireManagerOrAbove with requireAuthenticated (or equivalent) and in
the handler enforce userId == req.user.id (or force req.params.userId =
req.user.id) so only own-costs are returned; or (B) if only managers/admins may
access, remove the Operator/Viewer/Developer rows from the permission table and
keep requireManagerOrAbove. Update the doc text to match the chosen behavior and
reference requireManagerOrAbove and the userId parameter enforcement in the
handler to ensure consistency.
- Around line 96-132: The AgentCostEntry OpenAPI schema lacks the "source"
property referenced in the RFC prose; add a "source" field to the AgentCostEntry
schema (type: string) with a description and restrict values to the three
allowed sources ("agent_chat", "chat_step", "agent_step") so API responses match
the documented breakdown, and ensure any code/path that emits AgentCostEntry
(e.g., the GetCostSummary implementation in internal/agent/api.go and the
handler in internal/service/frontend/api/v1/agent_cost.go) populates this field
accordingly; alternatively, if you intentionally removed source, remove the
prose that lists the three source values to keep spec and documentation
consistent.
- Around line 74-76: The fenced code block containing the HTTP endpoint
declaration "GET /api/v1/agent/cost-summary?month=YYYY-MM&userId=optional" is
missing a language specifier; update the markdown to use a fenced block with an
"http" language tag (i.e., replace the triple backticks around that GET line
with ```http ... ```) so lint rule MD040 is satisfied and the snippet is
properly highlighted.

In `@rfcs/022-agentstep-cost-persistence.md`:
- Around line 55-59: The RecordMessage snippet incorrectly appends the slice
returned by convertMessage directly to e.savedMessages; change the append call
to spread the slice returned by convertMessage(msg, modelCfg) (i.e., use the ...
operator) so that append(e.savedMessages, convertMessage(msg, modelCfg)...)
compiles; update the RFC example to match the real implementation referring to
RecordMessage, convertMessage, e.savedMessages, append and modelCfg.
- Around line 89-96: The RFC table incorrectly lists a change to
internal/runtime/builtin/chat/executor.go (populate Cost in
createResponseMetadata) but that file isn't modified in this PR; either add the
actual change to populate the Cost field in the createResponseMetadata
implementation in chat/executor.go, or update the RFC table to remove or mark
that row as deferred/out-of-scope (or move it to RFC 021) so the table matches
the PR contents and does not reference a non-existent change.

---

Outside diff comments:
In `@internal/runtime/runner.go`:
- Around line 511-538: The function setupChatMessages currently calls
messagesHandler.ReadStepMessages for every dependency; restore the previous type
guard so we only read messages for dependencies whose executor supports LLM/chat
(i.e., use the isSupportedByLLM check or an equivalent that also returns true
for agent-capable executors) before invoking messagesHandler.ReadStepMessages;
specifically, inside setupChatMessages, for each dep from node.Step().Depends
call the support-check (the same predicate used previously, e.g.,
isSupportedByLLM(dep) or its extended variant) and skip ReadStepMessages when
false, leaving the subsequent DeduplicateSystemMessages and node.SetChatMessages
behavior unchanged.

---

Nitpick comments:
In `@internal/core/exec/messages_test.go`:
- Around line 95-96: The test is redundantly asserting both assert.Equal(t,
meta, decoded) and assert.InDelta(t, 0.0042, decoded.Cost, 1e-9); remove the
no-op assert.InDelta call (the struct equality already checks decoded.Cost) or,
if you intended to allow a tolerance for Cost, replace the struct equality with
field-by-field checks; update the test by deleting the assert.InDelta line
referencing decoded.Cost and keep the assert.Equal(t, meta, decoded) (or
alternatively remove assert.Equal and assert the float tolerance on decoded.Cost
if variability is required).

In `@internal/runtime/builtin/agentstep/convert.go`:
- Around line 16-23: The branch handling agent.MessageTypeUser currently returns
convertToolResultMessages(msg) when msg.ToolResults is non-empty, silently
dropping msg.Content; add a short clarifying comment above that case (or inline)
stating that when ToolResults are present the message is treated as a
tool-result message and any human-readable Content is intentionally
ignored/unused, referencing MessageTypeUser, msg.ToolResults, msg.Content, and
convertToolResultMessages to make the intent explicit for future maintainers
while leaving the behavior unchanged.

In `@internal/runtime/builtin/agentstep/executor_test.go`:
- Around line 21-25: Rename the misleading local variable ctx (type
[]exec.LLMMessage) to msgs to avoid confusing it with context.Context; update
its declaration and the call e.SetContext(ctx) to use msgs and
e.SetContext(msgs), and ensure any other references in the test that use ctx are
updated accordingly (symbols: ctx -> msgs, type exec.LLMMessage, method
e.SetContext).

In `@internal/runtime/builtin/agentstep/testing.go`:
- Around line 1-69: The file defines test-only symbols (MockExecutor,
MockExecutorType, RegisterMockExecutors) that are being compiled into
production; add a build constraint to exclude this file from non-test builds
(e.g., add a top-of-file build tag like //go:build dagutest and the legacy //
+build dagutest) so it only compiles when tests opt-in, and update CI/test
invocations to pass -tags dagutest (or your existing test tag) when running go
test; ensure the tag appears before the package declaration and that you
document the tag usage for test runs.

@yottahmd yottahmd merged commit 56015b3 into main Feb 22, 2026
5 checks passed
@yottahmd yottahmd deleted the agent-step-cost branch February 22, 2026 14:09
@codecov
Copy link

codecov bot commented Feb 22, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.77%. Comparing base (af94591) to head (b4ef96e).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/runtime/builtin/agentstep/convert.go 85.13% 0 Missing and 11 partials ⚠️
internal/runtime/builtin/agentstep/executor.go 28.57% 5 Missing ⚠️
internal/runtime/builtin/agentstep/testing.go 96.96% 1 Missing ⚠️
internal/runtime/runner.go 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1697      +/-   ##
==========================================
- Coverage   69.95%   69.77%   -0.19%     
==========================================
  Files         372      374       +2     
  Lines       41330    41445     +115     
==========================================
+ Hits        28914    28919       +5     
  Misses      10089    10089              
- Partials     2327     2437     +110     
Files with missing lines Coverage Δ
internal/core/exec/messages.go 100.00% <ø> (ø)
internal/runtime/builtin/agentstep/testing.go 96.96% <96.96%> (ø)
internal/runtime/runner.go 73.73% <66.66%> (-13.53%) ⬇️
internal/runtime/builtin/agentstep/executor.go 7.78% <28.57%> (+4.98%) ⬆️
internal/runtime/builtin/agentstep/convert.go 85.13% <85.13%> (ø)

... and 9 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af94591...b4ef96e. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant