
fix(cli): emit cumulative token usage in stream-json/json complete event #8870

Open
namanvats-dev wants to merge 3 commits into aaif-goose:main from namanvats-dev:fix/cli-cumulative-token-usage-stream-json

@namanvats-dev

Summary

The complete event in goose run --output-format stream-json (and metadata.total_tokens in --output-format json) reports Session::total_tokens, which is only the last turn's context size: it gets overwritten on every LLM round-trip and reset to the summary's output count after compaction (see update_session_metrics in crates/goose/src/agents/reply_parts.rs).
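
To make the difference between the two counters concrete, here is a minimal sketch. `SessionCounters` and `record_turn` are hypothetical stand-ins, not the real `Session` struct or `update_session_metrics`; the field names mirror the ones above, and the `Option<i32>` types are an assumption. One counter is overwritten per round-trip while the other is summed, so on a multi-turn run only the accumulator reflects total usage (the per-turn numbers are illustrative, not measured).

```rust
/// Hypothetical, simplified model of the two session counters discussed above.
/// The real `Session` in crates/goose has more fields and a different update path.
#[derive(Debug, Default)]
struct SessionCounters {
    /// Last turn's size: overwritten on every LLM round-trip.
    total_tokens: Option<i32>,
    /// Running sum across all turns.
    accumulated_total_tokens: Option<i32>,
}

impl SessionCounters {
    /// Record one LLM round-trip that reported `turn_tokens` tokens.
    fn record_turn(&mut self, turn_tokens: i32) {
        // Overwrite: only the most recent turn survives.
        self.total_tokens = Some(turn_tokens);
        // Accumulate: add this turn to everything counted so far.
        self.accumulated_total_tokens =
            Some(self.accumulated_total_tokens.unwrap_or(0) + turn_tokens);
    }
}

fn main() {
    let mut s = SessionCounters::default();
    // Three illustrative turns of a multi-turn run.
    for turn_tokens in [10_000, 12_000, 15_000] {
        s.record_turn(turn_tokens);
    }
    println!("total_tokens = {:?}", s.total_tokens); // Some(15000)
    println!("accumulated_total_tokens = {:?}", s.accumulated_total_tokens); // Some(37000)
}
```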

Downstream tools (eval harnesses, billing dashboards) interpret total_tokens as the cumulative session cost, so they under-report usage by an order of magnitude on multi-turn runs. We hit this in our eval harness: a session that streamed thousands of chunks reported total_tokens=17099, which was just the final turn's context.

Fix: read Session::accumulated_total_tokens (the true running sum), with total_tokens as a fallback for legacy/empty sessions. Also surface input_tokens / output_tokens so consumers can split prompt vs. completion cost without re-deriving them.
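
The selection rule amounts to `Option::or`. The sketch below is hypothetical: `SessionUsage` and `reported_total_tokens` are illustrative names only, and both counters are assumed to be `Option<i32>`. It prefers the running sum and falls back to the last-turn value only when the accumulator was never set.

```rust
/// Hypothetical stand-in for the two `Session` counters named in this PR
/// (field types are assumed, not taken from the real struct).
struct SessionUsage {
    total_tokens: Option<i32>,
    accumulated_total_tokens: Option<i32>,
}

/// Value reported as `total_tokens` in the complete event: prefer the
/// running sum, fall back to the last-turn size for legacy/empty sessions.
fn reported_total_tokens(s: &SessionUsage) -> Option<i32> {
    s.accumulated_total_tokens.or(s.total_tokens)
}

fn main() {
    let multi_turn = SessionUsage { total_tokens: Some(15_000), accumulated_total_tokens: Some(37_000) };
    let legacy = SessionUsage { total_tokens: Some(4_200), accumulated_total_tokens: None };
    let empty = SessionUsage { total_tokens: None, accumulated_total_tokens: None };

    println!("{:?}", reported_total_tokens(&multi_turn)); // Some(37000)
    println!("{:?}", reported_total_tokens(&legacy)); // Some(4200)
    println!("{:?}", reported_total_tokens(&empty)); // None
}
```

Using `or` rather than `unwrap_or` keeps the "no data at all" case as `None` instead of inventing a zero.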

Backwards compatible: total_tokens keeps its name and position; existing consumers see a correct number where they previously saw an under-count. The two new fields are Option<i32> with skip_serializing_if = "Option::is_none", so they are omitted when unset and existing parsers don't break.
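
For reviewers who want to see the wire-format effect, here is a std-only sketch that hand-rolls what `#[serde(skip_serializing_if = "Option::is_none")]` produces for the token fields. It is illustrative only: the real code derives this with serde, `CompleteUsage` and `render` are hypothetical names, only the token-related subset of the event is shown, and `total_tokens` is simplified to a plain `i32`.

```rust
/// Hypothetical token-related subset of the complete event.
struct CompleteUsage {
    total_tokens: i32,          // simplified: treated as always present
    input_tokens: Option<i32>,  // new field, optional
    output_tokens: Option<i32>, // new field, optional
}

/// Render compact JSON, omitting `None` fields the way serde's
/// `skip_serializing_if = "Option::is_none"` would.
fn render(u: &CompleteUsage) -> String {
    let mut parts = vec![format!("\"total_tokens\":{}", u.total_tokens)];
    if let Some(n) = u.input_tokens {
        parts.push(format!("\"input_tokens\":{n}"));
    }
    if let Some(n) = u.output_tokens {
        parts.push(format!("\"output_tokens\":{n}"));
    }
    // `{{` and `}}` are escaped braces around the joined fields.
    format!("{{{}}}", parts.join(","))
}

fn main() {
    let full = CompleteUsage { total_tokens: 37_000, input_tokens: Some(30_000), output_tokens: Some(7_000) };
    let unset = CompleteUsage { total_tokens: 4_200, input_tokens: None, output_tokens: None };
    println!("{}", render(&full)); // {"total_tokens":37000,"input_tokens":30000,"output_tokens":7000}
    println!("{}", render(&unset)); // {"total_tokens":4200}
}
```

A consumer that only reads total_tokens handles both lines identically; only consumers that opt in ever see the new keys.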

Files touched: crates/goose-cli/src/session/mod.rs (JsonMetadata, StreamEvent::Complete, both emit sites).

Testing

  • cargo check -p goose-cli: clean (verified locally, ~3 min build)
  • cargo test -p goose-cli: please run in CI
  • Manual: re-ran a multi-turn goose run --output-format stream-json against our eval harness and confirmed the emitted total_tokens now matches the cumulative provider-reported usage rather than the last-turn context size

Related Issues

N/A: surfaced internally while debugging token under-reporting in an external eval harness consuming --output-format stream-json. Happy to file a tracking issue if preferred.

Screenshots/Demos (for UX changes)

N/A: backend change, no UX surface.

The `complete` event in `goose run --output-format stream-json` (and
`metadata.total_tokens` in `--output-format json`) was reporting
`Session::total_tokens`, which is the *last turn's* context size — it
gets overwritten on every LLM round-trip and reset to the summary
output count after compaction (see `update_session_metrics` in
`crates/goose/src/agents/reply_parts.rs`).

Downstream tools (eval harnesses, billing, dashboards) interpret
`total_tokens` as the cumulative session cost, so they were under-
reporting actual usage by an order of magnitude on multi-turn runs.

Switch to `Session::accumulated_total_tokens` (sum across all turns),
falling back to `total_tokens` only when the accumulator is unset.
Also surface `input_tokens` / `output_tokens` so consumers can
attribute prompt vs. completion cost without re-deriving them.

Backwards compatible: `total_tokens` keeps its name and position; the
two new fields are `Option<i32>` and skipped when `None`.

Collapse the JsonMetadata field expressions onto single lines per
rustfmt's preferred layout.
@namanvats-dev
Author

Closes #8871

Collaborator

@angiejones left a comment


please remove comments

@namanvats-dev
Author

@angiejones I removed the comments; could you please take another look?
