
feat(observability): stamp per-turn pricing metadata and cumulative cost onto session log #1740

Merged
rootfs merged 1 commit into vllm-project:main from rootfs:per-turn-pricing
Apr 10, 2026

Conversation

@rootfs
Collaborator

@rootfs rootfs commented Apr 10, 2026

Fixes #1742

Purpose

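  • What does this PR change? Adds per-turn pricing metadata and the cumulative session cost to the session telemetry log, and emits a per-turn cost histogram (llm_session_turn_cost_usd) when model pricing is configured.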
  • Why is this change needed?
  • Which module(s) does this affect? Router, E2E

Test Plan

  • What commands, checks, or manual steps should reviewers use?
  • Why is this validation sufficient for the affected module(s)?

Test Result

  • What were the actual results?
  • Any follow-up risks, gaps, or blockers?

Semantic Router PR Checklist
  • PR title uses module-aligned prefixes such as [Router], [CLI], [Dashboard], [Operator], [Fleet-Sim], [Bindings], [Training], [E2E], [Docs], or [CI/Build]
  • If the PR spans multiple modules, the title includes all relevant prefixes
  • Commits in this PR are signed off with git commit -s
  • The Purpose, Test Plan, and Test Result sections reflect the actual scope, commands, and blockers for this change

See CONTRIBUTING.md for the full contributor workflow and commit guidance.

@rootfs rootfs requested a review from Xunzhuo as a code owner April 10, 2026 01:18
@netlify

netlify bot commented Apr 10, 2026

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: c3b9b3c
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/69d95770b208e50008701250
😎 Deploy Preview: https://deploy-preview-1740--vllm-semantic-router.netlify.app


@github-actions
Contributor

github-actions bot commented Apr 10, 2026

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 e2e

Owners: @Xunzhuo, @yossiovadia, @szedan-rh, @henschwartz, @mkoushni
Files changed:

  • e2e/pkg/testmatrix/testcases.go
  • e2e/testcases/session_pricing_e2e.go

📁 src/semantic-router

Owners: @rootfs, @Xunzhuo, @szedan-rh, @yehuditkerido, @abdallahsamabd, @asaadbalum, @liavweiss, @noalimoy
Files changed:

  • src/semantic-router/pkg/config/model_config_types.go
  • src/semantic-router/pkg/config/pricing_helper.go
  • src/semantic-router/pkg/extproc/processor_res_usage.go
  • src/semantic-router/pkg/extproc/session_telemetry.go
  • src/semantic-router/pkg/observability/metrics/session_cost.go
  • src/semantic-router/pkg/sessiontelemetry/telemetry.go
  • src/semantic-router/pkg/sessiontelemetry/telemetry_test.go


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@github-actions
Contributor

github-actions bot commented Apr 10, 2026

✅ Supply Chain Security Report — All Clear

  • AST Codebase Scan (Py, Go, JS/TS, Rust): 27 finding(s) (MEDIUM: 21, LOW: 6)
  • AST PR Diff Scan: no issues detected
  • Regex Fallback Scan: no issues detected

Scanned at 2026-04-10T20:03:24.474Z.

@rootfs rootfs force-pushed the per-turn-pricing branch 2 times, most recently from a0aefa3 to d0a806a April 10, 2026 01:58
@rootfs rootfs requested a review from Copilot April 10, 2026 01:59
Contributor

Copilot AI left a comment


Pull request overview

Adds per-turn pricing metadata and session-level cumulative cost tracking to session telemetry, wiring model pricing from router config through extproc into structured logs and a new Prometheus histogram.

Changes:

  • Extend session telemetry state/logging to include per-turn cost and cumulative session cost, and emit a new llm_session_turn_cost_usd histogram when pricing is configured.
  • Plumb model pricing (including cached-input rate) from RouterConfig into extproc session telemetry recording for both streaming and non-streaming paths.
  • Add unit tests for cost accumulation behavior and introduce E2E testcases intended to validate the metric exposure.
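The cumulative-cost state described in the overview amounts to a running total per session. A minimal sketch, assuming a hypothetical in-memory `sessionCosts` store keyed by session ID; the PR keeps this state in `sessiontelemetry/telemetry.go`, and its actual structure may differ:

```go
package main

import (
	"fmt"
	"sync"
)

// sessionCosts is a hypothetical in-memory store of cumulative cost per
// session ID. It is an illustration, not the PR's actual type.
type sessionCosts struct {
	mu    sync.Mutex
	total map[string]float64
}

func newSessionCosts() *sessionCosts {
	return &sessionCosts{total: make(map[string]float64)}
}

// addTurn adds one turn's cost to the session's running total and returns
// the updated cumulative value, i.e. the number that would be stamped as
// cumulative_cost_usd on that turn's log entry.
func (s *sessionCosts) addTurn(sessionID string, turnCost float64) float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.total[sessionID] += turnCost
	return s.total[sessionID]
}

func main() {
	costs := newSessionCosts()
	costs.addTurn("session-a", 0.25)
	cum := costs.addTurn("session-a", 0.5)
	other := costs.addTurn("session-b", 1.0)
	fmt.Println(cum, other) // prints: 0.75 1
}
```

The sketch never evicts idle sessions; a long-running router would likely need a TTL or similar bound on this state.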

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

  • src/semantic-router/pkg/sessiontelemetry/telemetry.go: Adds pricing value type, cost computation, cumulative cost state, conditional cost metric/log fields.
  • src/semantic-router/pkg/sessiontelemetry/telemetry_test.go: Adds unit tests validating cost accumulation and conditional metric observation.
  • src/semantic-router/pkg/observability/metrics/session_cost.go: Introduces the per-turn session cost Prometheus histogram and recording helper.
  • src/semantic-router/pkg/extproc/session_telemetry.go: Fetches model pricing from config and passes it into session telemetry events.
  • src/semantic-router/pkg/extproc/processor_res_usage.go: Wires pricing into both streaming and non-streaming session turn recording.
  • src/semantic-router/pkg/config/pricing_helper.go: Adds helper to fetch full pricing (including cached-input rate) for a resolved model.
  • src/semantic-router/pkg/config/model_config_types.go: Extends ModelPricing to include cached_input_per_1m.
  • e2e/testcases/session_pricing_e2e.go: Adds E2E testcases intended to assert the new cost histogram is exposed after requests.

Comment on lines +162 to +173
```go
if p.Pricing.isConfigured() {
	currency := p.Pricing.Currency
	if currency == "" {
		currency = "USD"
	}
	fields["pricing_prompt_per_1m"] = p.Pricing.PromptPer1M
	fields["pricing_completion_per_1m"] = p.Pricing.CompletionPer1M
	fields["pricing_cached_input_per_1m"] = p.Pricing.CachedInputPer1M
	fields["pricing_currency"] = currency
	fields["cost_this_turn_usd"] = costThisTurn
	fields["cumulative_cost_usd"] = cumCost
}
```

Copilot AI Apr 10, 2026


The log fields and metric names use a hard-coded "*_usd" suffix, but pricing_currency can be set to non-USD via config. In that case cost_this_turn_usd/cumulative_cost_usd (and llm_session_turn_cost_usd) would actually be in the configured currency, which is misleading/incorrect for downstream consumers. Consider either (a) only recording cost when currency is USD/empty, (b) converting to USD before stamping/observing, or (c) renaming fields/metrics to be currency-aware (e.g., include currency label and remove _usd from the name).
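Options (a) and (c) from this comment can be combined: stamp currency-neutral log fields alongside an explicit currency, and only observe the USD-named histogram when the configured currency really is USD. A sketch under those assumptions; the keys `cost_this_turn`, `cumulative_cost`, and `cost_currency` and both helper names are hypothetical, not what the PR emits:

```go
package main

import (
	"fmt"
	"strings"
)

// turnPricing mirrors the fields of the PR's TurnPricing struct.
type turnPricing struct {
	Currency         string
	PromptPer1M      float64
	CompletionPer1M  float64
	CachedInputPer1M float64
}

// stampCostFields writes cost values under currency-neutral keys plus an
// explicit currency field (option (c)). Key names are hypothetical.
func stampCostFields(fields map[string]interface{}, p turnPricing, costThisTurn, cumCost float64) {
	currency := p.Currency
	if currency == "" {
		currency = "USD" // same default the PR applies
	}
	fields["cost_currency"] = currency
	fields["cost_this_turn"] = costThisTurn
	fields["cumulative_cost"] = cumCost
}

// shouldObserveUSDMetric gates a USD-named histogram such as
// llm_session_turn_cost_usd (option (a)): observe only when the configured
// currency is empty (defaulting to USD) or USD, compared case-insensitively.
func shouldObserveUSDMetric(p turnPricing) bool {
	return p.Currency == "" || strings.EqualFold(p.Currency, "USD")
}

func main() {
	fields := map[string]interface{}{}
	eur := turnPricing{Currency: "EUR", PromptPer1M: 2.0}
	stampCostFields(fields, eur, 0.01, 0.03)
	fmt.Println(fields["cost_currency"], shouldObserveUSDMetric(eur)) // prints: EUR false
}
```

Option (b), converting to USD before stamping, would additionally need an exchange-rate source, which the router config does not appear to provide.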

Comment on lines +54 to +66
```go
// TurnPricing carries the active per-1M token prices stamped onto a dispatch log entry.
// All rates are in Currency (default "USD"). Zero values mean pricing is not configured.
type TurnPricing struct {
	Currency         string
	PromptPer1M      float64
	CompletionPer1M  float64
	CachedInputPer1M float64
}

// isConfigured reports whether any price or explicit currency has been set.
func (p TurnPricing) isConfigured() bool {
	return p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0 || p.Currency != ""
}
```

Copilot AI Apr 10, 2026


TurnPricing's doc comment says "Zero values mean pricing is not configured", but isConfigured() returns true when only Currency is set (even with all rates 0). This makes it unclear when cost fields/histograms should be emitted. Please align the comment and semantics (e.g., require at least one non-zero rate, or update the comment to explicitly treat currency-only as configured).
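One way to resolve the mismatch is to treat pricing as configured only when at least one rate is non-zero, so a currency-only entry emits no cost fields or histogram samples. A sketch of that option; the struct is copied from the excerpt, while the stricter `isConfigured` body is an assumption, not the merged code:

```go
package main

import "fmt"

// TurnPricing is copied from the excerpt under review.
type TurnPricing struct {
	Currency         string
	PromptPer1M      float64
	CompletionPer1M  float64
	CachedInputPer1M float64
}

// isConfigured (stricter variant): pricing counts as configured only when at
// least one per-1M rate is non-zero. Currency alone is ignored, which matches
// the "zero values mean pricing is not configured" doc comment.
func (p TurnPricing) isConfigured() bool {
	return p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0
}

func main() {
	currencyOnly := TurnPricing{Currency: "EUR"}
	withRate := TurnPricing{Currency: "EUR", PromptPer1M: 3.0}
	fmt.Println(currencyOnly.isConfigured(), withRate.isConfigured()) // prints: false true
}
```

If this variant were adopted, `GetFullModelPricing` would need the same predicate so that config lookup and telemetry agree on what "configured" means.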

Comment on lines +177 to +184
```go
// computeCost returns the turn cost in USD given token counts and pricing rates.
// Returns 0 when pricing is not configured.
func computeCost(promptTokens, completionTokens int, pricing TurnPricing) float64 {
	if !pricing.isConfigured() {
		return 0
	}
	return (float64(promptTokens)*pricing.PromptPer1M +
		float64(completionTokens)*pricing.CompletionPer1M) / 1_000_000.0
```

Copilot AI Apr 10, 2026


CachedInputPer1M is treated as part of pricing configuration (it is logged and included in isConfigured()), but computeCost() ignores it entirely. If cached input tokens are billed differently, the resulting per-turn/cumulative cost will be inaccurate. Consider extending TurnParams to carry cached-input token counts (and incorporate them), or avoid advertising/using cached-input pricing until it’s actually applied in the cost calculation.
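If cached-input tokens are billed at their own rate, the turn cost needs the cached count as an extra input. A minimal sketch, assuming OpenAI-style usage objects where `prompt_tokens_details.cached_tokens` is counted within `prompt_tokens`, and assuming a zero `CachedInputPer1M` means "bill cached tokens at the regular prompt rate"; `computeCostWithCache` is a hypothetical name, not part of the PR. With 1,000 prompt tokens of which 400 are cached, 500 completion tokens, and rates of 2.0 / 0.5 / 8.0 per 1M, the cost is (600 × 2.0 + 400 × 0.5 + 500 × 8.0) / 1,000,000 = 0.0054:

```go
package main

import "fmt"

// TurnPricing mirrors the PR's struct; rates are per 1M tokens.
type TurnPricing struct {
	Currency         string
	PromptPer1M      float64
	CompletionPer1M  float64
	CachedInputPer1M float64
}

// computeCostWithCache returns the turn cost in the pricing currency.
// Assumption: cachedTokens is a subset of promptTokens, so only the uncached
// remainder is billed at PromptPer1M. A zero CachedInputPer1M falls back to
// the regular prompt rate.
func computeCostWithCache(promptTokens, cachedTokens, completionTokens int, p TurnPricing) float64 {
	// Clamp inconsistent usage data into [0, promptTokens].
	if cachedTokens < 0 {
		cachedTokens = 0
	}
	if cachedTokens > promptTokens {
		cachedTokens = promptTokens
	}
	cachedRate := p.CachedInputPer1M
	if cachedRate == 0 {
		cachedRate = p.PromptPer1M
	}
	uncached := promptTokens - cachedTokens
	return (float64(uncached)*p.PromptPer1M +
		float64(cachedTokens)*cachedRate +
		float64(completionTokens)*p.CompletionPer1M) / 1_000_000.0
}

func main() {
	p := TurnPricing{PromptPer1M: 2.0, CompletionPer1M: 8.0, CachedInputPer1M: 0.5}
	fmt.Println(computeCostWithCache(1000, 400, 500, p)) // prints: 0.0054
}
```

The fallback means a deliberately free cache (rate 0) cannot be expressed; distinguishing "unset" from "zero" would need a pointer field or an explicit flag in the config type.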

Comment on lines +3 to +14
```go
// GetFullModelPricing returns the complete ModelPricing entry for the given model,
// including CachedInputPer1M. The second return value is false when no pricing
// is configured for the model. Accepts both short names and provider model IDs.
func (c *RouterConfig) GetFullModelPricing(modelName string) (ModelPricing, bool) {
	if modelConfig, ok := c.resolveModelConfig(modelName); ok {
		p := modelConfig.Pricing
		if p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0 || p.Currency != "" {
			if p.Currency == "" {
				p.Currency = "USD"
			}
			return p, true
		}
```

Copilot AI Apr 10, 2026


The function comment says the second return value is false when "no pricing is configured", but the implementation treats currency alone (with all per-1M rates at 0) as configured and returns (p, true). Please clarify/align the definition of "configured" here (e.g., require at least one non-zero rate, or update the comment to state that currency-only counts as configured).

Comment on lines +15 to +25
```go
func init() {
	pkgtestcases.Register("session-pricing-chat-completions", pkgtestcases.TestCase{
		Description: "After a routed chat completion, Prometheus exposes llm_session_turn_cost_usd histogram when model pricing is configured",
		Tags:        []string{"kubernetes", "observability", "metrics", "llm", "pricing"},
		Fn:          testSessionPricingChatCompletions,
	})
	pkgtestcases.Register("session-pricing-response-api", pkgtestcases.TestCase{
		Description: "After a routed Response API call, Prometheus exposes llm_session_turn_cost_usd histogram when model pricing is configured",
		Tags:        []string{"kubernetes", "observability", "metrics", "llm", "pricing", "response-api"},
		Fn:          testSessionPricingResponseAPI,
	})
```

Copilot AI Apr 10, 2026


These new test cases are registered, but they are not referenced by any testmatrix group or profile GetTestCases() list (search shows the names only appear in this file). As a result, they likely won’t run in CI unless manually selected. If these are meant to provide durable E2E coverage for session cost telemetry, add them to an appropriate testmatrix group/profile so the harness executes them by default.

@rootfs rootfs force-pushed the per-turn-pricing branch from d0a806a to 9660b78 April 10, 2026 12:17
…ost onto session log

Signed-off-by: Huamin Chen <huaminchen@microsoft.com>
@rootfs rootfs force-pushed the per-turn-pricing branch from 9660b78 to c3b9b3c April 10, 2026 20:02
@rootfs rootfs merged commit c7bf894 into vllm-project:main Apr 10, 2026
32 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature: add per-turn session telemetry and cumulative cost state for multi-turn routing

10 participants