feat(observability): stamp per-turn pricing metadata and cumulative cost onto session log by rootfs · Pull Request #1740 · vllm-project/semantic-router

rootfs · 2026-04-10T01:18:46Z

FIX #1742

Purpose

What does this PR change? Add per turn and cumulative pricing metrics
Why is this change needed?
Which module(s) does this affect? Router / CLI / Dashboard / Operator / Fleet-Sim / Bindings / Training / E2E / Docs / CI/Build

Test Plan

What commands, checks, or manual steps should reviewers use?
Why is this validation sufficient for the affected module(s)?

Test Result

What were the actual results?
Any follow-up risks, gaps, or blockers?

Semantic Router PR Checklist

PR title uses module-aligned prefixes such as [Router], [CLI], [Dashboard], [Operator], [Fleet-Sim], [Bindings], [Training], [E2E], [Docs], or [CI/Build]
If the PR spans multiple modules, the title includes all relevant prefixes
Commits in this PR are signed off with git commit -s
The Purpose, Test Plan, and Test Result sections reflect the actual scope, commands, and blockers for this change

See CONTRIBUTING.md for the full contributor workflow and commit guidance.

netlify · 2026-04-10T01:18:54Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`c3b9b3c`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/69d95770b208e50008701250
😎 Deploy Preview	https://deploy-preview-1740--vllm-semantic-router.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2026-04-10T01:19:06Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `e2e`

Owners: @Xunzhuo, @yossiovadia, @szedan-rh, @henschwartz, @mkoushni
Files changed:

e2e/pkg/testmatrix/testcases.go
e2e/testcases/session_pricing_e2e.go

📁 `src/semantic-router`

Owners: @rootfs, @Xunzhuo, @szedan-rh, @yehuditkerido, @abdallahsamabd, @asaadbalum, @liavweiss, @noalimoy
Files changed:

src/semantic-router/pkg/config/model_config_types.go
src/semantic-router/pkg/config/pricing_helper.go
src/semantic-router/pkg/extproc/processor_res_usage.go
src/semantic-router/pkg/extproc/session_telemetry.go
src/semantic-router/pkg/observability/metrics/session_cost.go
src/semantic-router/pkg/sessiontelemetry/telemetry.go
src/semantic-router/pkg/sessiontelemetry/telemetry_test.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

github-actions · 2026-04-10T01:19:20Z

✅ Supply Chain Security Report — All Clear

Scanner	Status	Findings
AST Codebase Scan (Py, Go, JS/TS, Rust)	✅	27 finding(s) — MEDIUM: 21 · LOW: 6
AST PR Diff Scan	✅	No issues detected
Regex Fallback Scan	✅	No issues detected

Scanned at 2026-04-10T20:03:24.474Z · View full workflow logs

Copilot

Pull request overview

Adds per-turn pricing metadata and session-level cumulative cost tracking to session telemetry, wiring model pricing from router config through extproc into structured logs and a new Prometheus histogram.

Changes:

Extend session telemetry state/logging to include per-turn cost and cumulative session cost, and emit a new llm_session_turn_cost_usd histogram when pricing is configured.
Plumb model pricing (including cached-input rate) from RouterConfig into extproc session telemetry recording for both streaming and non-streaming paths.
Add unit tests for cost accumulation behavior and introduce E2E testcases intended to validate the metric exposure.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
src/semantic-router/pkg/sessiontelemetry/telemetry.go	Adds pricing value type, cost computation, cumulative cost state, conditional cost metric/log fields.
src/semantic-router/pkg/sessiontelemetry/telemetry_test.go	Adds unit tests validating cost accumulation and conditional metric observation.
src/semantic-router/pkg/observability/metrics/session_cost.go	Introduces the per-turn session cost Prometheus histogram and recording helper.
src/semantic-router/pkg/extproc/session_telemetry.go	Fetches model pricing from config and passes it into session telemetry events.
src/semantic-router/pkg/extproc/processor_res_usage.go	Wires pricing into both streaming and non-streaming session turn recording.
src/semantic-router/pkg/config/pricing_helper.go	Adds helper to fetch full pricing (including cached-input rate) for a resolved model.
src/semantic-router/pkg/config/model_config_types.go	Extends `ModelPricing` to include `cached_input_per_1m`.
e2e/testcases/session_pricing_e2e.go	Adds E2E testcases intended to assert the new cost histogram is exposed after requests.

Copilot · 2026-04-10T02:04:45Z

src/semantic-router/pkg/sessiontelemetry/telemetry.go

+	if p.Pricing.isConfigured() {
+		currency := p.Pricing.Currency
+		if currency == "" {
+			currency = "USD"
+		}
+		fields["pricing_prompt_per_1m"] = p.Pricing.PromptPer1M
+		fields["pricing_completion_per_1m"] = p.Pricing.CompletionPer1M
+		fields["pricing_cached_input_per_1m"] = p.Pricing.CachedInputPer1M
+		fields["pricing_currency"] = currency
+		fields["cost_this_turn_usd"] = costThisTurn
+		fields["cumulative_cost_usd"] = cumCost
+	}


The log fields and metric names use a hard-coded "*_usd" suffix, but pricing_currency can be set to non-USD via config. In that case cost_this_turn_usd/cumulative_cost_usd (and llm_session_turn_cost_usd) would actually be in the configured currency, which is misleading/incorrect for downstream consumers. Consider either (a) only recording cost when currency is USD/empty, (b) converting to USD before stamping/observing, or (c) renaming fields/metrics to be currency-aware (e.g., include currency label and remove _usd from the name).

Copilot · 2026-04-10T02:04:45Z

src/semantic-router/pkg/sessiontelemetry/telemetry.go

+// TurnPricing carries the active per-1M token prices stamped onto a dispatch log entry.
+// All rates are in Currency (default "USD"). Zero values mean pricing is not configured.
+type TurnPricing struct {
+	Currency         string
+	PromptPer1M      float64
+	CompletionPer1M  float64
+	CachedInputPer1M float64
+}
+
+// isConfigured reports whether any price or explicit currency has been set.
+func (p TurnPricing) isConfigured() bool {
+	return p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0 || p.Currency != ""
+}


TurnPricing's doc comment says "Zero values mean pricing is not configured", but isConfigured() returns true when only Currency is set (even with all rates 0). This makes it unclear when cost fields/histograms should be emitted. Please align the comment and semantics (e.g., require at least one non-zero rate, or update the comment to explicitly treat currency-only as configured).

Copilot · 2026-04-10T02:04:45Z

src/semantic-router/pkg/sessiontelemetry/telemetry.go

+// computeCost returns the turn cost in USD given token counts and pricing rates.
+// Returns 0 when pricing is not configured.
+func computeCost(promptTokens, completionTokens int, pricing TurnPricing) float64 {
+	if !pricing.isConfigured() {
+		return 0
+	}
+	return (float64(promptTokens)*pricing.PromptPer1M +
+		float64(completionTokens)*pricing.CompletionPer1M) / 1_000_000.0


CachedInputPer1M is treated as part of pricing configuration (it is logged and included in isConfigured()), but computeCost() ignores it entirely. If cached input tokens are billed differently, the resulting per-turn/cumulative cost will be inaccurate. Consider extending TurnParams to carry cached-input token counts (and incorporate them), or avoid advertising/using cached-input pricing until it’s actually applied in the cost calculation.

Copilot · 2026-04-10T02:04:45Z

src/semantic-router/pkg/config/pricing_helper.go

+// GetFullModelPricing returns the complete ModelPricing entry for the given model,
+// including CachedInputPer1M. The second return value is false when no pricing
+// is configured for the model. Accepts both short names and provider model IDs.
+func (c *RouterConfig) GetFullModelPricing(modelName string) (ModelPricing, bool) {
+	if modelConfig, ok := c.resolveModelConfig(modelName); ok {
+		p := modelConfig.Pricing
+		if p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0 || p.Currency != "" {
+			if p.Currency == "" {
+				p.Currency = "USD"
+			}
+			return p, true
+		}


The function comment says the second return value is false when "no pricing is configured", but the implementation treats currency alone (with all per-1M rates at 0) as configured and returns (p, true). Please clarify/align the definition of "configured" here (e.g., require at least one non-zero rate, or update the comment to state that currency-only counts as configured).

Copilot · 2026-04-10T02:04:46Z

e2e/testcases/session_pricing_e2e.go

+func init() {
+	pkgtestcases.Register("session-pricing-chat-completions", pkgtestcases.TestCase{
+		Description: "After a routed chat completion, Prometheus exposes llm_session_turn_cost_usd histogram when model pricing is configured",
+		Tags:        []string{"kubernetes", "observability", "metrics", "llm", "pricing"},
+		Fn:          testSessionPricingChatCompletions,
+	})
+	pkgtestcases.Register("session-pricing-response-api", pkgtestcases.TestCase{
+		Description: "After a routed Response API call, Prometheus exposes llm_session_turn_cost_usd histogram when model pricing is configured",
+		Tags:        []string{"kubernetes", "observability", "metrics", "llm", "pricing", "response-api"},
+		Fn:          testSessionPricingResponseAPI,
+	})


These new test cases are registered, but they are not referenced by any testmatrix group or profile GetTestCases() list (search shows the names only appear in this file). As a result, they likely won’t run in CI unless manually selected. If these are meant to provide durable E2E coverage for session cost telemetry, add them to an appropriate testmatrix group/profile so the harness executes them by default.

…ost onto session log Signed-off-by: Huamin Chen <huaminchen@microsoft.com>

rootfs requested a review from Xunzhuo as a code owner April 10, 2026 01:18

github-actions bot assigned abdallahsamabd, asaadbalum, liavweiss, noalimoy, rootfs, szedan-rh, Xunzhuo, yehuditkerido and yossiovadia Apr 10, 2026

rootfs force-pushed the per-turn-pricing branch 2 times, most recently from a0aefa3 to d0a806a Compare April 10, 2026 01:58

rootfs requested a review from Copilot April 10, 2026 01:59

Copilot started reviewing on behalf of rootfs April 10, 2026 01:59 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

rootfs force-pushed the per-turn-pricing branch from d0a806a to 9660b78 Compare April 10, 2026 12:17

feat(observability): stamp per-turn pricing metadata and cumulative c…

c3b9b3c

…ost onto session log Signed-off-by: Huamin Chen <huaminchen@microsoft.com>

rootfs force-pushed the per-turn-pricing branch from 9660b78 to c3b9b3c Compare April 10, 2026 20:02

rootfs merged commit c7bf894 into vllm-project:main Apr 10, 2026
32 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(observability): stamp per-turn pricing metadata and cumulative cost onto session log#1740

feat(observability): stamp per-turn pricing metadata and cumulative cost onto session log#1740
rootfs merged 1 commit intovllm-project:mainfrom
rootfs:per-turn-pricing

rootfs commented Apr 10, 2026 •

edited

Loading

Uh oh!

netlify bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 10, 2026

Uh oh!

Copilot AI Apr 10, 2026

Uh oh!

Copilot AI Apr 10, 2026

Uh oh!

Copilot AI Apr 10, 2026

Uh oh!

Copilot AI Apr 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

Conversation

rootfs commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

netlify bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for vllm-semantic-router ready!

Uh oh!

github-actions bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

👥 vLLM Semantic Team Notification

📁 e2e

📁 src/semantic-router

🎉 Thanks for your contributions!

Uh oh!

github-actions bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Supply Chain Security Report — All Clear

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

10 participants

rootfs commented Apr 10, 2026 •

edited

Loading

netlify bot commented Apr 10, 2026 •

edited

Loading

github-actions bot commented Apr 10, 2026 •

edited

Loading

📁 `e2e`

📁 `src/semantic-router`

github-actions bot commented Apr 10, 2026 •

edited

Loading