feat(observability): stamp per-turn pricing metadata and cumulative cost onto session log#1740
Conversation
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
👥 vLLM Semantic Team NotificationThe following members have been identified for the changed files in this PR and have been automatically assigned: 📁
|
✅ Supply Chain Security Report — All Clear
Scanned at |
a0aefa3 to
d0a806a
Compare
There was a problem hiding this comment.
Pull request overview
Adds per-turn pricing metadata and session-level cumulative cost tracking to session telemetry, wiring model pricing from router config through extproc into structured logs and a new Prometheus histogram.
Changes:
- Extend session telemetry state/logging to include per-turn cost and cumulative session cost, and emit a new
llm_session_turn_cost_usdhistogram when pricing is configured. - Plumb model pricing (including cached-input rate) from
RouterConfiginto extproc session telemetry recording for both streaming and non-streaming paths. - Add unit tests for cost accumulation behavior and introduce E2E testcases intended to validate the metric exposure.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/semantic-router/pkg/sessiontelemetry/telemetry.go | Adds pricing value type, cost computation, cumulative cost state, conditional cost metric/log fields. |
| src/semantic-router/pkg/sessiontelemetry/telemetry_test.go | Adds unit tests validating cost accumulation and conditional metric observation. |
| src/semantic-router/pkg/observability/metrics/session_cost.go | Introduces the per-turn session cost Prometheus histogram and recording helper. |
| src/semantic-router/pkg/extproc/session_telemetry.go | Fetches model pricing from config and passes it into session telemetry events. |
| src/semantic-router/pkg/extproc/processor_res_usage.go | Wires pricing into both streaming and non-streaming session turn recording. |
| src/semantic-router/pkg/config/pricing_helper.go | Adds helper to fetch full pricing (including cached-input rate) for a resolved model. |
| src/semantic-router/pkg/config/model_config_types.go | Extends ModelPricing to include cached_input_per_1m. |
| e2e/testcases/session_pricing_e2e.go | Adds E2E testcases intended to assert the new cost histogram is exposed after requests. |
| if p.Pricing.isConfigured() { | ||
| currency := p.Pricing.Currency | ||
| if currency == "" { | ||
| currency = "USD" | ||
| } | ||
| fields["pricing_prompt_per_1m"] = p.Pricing.PromptPer1M | ||
| fields["pricing_completion_per_1m"] = p.Pricing.CompletionPer1M | ||
| fields["pricing_cached_input_per_1m"] = p.Pricing.CachedInputPer1M | ||
| fields["pricing_currency"] = currency | ||
| fields["cost_this_turn_usd"] = costThisTurn | ||
| fields["cumulative_cost_usd"] = cumCost | ||
| } |
There was a problem hiding this comment.
The log fields and metric names use a hard-coded "*_usd" suffix, but pricing_currency can be set to non-USD via config. In that case cost_this_turn_usd/cumulative_cost_usd (and llm_session_turn_cost_usd) would actually be in the configured currency, which is misleading/incorrect for downstream consumers. Consider either (a) only recording cost when currency is USD/empty, (b) converting to USD before stamping/observing, or (c) renaming fields/metrics to be currency-aware (e.g., include currency label and remove _usd from the name).
| // TurnPricing carries the active per-1M token prices stamped onto a dispatch log entry. | ||
| // All rates are in Currency (default "USD"). Zero values mean pricing is not configured. | ||
| type TurnPricing struct { | ||
| Currency string | ||
| PromptPer1M float64 | ||
| CompletionPer1M float64 | ||
| CachedInputPer1M float64 | ||
| } | ||
|
|
||
| // isConfigured reports whether any price or explicit currency has been set. | ||
| func (p TurnPricing) isConfigured() bool { | ||
| return p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0 || p.Currency != "" | ||
| } |
There was a problem hiding this comment.
TurnPricing's doc comment says "Zero values mean pricing is not configured", but isConfigured() returns true when only Currency is set (even with all rates 0). This makes it unclear when cost fields/histograms should be emitted. Please align the comment and semantics (e.g., require at least one non-zero rate, or update the comment to explicitly treat currency-only as configured).
| // computeCost returns the turn cost in USD given token counts and pricing rates. | ||
| // Returns 0 when pricing is not configured. | ||
| func computeCost(promptTokens, completionTokens int, pricing TurnPricing) float64 { | ||
| if !pricing.isConfigured() { | ||
| return 0 | ||
| } | ||
| return (float64(promptTokens)*pricing.PromptPer1M + | ||
| float64(completionTokens)*pricing.CompletionPer1M) / 1_000_000.0 |
There was a problem hiding this comment.
CachedInputPer1M is treated as part of pricing configuration (it is logged and included in isConfigured()), but computeCost() ignores it entirely. If cached input tokens are billed differently, the resulting per-turn/cumulative cost will be inaccurate. Consider extending TurnParams to carry cached-input token counts (and incorporate them), or avoid advertising/using cached-input pricing until it’s actually applied in the cost calculation.
| // GetFullModelPricing returns the complete ModelPricing entry for the given model, | ||
| // including CachedInputPer1M. The second return value is false when no pricing | ||
| // is configured for the model. Accepts both short names and provider model IDs. | ||
| func (c *RouterConfig) GetFullModelPricing(modelName string) (ModelPricing, bool) { | ||
| if modelConfig, ok := c.resolveModelConfig(modelName); ok { | ||
| p := modelConfig.Pricing | ||
| if p.PromptPer1M != 0 || p.CompletionPer1M != 0 || p.CachedInputPer1M != 0 || p.Currency != "" { | ||
| if p.Currency == "" { | ||
| p.Currency = "USD" | ||
| } | ||
| return p, true | ||
| } |
There was a problem hiding this comment.
The function comment says the second return value is false when "no pricing is configured", but the implementation treats currency alone (with all per-1M rates at 0) as configured and returns (p, true). Please clarify/align the definition of "configured" here (e.g., require at least one non-zero rate, or update the comment to state that currency-only counts as configured).
| func init() { | ||
| pkgtestcases.Register("session-pricing-chat-completions", pkgtestcases.TestCase{ | ||
| Description: "After a routed chat completion, Prometheus exposes llm_session_turn_cost_usd histogram when model pricing is configured", | ||
| Tags: []string{"kubernetes", "observability", "metrics", "llm", "pricing"}, | ||
| Fn: testSessionPricingChatCompletions, | ||
| }) | ||
| pkgtestcases.Register("session-pricing-response-api", pkgtestcases.TestCase{ | ||
| Description: "After a routed Response API call, Prometheus exposes llm_session_turn_cost_usd histogram when model pricing is configured", | ||
| Tags: []string{"kubernetes", "observability", "metrics", "llm", "pricing", "response-api"}, | ||
| Fn: testSessionPricingResponseAPI, | ||
| }) |
There was a problem hiding this comment.
These new test cases are registered, but they are not referenced by any testmatrix group or profile GetTestCases() list (search shows the names only appear in this file). As a result, they likely won’t run in CI unless manually selected. If these are meant to provide durable E2E coverage for session cost telemetry, add them to an appropriate testmatrix group/profile so the harness executes them by default.
d0a806a to
9660b78
Compare
…ost onto session log Signed-off-by: Huamin Chen <huaminchen@microsoft.com>
9660b78 to
c3b9b3c
Compare

FIX #1742
Purpose
Router/CLI/Dashboard/Operator/Fleet-Sim/Bindings/Training/E2E/Docs/CI/BuildTest Plan
Test Result
Semantic Router PR Checklist
[Router],[CLI],[Dashboard],[Operator],[Fleet-Sim],[Bindings],[Training],[E2E],[Docs], or[CI/Build]git commit -sSee CONTRIBUTING.md for the full contributor workflow and commit guidance.