Saturation v2 fails to scale inference servers as it silently fails to get total demand by asm582 · Pull Request #982 · llm-d/llm-d-workload-variant-autoscaler

asm582 · 2026-04-07T01:49:49Z

The EPP issue mentions adding a namespace key to metrics exposed by inference servers that run in a different namespace.

The code change we applied previously acts as a graceful-degradation safety net: it uses PromQL's or operator to bypass the namespace requirement when necessary. In this fallback layer, we strip out the namespace= filter entirely and remove namespace from the sum by() aggregation grouping.
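A minimal sketch of how such a primary/fallback query could be assembled in Go. The metric name, the label names, and the helper buildDemandQuery are hypothetical and are not taken from the PR diff. One detail is worth checking in any version of this: PromQL's or keeps a right-hand series only when no left-hand series has an identical label set, so a fallback that drops namespace from its grouping would be returned alongside a non-empty primary result. Adding on(model_name) restricts the match to the model label, so the fallback only contributes when the primary query returns nothing for that model.

```go
package main

import "fmt"

// buildDemandQuery is a HYPOTHETICAL helper: it combines a namespace-scoped
// primary query with a namespace-agnostic fallback using PromQL's `or`.
// metric, modelLabel, model and namespace are illustrative placeholders.
func buildDemandQuery(metric, modelLabel, model, namespace string) string {
	// Primary path: filter on namespace and keep it in the grouping.
	primary := fmt.Sprintf(`sum by (namespace, %s) (%s{namespace=%q, %s=%q})`,
		modelLabel, metric, namespace, modelLabel, model)
	// Fallback path: no namespace filter and no namespace in the grouping.
	fallback := fmt.Sprintf(`sum by (%s) (%s{%s=%q})`,
		modelLabel, metric, modelLabel, model)
	// `or on(<model label>)` matches only on the model label, so the fallback
	// series is dropped whenever the primary already has a series for the model.
	return fmt.Sprintf(`%s or on(%s) %s`, primary, modelLabel, fallback)
}

func main() {
	fmt.Println(buildDemandQuery("hypothetical_queue_size", "model_name", "Qwen/Qwen3-0.6B", "asmalvan-test"))
}
```

Without the on(...) modifier, both series would be summed downstream whenever the namespace-scoped query succeeds, double counting demand.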

Note: Because this fallback explicitly bypasses strict namespace-based tenant isolation in Prometheus, it requires the upstream model_name or target_model_name (e.g., an InferencePool name or a HuggingFace identifier string) to be unique across the entire Kubernetes cluster. If identical model strings are deployed in separate namespaces and the fallback path triggers, the autoscaler will ingest blended traffic counts, resulting in inaccurate scaling math.
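To make the blending risk concrete, here is a small in-memory sketch of the two aggregation paths. The sample type, the namespaces team-a and team-b, and the values are all hypothetical and purely illustrative; the point is only that the fallback path sums every namespace that shares a model string.

```go
package main

import "fmt"

// sample is a HYPOTHETICAL per-namespace series for one demand metric.
type sample struct {
	Namespace string
	Model     string
	Value     float64
}

// demandScoped mirrors the primary path: filter on namespace and model.
func demandScoped(samples []sample, namespace, model string) float64 {
	total := 0.0
	for _, s := range samples {
		if s.Namespace == namespace && s.Model == model {
			total += s.Value
		}
	}
	return total
}

// demandFallback mirrors the fallback path: filter on model only, so series
// from every namespace using the same model string are summed together.
func demandFallback(samples []sample, model string) float64 {
	total := 0.0
	for _, s := range samples {
		if s.Model == model {
			total += s.Value
		}
	}
	return total
}

func main() {
	// Illustrative values only: the same model name deployed in two namespaces.
	samples := []sample{
		{"team-a", "Qwen/Qwen3-0.6B", 40},
		{"team-b", "Qwen/Qwen3-0.6B", 60},
	}
	fmt.Println(demandScoped(samples, "team-a", "Qwen/Qwen3-0.6B")) // 40
	fmt.Println(demandFallback(samples, "Qwen/Qwen3-0.6B"))         // 100
}
```

The variant in team-a would be sized for 100 units of demand instead of 40 if the fallback fired.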

lionelvillard · 2026-04-08T17:38:38Z

The namespace label is automatically added by Prometheus (see doc).

What error are you getting?

asm582 · 2026-04-08T17:56:27Z

> The namespace label is automatically added by prometheus (see doc).
>
> What error are you getting?

Thanks. The error is that total demand is always zero, so WVA fails to scale.
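The PR title describes the failure as silent: a query that matches no series can end up reported as zero demand rather than as an error. A minimal sketch of surfacing that case instead, assuming the query result has already been reduced to a slice of sample values; totalDemand and errNoDemandSeries are hypothetical names, not WVA's API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoDemandSeries is a HYPOTHETICAL sentinel for an empty query result.
var errNoDemandSeries = errors.New("demand query returned no series")

// totalDemand sums the values returned by a demand query. An empty result is
// reported as an error instead of being folded into a total of zero, so a
// label mismatch (for example on namespace) shows up in logs.
func totalDemand(values []float64) (float64, error) {
	if len(values) == 0 {
		return 0, errNoDemandSeries
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum, nil
}

func main() {
	if _, err := totalDemand(nil); err != nil {
		fmt.Println("warning:", err)
	}
	d, _ := totalDemand([]float64{3, 4})
	fmt.Println(d) // 7
}
```

Distinguishing "no series" from "zero demand" lets the caller log a warning or skip the scaling decision rather than scale to the minimum.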

asm582 · 2026-04-08T17:56:34Z

/hold

asm582 · 2026-04-08T21:55:10Z

I did a clean deploy of the controller and EPP, all in the same namespace, and see the following log line:

2026-04-08T21:44:11Z    INFO    saturation/engine_v2.go:65      V2 saturation analysis completed        
{
  "modelID": "Qwen/Qwen3-0.6B", 
  "totalSupply": 6553, 
  "totalDemand": 7319, 
  "utilization": 1.116893, 
  "requiredCapacity": 2595.75, 
  "spareCapacity": 0
}

Below are HPA logs:

Normal   SuccessfulRescale             2m56s                 horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
Normal   SuccessfulRescale             101s                  horizontal-pod-autoscaler  New size: 2; reason: external metric wva_desired_replicas(&LabelSelector{MatchLabels:map[string]string{controller_instance: asmalvan-test,exported_namespace: asmalvan-test,variant_name: workload-variant-autoscaler-va,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
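For reading these events, the generic ratio rule from the Kubernetes HPA documentation is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below implements only that documented rule; how WVA configures the target for wva_desired_replicas is not shown in this thread, and the inputs in main are illustrative, not values from these events.

```go
package main

import (
	"fmt"
	"math"
)

// hpaTolerance is the default HPA tolerance documented by Kubernetes.
const hpaTolerance = 0.1

// desiredReplicas applies the documented HPA ratio rule:
// ceil(currentReplicas * currentMetric / targetMetric), leaving the replica
// count unchanged when the ratio is within the tolerance of 1.0.
func desiredReplicas(currentReplicas int, currentMetric, targetMetric float64) int {
	ratio := currentMetric / targetMetric
	if math.Abs(ratio-1.0) <= hpaTolerance {
		return currentReplicas
	}
	return int(math.Ceil(float64(currentReplicas) * ratio))
}

func main() {
	// Illustrative inputs only.
	fmt.Println(desiredReplicas(1, 2, 1))
}
```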

Closing this PR for now.

satv2 hack fix bypass namespace isolation on prom

91164af

asm582 requested review from dumb0002, ev-shindin and lionelvillard April 7, 2026 01:49

github-actions bot added the hold PRs that are blocked on design, other features, release cycle, etc. label Apr 8, 2026

asm582 closed this Apr 8, 2026

asm582 wants to merge 1 commit into llm-d:main from asm582:fix-epp-namespace-fallback
