Saturation v2 fails to scale inference servers as it silently fails to get total demand#982
Closed
asm582 wants to merge 1 commit into llm-d:main from
Collaborator
What error are you getting?
Collaborator
Author
Thanks. The error is that total demand is always zero, so WVA fails to scale.
Collaborator
Author
/hold
Collaborator
Author
I did a clean deploy of the controller and EPP, all in the same namespace, and see the below log line:
Below are HPA logs:
Closing this PR for now.
The EPP issue mentions adding a namespace key to the metrics exposed by inference servers that run in a different namespace.
The code change we applied previously acts as a graceful-degradation safety net: it uses PromQL's or operator to bypass the namespace requirement when necessary. In this fallback layer, we strip out the namespace= filter entirely and remove namespace from the sum by() aggregation grouping.
Note: Because this fallback logic explicitly bypasses namespace-based tenant isolation in Prometheus, it requires the upstream model_name or target_model_name (e.g., InferencePool name or HuggingFace identifier string) to be unique across the entire Kubernetes cluster. If the same model string is deployed in two separate namespaces and the fallback path triggers, the autoscaler will ingest the combined traffic counts of both deployments, and its scaling calculations will be inaccurate.
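The uniqueness requirement above could be checked ahead of time with something like the following sketch. It assumes the caller has already collected (namespace, model name) pairs from the cluster by some other means; the function name `find_ambiguous_models` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_ambiguous_models(
    deployments: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Return model names that appear in more than one namespace.

    `deployments` is an iterable of (namespace, model_name) pairs. Any model
    returned here would have its traffic blended if the namespace-free
    fallback query were used.
    """
    namespaces_by_model = defaultdict(set)
    for namespace, model_name in deployments:
        namespaces_by_model[model_name].add(namespace)

    # Keep only models seen in two or more distinct namespaces.
    return {
        model: sorted(namespaces)
        for model, namespaces in namespaces_by_model.items()
        if len(namespaces) > 1
    }


if __name__ == "__main__":
    sample = [("team-a", "model-x"), ("team-b", "model-x"), ("team-a", "model-y")]
    print(find_ambiguous_models(sample))
```

An empty result means every model name maps to a single namespace, which is the condition under which the fallback path gives unblended counts.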