Ci/test v2 with fixes #4

Open
ev-shindin wants to merge 2 commits into main from ci/test-v2-with-fixes

Conversation

@ev-shindin
Owner

test

parseSaturationConfig() called Validate() without first calling
ApplyDefaults(), causing V2 configs with analyzerName: saturation
to fail validation because scaleUpThreshold/scaleDownBoundary
default to zero (omitempty) and Validate() rejects zero values.

This caused the engine to skip all models with "Saturation scaling
config not loaded yet for namespace", resulting in no scaling decisions.
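
The call-order fix described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `SaturationConfig` struct, the `prepareConfig` helper, the default values, and the validation ranges are all assumptions. Only the method names `ApplyDefaults()` and `Validate()` and the two field names come from the commit message.

```go
package main

import "fmt"

// SaturationConfig is a hypothetical stand-in for the real config type.
// Only the two fields named in the commit message are modeled.
type SaturationConfig struct {
	AnalyzerName      string
	ScaleUpThreshold  float64 // stays zero when omitted from the config
	ScaleDownBoundary float64 // stays zero when omitted from the config
}

// Assumed defaults for illustration only; the real defaults may differ.
const (
	assumedScaleUpThreshold  = 0.85
	assumedScaleDownBoundary = 0.70
)

// ApplyDefaults fills in any field the user left unset (zero).
func (c *SaturationConfig) ApplyDefaults() {
	if c.ScaleUpThreshold == 0 {
		c.ScaleUpThreshold = assumedScaleUpThreshold
	}
	if c.ScaleDownBoundary == 0 {
		c.ScaleDownBoundary = assumedScaleDownBoundary
	}
}

// Validate rejects zero or out-of-range values (ranges are assumed here).
func (c *SaturationConfig) Validate() error {
	if c.ScaleUpThreshold <= 0 || c.ScaleUpThreshold > 1 {
		return fmt.Errorf("scaleUpThreshold must be in (0, 1], got %v", c.ScaleUpThreshold)
	}
	if c.ScaleDownBoundary <= 0 || c.ScaleDownBoundary > 1 {
		return fmt.Errorf("scaleDownBoundary must be in (0, 1], got %v", c.ScaleDownBoundary)
	}
	return nil
}

// prepareConfig (hypothetical helper) shows the corrected order:
// defaults first, validation second.
func prepareConfig(c SaturationConfig) (SaturationConfig, error) {
	c.ApplyDefaults()
	if err := c.Validate(); err != nil {
		return SaturationConfig{}, err
	}
	return c, nil
}
```

In this sketch, a config that sets only `AnalyzerName: "saturation"` fails `Validate()` when called directly but passes through `prepareConfig`, which is the behavior change the commit describes.
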

When the model server does not emit the vllm:cache_config_info metric
(e.g., llm-d-inference-sim), TotalKvCapacityTokens is 0 and the V2
analyzer skipped the replica entirely, resulting in totalDemand=0 and
no scale-up decisions.

Add computeReplicaCapacityFallback that uses the deployment-derived
capacity from the capacity store and estimates demand from KvCacheUsage
percentage. This allows V2 to produce scaling decisions with any
vLLM-compatible server, not just those emitting cache_config_info.
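
One way to picture the fallback described above is the sketch below. It is an assumption-laden illustration, not the actual `computeReplicaCapacityFallback`: the function name `replicaCapacityWithFallback`, both struct types, and the parameter `storeCapacityTokens` are hypothetical. It also assumes `KvCacheUsage` is a fraction in [0, 1]; if the real metric is a 0-100 percentage, the estimate would need dividing by 100.

```go
package main

import "math"

// ReplicaMetrics is a hypothetical view of the per-replica metrics.
type ReplicaMetrics struct {
	TotalKvCapacityTokens int64   // 0 when cache_config_info is not emitted
	KvCacheUsage          float64 // assumed to be a fraction in [0, 1]
}

// ReplicaCapacity is a hypothetical result type for this sketch.
type ReplicaCapacity struct {
	CapacityTokens int64 // capacity actually used for the estimate
	DemandTokens   int64 // estimated tokens currently in use
	Fallback       bool  // true when the capacity-store value was used
}

// replicaCapacityWithFallback (hypothetical name) prefers the capacity the
// server reports and falls back to a deployment-derived capacity, passed in
// as storeCapacityTokens, when the reported value is missing. It returns
// false only when neither source provides a usable capacity.
func replicaCapacityWithFallback(m ReplicaMetrics, storeCapacityTokens int64) (ReplicaCapacity, bool) {
	capacity := m.TotalKvCapacityTokens
	fallback := false
	if capacity <= 0 {
		if storeCapacityTokens <= 0 {
			return ReplicaCapacity{}, false // nothing to estimate from
		}
		capacity = storeCapacityTokens
		fallback = true
	}

	// Clamp usage so a noisy metric cannot produce negative or >100% demand.
	usage := math.Max(0, math.Min(1, m.KvCacheUsage))
	demand := int64(math.Round(usage * float64(capacity)))

	return ReplicaCapacity{
		CapacityTokens: capacity,
		DemandTokens:   demand,
		Fallback:       fallback,
	}, true
}
```

With these assumptions, a replica reporting no capacity, 50% cache usage, and a store capacity of 8000 tokens yields an estimated demand of 4000 tokens instead of being skipped.
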
@ev-shindin
Owner Author

/ok-to-test

@github-actions

🚀 Kind E2E (full V1+V2) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.
