fix: V2 analyzer fallback when vllm:cache_config_info is absent
When the model server does not emit the vllm:cache_config_info metric
(e.g., llm-d-inference-sim), TotalKvCapacityTokens is 0 and the V2
analyzer skips the replica entirely, resulting in totalDemand=0 and
no scale-up decisions.
Add computeReplicaCapacityFallback that uses the deployment-derived
capacity from the capacity store and estimates demand from KvCacheUsage
percentage. This allows V2 to produce scaling decisions with any
vLLM-compatible server, not just those emitting cache_config_info.
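A minimal sketch of the fallback described above. The function name comes from the commit message, but the signature, field names, and arithmetic here are assumptions: deployment-derived capacity is taken as the replica's capacity, and demand is estimated as that capacity scaled by the KvCacheUsage fraction reported by the replica.

```go
package main

import "fmt"

// computeReplicaCapacityFallback is a hypothetical sketch of the fallback:
// when vllm:cache_config_info is absent, use the deployment-derived
// capacity from the capacity store and estimate demand from the
// KvCacheUsage fraction (0.0-1.0) reported by the replica.
func computeReplicaCapacityFallback(deploymentCapacityTokens int64, kvCacheUsage float64) (capacity, demand int64) {
	capacity = deploymentCapacityTokens
	demand = int64(float64(deploymentCapacityTokens) * kvCacheUsage)
	return capacity, demand
}

func main() {
	// Example: a deployment-derived capacity of 100000 tokens with 75% KV
	// cache usage yields an estimated demand of 75000 tokens.
	capacity, demand := computeReplicaCapacityFallback(100000, 0.75)
	fmt.Println(capacity, demand)
}
```

With a nonzero capacity and estimated demand per replica, totalDemand no longer collapses to 0, so the analyzer can still produce scale-up decisions.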