Context
We are integrating WVA with KServe. We have seen mentions of SLO-based scaling being introduced in a future release and want to ensure KServe is ready for it when it lands.
Questions
- How does WVA pick EPP metrics for SLO-based scaling? Which metrics are used and where are they fetched from?
- Can you point us to the logic that maps a vLLM deployment (the scale target) to its corresponding EPP? We want to understand the full chain so we can verify it works correctly in a KServe environment.