Question: How does WVA map a vLLM deployment to its EPP for SLO-based scaling? #824

@vivekk16

Description

Context

We are integrating WVA with KServe. We have seen mentions of SLO-based scaling being introduced in a future release and want to ensure KServe is ready for it when it lands.

Questions

  1. How does WVA pick EPP metrics for SLO-based scaling? Which metrics are used and where are they fetched from?

  2. Can you point us to the logic that maps a vLLM deployment (the scale target) to its corresponding EPP? We want to understand the full chain so we can verify it works correctly in a KServe environment.

Labels: needs-triage
