Ensure DP-aware routing works with KVEvents/indexing #357

@vMaroon

Summary

In data-parallel (DP) deployments, each rank maintains its own KV-cache. Ranks can be deployed as separate vLLM processes (different API servers/frontends) or within the same vLLM frontend.

KVEvents carry a data_parallel_rank field that is sent in every message but is currently ignored. While this makes sense in the latter kind of deployment, in the former the DP rank can be assigned at the connection/subscription level.

This gap should be closed in coordination with the DP-aware wide-ep work.
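
As a rough illustration of what DP-aware indexing could look like, here is a minimal Python sketch. It is not the project's implementation: `KVEventBatch`, `Subscription`, `DPAwareIndex`, and `resolve_dp_rank` are hypothetical names, and the message shape is a simplified stand-in that keeps only the data_parallel_rank field mentioned above. It assumes index entries are keyed by (pod, DP rank), and that a rank assigned at the subscription level takes precedence over the per-message field; that precedence is a design assumption, not something this issue settles.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class KVEventBatch:
    """Hypothetical, simplified stand-in for a KVEvents message."""
    block_hashes: Tuple[int, ...]
    data_parallel_rank: Optional[int] = None  # field named in the issue


@dataclass
class Subscription:
    """Hypothetical subscription to one event endpoint."""
    pod: str
    # Set only when the endpoint is known to serve exactly one DP rank.
    assigned_dp_rank: Optional[int] = None


def resolve_dp_rank(sub: Subscription, batch: KVEventBatch) -> int:
    """Pick the DP rank for a batch (assumed precedence, see lead-in)."""
    if sub.assigned_dp_rank is not None:
        return sub.assigned_dp_rank
    if batch.data_parallel_rank is not None:
        return batch.data_parallel_rank
    return 0  # assumed default when no rank information is available


@dataclass
class DPAwareIndex:
    """Maps a block hash to the set of (pod, dp_rank) owners holding it."""
    blocks: Dict[int, Set[Tuple[str, int]]] = field(default_factory=dict)

    def add(self, sub: Subscription, batch: KVEventBatch) -> None:
        owner = (sub.pod, resolve_dp_rank(sub, batch))
        for block_hash in batch.block_hashes:
            self.blocks.setdefault(block_hash, set()).add(owner)

    def lookup(self, block_hash: int) -> Set[Tuple[str, int]]:
        return self.blocks.get(block_hash, set())


index = DPAwareIndex()

# One subscription carrying events from several ranks: use the message field.
shared = Subscription(pod="pod-a")
index.add(shared, KVEventBatch((1, 2), data_parallel_rank=0))
index.add(shared, KVEventBatch((2,), data_parallel_rank=1))

# One subscription per rank: the rank is assigned on the subscription.
dedicated = Subscription(pod="pod-b", assigned_dp_rank=3)
index.add(dedicated, KVEventBatch((1,)))
```

With entries keyed this way, a router could score individual ranks rather than whole pods, since a block cached on one rank is not necessarily present on its siblings.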
