Commit 0f5660d


[Feature] [P/D] support hybrid attention for mooncake connector (vllm-project#8850)

### What this PR does / why we need it?

This PR adapts `MooncakeConnector` KV transfer metadata handling for hybrid KV cache layouts.

The core change is to make Mooncake KV transfer operate on KV-cache-group-aware metadata instead of assuming a single uniform attention-only KV layout.

In `register_kv_caches`, this PR builds and records per-group/per-layer metadata used by the sender and receiver:

- `self.kv_group2layeridx`: maps each KV cache group to its serialized group spec and physical layer indices, e.g. `{group_id: (group_spec, [layer_idx0, layer_idx1, ...])}`.
- `self.block_size_scale`: stores per-layer cache block scaling, e.g. `[layer_idx][cache_idx] -> cache tensor num_blocks / logical num_blocks`.
- `self.block_len_per_addr`: stores per-layer byte length for each cache tensor block, e.g. `[layer_idx][cache_idx] -> cache block byte length`.
- `self.kv_caches_base_addr`: stores per-layer base addresses for each registered cache tensor, e.g. `[layer_idx][cache_idx] -> data_ptr`.

Based on this metadata, `_get_kv_split_metadata` now prepares transfer splits that can represent hybrid KV cache groups, including non-uniform group layouts. `_get_group_pulls_metadata` then builds per-remote-port group pull descriptors so each transfer task knows which KV cache group, remote TP offset, and prefill PP rank it should pull from.

This allows Mooncake connector to support hybrid KV cache transfer paths while keeping the existing non-hybrid behavior compatible.

- vLLM version: v0.20.2
- vLLM main: vllm-project/vllm@39910f2

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: zzzzzmeng <810924837@qq.com>
Signed-off-by: liziyu179 <liziyu16@huawei.com>
Co-authored-by: zzzzzmeng <810924837@qq.com>
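As a rough illustration of the four metadata structures named in the commit message, the sketch below builds them from plain Python stand-ins. It is not the connector's code: `FakeCache`, `build_transfer_metadata`, the group spec strings, and the example addresses are all hypothetical, and integer division is assumed for the block scale.

```python
from dataclasses import dataclass


@dataclass
class FakeCache:
    """Hypothetical stand-in for one registered cache tensor."""
    data_ptr: int     # base address of the tensor
    num_blocks: int   # number of blocks held by this tensor
    block_bytes: int  # byte length of one block


def build_transfer_metadata(groups, layer_caches, logical_num_blocks):
    """Build the four lookups described in the commit message.

    groups:       {group_id: (group_spec, [layer_idx, ...])}
    layer_caches: list indexed by layer_idx; each entry is a list of
                  FakeCache objects indexed by cache_idx.
    """
    kv_group2layeridx = {
        gid: (spec, list(layers)) for gid, (spec, layers) in groups.items()
    }
    # [layer_idx][cache_idx] -> cache tensor num_blocks / logical num_blocks
    # (assumed here to divide evenly)
    block_size_scale = [
        [c.num_blocks // logical_num_blocks for c in caches]
        for caches in layer_caches
    ]
    # [layer_idx][cache_idx] -> cache block byte length
    block_len_per_addr = [
        [c.block_bytes for c in caches] for caches in layer_caches
    ]
    # [layer_idx][cache_idx] -> data_ptr
    kv_caches_base_addr = [
        [c.data_ptr for c in caches] for caches in layer_caches
    ]
    return (kv_group2layeridx, block_size_scale,
            block_len_per_addr, kv_caches_base_addr)


# Two layers with different layouts (values are made up for illustration).
layer_caches = [
    [FakeCache(0x1000, 8, 256), FakeCache(0x9000, 8, 256)],  # layer 0: two caches
    [FakeCache(0x20000, 16, 64)],                            # layer 1: one cache
]
groups = {0: ("group_spec_a", [0]), 1: ("group_spec_b", [1])}

g2l, scale, blen, base = build_transfer_metadata(
    groups, layer_caches, logical_num_blocks=8
)
```

Indexing everything by `[layer_idx][cache_idx]` lets layers with a different number or shape of cache tensors coexist, which is the property a hybrid layout needs; how the real connector consumes these values during transfer is not shown here.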

1 parent 7ba2934 commit 0f5660d

2 files changed

tests/ut/kv_connector
- test_mooncake_connector.py
vllm_ascend/distributed/kv_transfer/kv_p2p
- mooncake_connector.py
