Commit 95e6400

[KVPool]Fix PP get bug (#5007)
### What this PR does / why we need it?

When KV caches are evicted from the key-value pool, the KV cache for pp0 may still be active while the KV cache for pp1 has already been evicted. The get operation therefore needs a unified check that covers every pipeline-parallel (PP) rank, not only the first one.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: baxingpiaochong <[email protected]>
Co-authored-by: Jade Zheng <[email protected]>
1 parent a5cb8e4 commit 95e6400

1 file changed: +2 -1 lines changed

vllm_ascend/distributed/kvpool/pool_worker.py

Lines changed: 2 additions & 1 deletion
@@ -572,7 +572,8 @@ def lookup_scheduler(
             num_block = len(keys) // self.num_layers
             multi_tp_values = [
                 res[i * num_block:(i + 1) * num_block]  # type: ignore[index]
-                for i in range(min(self.tp_size, self.num_kv_head))
+                for i in range(
+                    min(self.tp_size, self.num_kv_head) * self.pp_size)
             ]
             index = self.find_min_first_non_one_index(multi_tp_values)
             if index != -1:
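
The effect of the changed range can be illustrated with a small standalone sketch. It is not the code from `pool_worker.py`: the layout of `res` (a flat list of existence flags, one chunk of `num_block` entries per rank, with `1` meaning "present"), the helper `split_by_rank`, and the body of `find_min_first_non_one_index` are all assumptions made for illustration, since the diff shows only the call site of that method.

```python
from typing import List


def find_min_first_non_one_index(groups: List[List[int]]) -> int:
    """Hypothetical stand-in for the method named in the diff.

    For each group, find the index of its first entry that is not 1,
    and return the smallest such index over all groups.
    Returns -1 when every entry in every group is 1.
    """
    best = -1
    for group in groups:
        for idx, flag in enumerate(group):
            if flag != 1:
                if best == -1 or idx < best:
                    best = idx
                break  # only the first miss in each group matters
    return best


def split_by_rank(res: List[int], num_block: int, tp_size: int,
                  num_kv_head: int, pp_size: int,
                  include_pp: bool) -> List[List[int]]:
    """Hypothetical helper: slice the flat result into per-rank chunks.

    include_pp=False mirrors the range before the commit,
    include_pp=True mirrors the range after it.
    """
    num_groups = min(tp_size, num_kv_head)
    if include_pp:
        num_groups *= pp_size
    return [res[i * num_block:(i + 1) * num_block] for i in range(num_groups)]


# Toy data (assumed layout): tp_size=1, pp_size=2, four blocks per rank.
res = [
    1, 1, 1, 1,  # pp0: all four blocks still in the pool
    1, 1, 0, 0,  # pp1: blocks 2 and 3 already evicted
]

old_groups = split_by_rank(res, 4, tp_size=1, num_kv_head=8, pp_size=2,
                           include_pp=False)
new_groups = split_by_rank(res, 4, tp_size=1, num_kv_head=8, pp_size=2,
                           include_pp=True)

old_index = find_min_first_non_one_index(old_groups)  # inspects pp0 only
new_index = find_min_first_non_one_index(new_groups)  # inspects pp0 and pp1

print(old_index, new_index)
```

Under these assumptions, the old range never looks at the pp1 chunk, so the eviction on pp1 goes unnoticed and the lookup reports no missing block. With the `pp_size` factor included, the first missing block on any PP rank is found.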
