3 changes: 2 additions & 1 deletion vllm_ascend/distributed/kvpool/pool_worker.py
@@ -572,7 +572,8 @@ def lookup_scheduler(
         num_block = len(keys) // self.num_layers
         multi_tp_values = [
             res[i * num_block:(i + 1) * num_block]  # type: ignore[index]
-            for i in range(min(self.tp_size, self.num_kv_head))
+            for i in range(
+                min(self.tp_size, self.num_kv_head) * self.pp_size)
         ]
Comment on lines +575 to 577

Contributor

critical

This change is likely to cause an IndexError when both tensor parallelism and pipeline parallelism are enabled (i.e., self.tp_size > 1 and self.pp_size > 1).

The res list's size is determined by multi_tp_keys. The construction of multi_tp_keys on lines 554-565 does not generate all combinations of TP and PP ranks. It only generates keys for (tp_rank=any, pp_rank=0) and (tp_rank=0, pp_rank>0), missing cross-combinations.

As a result, len(res) will be num_block * (tp_factor + self.pp_size - 1), where tp_factor = min(self.tp_size, self.num_kv_head). This loop, however, attempts to access up to num_block * tp_factor * self.pp_size elements. An IndexError will occur if tp_factor * self.pp_size > tp_factor + self.pp_size - 1, which is true when tp_factor > 1 and self.pp_size > 1.
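To make the arithmetic concrete, here is a small illustration with made-up values (none of these numbers come from the PR):

```python
# Illustrative values only -- not taken from the PR.
tp_factor, pp_size, num_block = 2, 2, 4

entries_present = num_block * (tp_factor + pp_size - 1)   # 4 * 3 = 12
entries_accessed = num_block * tp_factor * pp_size        # 4 * 4 = 16

# The slicing loop asks for more entries than res contains,
# which is the mismatch described above.
assert entries_accessed > entries_present
```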

For this unified check to work correctly, the multi_tp_keys list must be populated with keys for all combinations of TP and PP ranks. The logic for generating multi_tp_keys needs to be corrected first.
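A minimal sketch of what a corrected key-generation loop could look like, iterating over every (pp_rank, tp_rank) pair. The helper `make_block_key` and the `block_hashes` iterable are placeholders for illustration, not the actual names used in `pool_worker.py`:

```python
# Sketch only: enumerate keys for all TP x PP combinations so that
# len(res) == num_block * tp_factor * self.pp_size, matching the slicing loop.
tp_factor = min(self.tp_size, self.num_kv_head)
multi_tp_keys = []
for pp_rank in range(self.pp_size):
    for tp_rank in range(tp_factor):
        for block_hash in block_hashes:  # one key per block
            # make_block_key stands in for the real key-construction logic
            multi_tp_keys.append(make_block_key(block_hash, tp_rank, pp_rank))
```

With the keys enumerated this way (pp-major, then tp, then block), the slicing in the diff above consumes res in exactly tp_factor * pp_size contiguous chunks of num_block entries each; whatever rank ordering is chosen must of course match how the rest of lookup_scheduler interprets the resulting chunks.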

         index = self.find_min_first_non_one_index(multi_tp_values)
         if index != -1: