Conversation

@zhenwenqi2024 (Contributor) commented Dec 14, 2025

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the Prefill Context Parallelism (PCP) and Decode Context Parallelism (DCP) logic by moving it from NPUModelRunner into a new PCPManager class. While this is a good structural improvement, the refactoring has introduced several critical bugs, including typos, method calls with missing or wrong arguments, and use of undefined attributes. These issues will lead to runtime errors and must be addressed.
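For orientation, here is a minimal sketch of the extracted manager's surface as it can be inferred from the call sites quoted in this review; the constructor order follows the corrected call below, and the bodies, docstrings, and internal attribute names are assumptions, not the PR's actual code.

    # Hypothetical skeleton inferred from the call sites in this review;
    # bodies and docstrings are assumptions, not the PR's actual code.
    class PCPManager:

        def __init__(self, pcp_size, pcp_rank, dcp_size, dcp_rank,
                     max_buffer_num_tokens, max_num_reqs, decode_threshold,
                     device, vllm_config, pin_memory):
            # Later comments in this review suggest the class-internal
            # names are pcp_world_size / pcp_world_rank.
            self.pcp_world_size = pcp_size
            self.pcp_world_rank = pcp_rank
            self.dcp_size = dcp_size
            self.dcp_rank = dcp_rank
            self.max_buffer_num_tokens = max_buffer_num_tokens
            self.max_num_reqs = max_num_reqs
            self.decode_threshold = decode_threshold
            self.device = device
            self.vllm_config = vllm_config
            self.pin_memory = pin_memory

        def get_discard_request_mask(self):
            """Return a numpy bool mask of requests to discard."""

        def get_padded_slot_mapping(self, total_num_scheduled_tokens,
                                    slot_mapping):
            """Pad slot_mapping so every PCP rank sees the same length."""

        def generate_kv_idx(self, scheduler_output, input_batch):
            """Build per-PCP-rank KV recovery indices for chunked prefill."""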

self.dcp_rank = 0
self.pcp_size = 1
self.pcp_rank = 0
max_buffer_num_tokens = self.max_num_token

critical

There's a typo in the attribute name. self.max_num_token should be self.max_num_tokens. This will cause an AttributeError at runtime as max_num_token is not defined.

Suggested change
max_buffer_num_tokens = self.max_num_token
max_buffer_num_tokens = self.max_num_tokens

Comment on lines +220 to +229
self.pcp_manager = PCPManager(
self.pcp_size,
self.pcp_rank,
self.dcp_size,
self.dcp_rank,
max_buffer_num_tokens,
self.max_num_reqs,
self.device,
self.pin_memory,
)

critical

The constructor for PCPManager is called with incorrect arguments. The arguments decode_threshold and vllm_config are missing, which will lead to a TypeError at runtime.

            self.pcp_manager = PCPManager(
                self.pcp_size,
                self.pcp_rank,
                self.dcp_size,
                self.dcp_rank,
                max_buffer_num_tokens,
                self.max_num_reqs,
                self.decode_threshold,
                self.device,
                self.vllm_config,
                self.pin_memory,
            )
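
As an aside, constructing the manager with keyword arguments would make this kind of positional or missing-argument mismatch fail loudly at the call site. A sketch, assuming parameter names that match the values passed (see the skeleton above):

    self.pcp_manager = PCPManager(
        pcp_size=self.pcp_size,
        pcp_rank=self.pcp_rank,
        dcp_size=self.dcp_size,
        dcp_rank=self.dcp_rank,
        max_buffer_num_tokens=max_buffer_num_tokens,
        max_num_reqs=self.max_num_reqs,
        decode_threshold=self.decode_threshold,
        device=self.device,
        vllm_config=self.vllm_config,
        pin_memory=self.pin_memory,
    )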


total_num_pcp_pads = sum(self.num_pcp_pads)
max_num_scheduled_tokens = max(tokens)
if self.pcp__world_szie > 1:

critical

There is a typo in the attribute name. self.pcp__world_szie should be self.pcp_size. This will cause an AttributeError at runtime.

        if self.pcp_size > 1:

Comment on lines +744 to +745
discard_request_indices = np.nonzero(
discard_requests_mask.np[:num_reqs])[0]

critical

The get_discard_request_mask method returns a numpy array, which does not have a .np attribute. Accessing discard_requests_mask.np will raise an AttributeError. You should use the returned numpy array directly.

        discard_request_indices = np.nonzero(discard_requests_mask)[0]
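
For reference, np.nonzero on a boolean array returns a tuple of index arrays (one per dimension), so [0] extracts the flat indices. A self-contained illustration with made-up values:

    import numpy as np

    # Boolean mask over requests; True means "discard this request".
    discard_requests_mask = np.array([False, True, False, True])
    # np.nonzero returns a tuple of index arrays, one per dimension;
    # [0] selects the indices along the single axis.
    discard_request_indices = np.nonzero(discard_requests_mask)[0]
    print(discard_request_indices)  # [1 3]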

Comment on lines +930 to +933
slot_mapping = self.pcp_manager.get_padded_slot_mapping(
num_tokens,
slot_mapping,
)

critical

The variable num_tokens is not defined in this scope, which will cause a NameError at runtime. Based on the context, it seems you intended to use total_num_scheduled_tokens.

                    slot_mapping = self.pcp_manager.get_padded_slot_mapping(
                        total_num_scheduled_tokens,
                        slot_mapping,
                    )

return dcp_local_seq_lens

def generate_kv_idx(self, scheduler_output, input_batch):
if not self.pcp_size > 1:

critical

The attribute self.pcp_size is not defined in this class. This will cause an AttributeError. You probably meant to use self.pcp_world_size.

Suggested change
if not self.pcp_size > 1:
if not self.pcp_world_size > 1:

def generate_kv_idx(self, scheduler_output, input_batch):
if not self.pcp_size > 1:
return
self.cp_kv_recover_idx_for_chunk = [[] for _ in range(self.pcp_size)]

critical

The attribute self.pcp_size is not defined in this class. This will cause an AttributeError. You probably meant to use self.pcp_world_size.

Suggested change
self.cp_kv_recover_idx_for_chunk = [[] for _ in range(self.pcp_size)]
self.cp_kv_recover_idx_for_chunk = [[] for _ in range(self.pcp_world_size)]
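
A way to head off this naming drift is to bind the group size once in __init__ under a single canonical name and use only that name afterwards. A minimal sketch, with the attribute name taken from the surrounding comments and the constructor shape assumed:

    # Minimal sketch: one canonical attribute name, bound once.
    class PCPManager:

        def __init__(self, pcp_world_size):
            self.pcp_world_size = pcp_world_size  # single source of truth

        def generate_kv_idx(self, scheduler_output, input_batch):
            if not self.pcp_world_size > 1:
                return
            # One KV recovery-index list per PCP rank.
            self.cp_kv_recover_idx_for_chunk = [
                [] for _ in range(self.pcp_world_size)
            ]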

Comment on lines 403 to 404
is_prefill = input_batch.num_computed_tokens_cpu[
i] < self.input_batch.num_prompt_tokens[i]

critical

The attribute self.input_batch is not defined in PCPManager. You should use the input_batch argument passed to the method.

Suggested change
is_prefill = input_batch.num_computed_tokens_cpu[
i] < self.input_batch.num_prompt_tokens[i]
is_prefill = input_batch.num_computed_tokens_cpu[
i] < input_batch.num_prompt_tokens[i]

kv_req_offset = 0
q_head_chunk_id = self.pcp_world_rank
q_tail_chunk_id = self.pcp_world_size * 2 - 1 - self.pcp_world_rank
for i, seq_len in enumerate(self.query_lens):

critical

The attribute self.query_lens is not defined in PCPManager. It needs to be passed as an argument to this method.
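
A possible shape for the fix, threading the lengths through the method signature instead of reading state PCPManager does not own; the method name here is a placeholder, not the PR's actual code:

    # Illustrative sketch; "build_chunk_ids" is a placeholder name.
    def build_chunk_ids(self, query_lens):
        kv_req_offset = 0
        q_head_chunk_id = self.pcp_world_rank
        q_tail_chunk_id = self.pcp_world_size * 2 - 1 - self.pcp_world_rank
        for i, seq_len in enumerate(query_lens):
            # ... per-request chunk bookkeeping, as in the diff above ...
            kv_req_offset += seq_len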

tail_attn_nomask_seqlens = torch.tensor(
[chunk_seqlens, kv_with_q_tail_nomask_seqlens],
dtype=torch.int32)
pcp_prefill_mask = self.attn_mask

critical

The attribute self.attn_mask is not defined in PCPManager. It needs to be passed as an argument to this method.
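
The same pattern applies here: hand the mask in at the call site rather than reading an attribute that was never set. A sketch with assumed parameter and method names (torch is imported at module scope in the diff):

    # Illustrative sketch; names are assumptions, not the PR's code.
    def build_prefill_mask_inputs(self, attn_mask, chunk_seqlens,
                                  kv_with_q_tail_nomask_seqlens):
        tail_attn_nomask_seqlens = torch.tensor(
            [chunk_seqlens, kv_with_q_tail_nomask_seqlens],
            dtype=torch.int32)
        pcp_prefill_mask = attn_mask
        return tail_attn_nomask_seqlens, pcp_prefill_mask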

Signed-off-by: zhenwenqi2024 <[email protected]>
Signed-off-by: zhenwenqi2024 <[email protected]>
@@ -0,0 +1,592 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
from typing import List

Don't create a separate new pcp_utils file; follow the same approach as the GPU side's common_utils and put this in a shared utility file.

@zhenwenqi2024 (Contributor, Author) replied:

OK, I will change it.

Signed-off-by: zhenwenqi2024 <[email protected]>
@jianzs (Collaborator) commented Dec 15, 2025

Is it possible to refactor the pcp_dcp logic in attention so that the pcp and non-pcp paths share as much common processing as possible, minimizing major branching differences?
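
One way to read this suggestion: make the non-pcp case a degenerate instance of the pcp path, so attention code follows a single flow instead of branching. A rough sketch with invented names, not the project's actual API:

    import torch
    import torch.nn.functional as F

    # Invented helper: pcp_size == 1 degenerates to the ordinary path,
    # so callers never branch on "pcp vs non-pcp".
    def pad_for_pcp(slot_mapping: torch.Tensor,
                    total_num_scheduled_tokens: int,
                    pcp_size: int) -> torch.Tensor:
        if pcp_size == 1:
            return slot_mapping  # non-pcp: identity, no special casing
        # Pad so every PCP rank sees the same token count; padding with
        # -1 mirrors the usual "invalid slot" convention, but that value
        # is an assumption here.
        pad = (-total_num_scheduled_tokens) % pcp_size
        return F.pad(slot_mapping, (0, pad), value=-1)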

@github-actions commented

This pull request has conflicts, please resolve those before we can evaluate the pull request.

1 similar comment from @github-actions.
