
Conversation

@ader47 ader47 commented Dec 14, 2025

What this PR does / why we need it?

  • Replace torch.where() with the in-place masked_fill_() (see the sketch after this list)
  • Replace nested PCP/DCP Python loops with fully vectorized tensor operations
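A minimal sketch of the first change; the tensor names and shapes here are illustrative only, not the actual MLA v1 code (in the real code the mask comes from the decode metadata):

```python
import torch

# Illustrative shapes only.
softmax_lse = torch.randn(4, 8, 1)
lse_mask = torch.rand(4, 1, 1) > 0.5

# Before: torch.where allocates a fresh tensor for the result.
reference = torch.where(lse_mask, -torch.inf, softmax_lse)

# After: masked_fill_ writes -inf into the existing tensor in place,
# avoiding the extra allocation.
softmax_lse.masked_fill_(lse_mask, -torch.inf)

assert torch.equal(softmax_lse, reference)
```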

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance optimizations to the MLA v1 attention mechanism for Ascend NPUs. The changes replace torch.where with the in-place masked_fill_ operation and vectorize the PCP/DCP logic to eliminate Python loops and list manipulations. The refactoring in _npu_attention_update and _process_attn_out_lse correctly uses vectorized tensor operations and should yield a noticeable performance improvement. The logic appears sound and well aligned with the goal of optimizing performance; overall, this is a solid improvement.
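To make the vectorization concrete, here is a generic sketch of merging per-rank partial attention outputs and log-sum-exp values in one batched step instead of looping over ranks in Python. The function name and tensor layout are assumptions for illustration, not the actual signatures of _npu_attention_update or _process_attn_out_lse:

```python
import torch

def merge_partial_attention(attn_outs: torch.Tensor,
                            lses: torch.Tensor) -> torch.Tensor:
    """Combine partial attention results from N context-parallel ranks.

    attn_outs: [N, B, H, D]  per-rank partial attention outputs
    lses:      [N, B, H]     per-rank log-sum-exp of the softmax denominator
    returns:   [B, H, D]     merged attention output
    """
    # One batched log-sum-exp over the rank dimension replaces a Python loop.
    global_lse = torch.logsumexp(lses, dim=0)                 # [B, H]
    # Per-rank rescaling weights; ranks masked to -inf contribute zero.
    weights = torch.exp(lses - global_lse).unsqueeze(-1)      # [N, B, H, 1]
    weights = torch.nan_to_num(weights, nan=0.0)
    return (weights * attn_outs).sum(dim=0)                   # [B, H, D]
```

Every step operates on the full [N, B, H, D] stack at once, which is the kind of loop elimination the review describes.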

@ader47 ader47 force-pushed the optimize-mla-cp branch 4 times, most recently from 75b57e3 to 81e1b59 on December 15, 2025 at 01:20
@github-actions

This pull request has conflicts; please resolve them before we can evaluate the pull request.

```python
softmax_lse = torch.where(lse_mask, -torch.inf, softmax_lse)
) -> torch.Tensor:
out_lse_mask = decode_meta.batch_seq_mask[:, None, None].bool()
attn_output.masked_fill_(out_lse_mask, 0)
```
Collaborator


Actually, masked_fill is implemented via torch.where, so I prefer to keep using torch.where. Please refer to https://github.com/pytorch/pytorch/blob/main/torch/_refs/__init__.py#L5935
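For reference, the decomposition linked above expresses masked_fill as (roughly) a where over the mask, so both spellings in the diff produce identical values; the practical difference is whether a new output tensor is allocated on the eager path. A quick check with made-up shapes:

```python
import torch

def ref_masked_fill(a: torch.Tensor, mask: torch.Tensor, value: float) -> torch.Tensor:
    # Roughly how the linked reference decomposition rewrites masked_fill.
    fill = torch.as_tensor(value, dtype=a.dtype, device=a.device)
    return torch.where(mask, fill, a)

attn_output = torch.randn(2, 3, 4)
out_lse_mask = torch.rand(2, 1, 1) > 0.5

a = ref_masked_fill(attn_output, out_lse_mask, 0.0)
b = attn_output.clone().masked_fill_(out_lse_mask, 0)
assert torch.equal(a, b)
```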
