
[CI]repair custom ops ci#9465

Merged
MengqingCao merged 8 commits into vllm-project:main from ZT-AIA:0521
May 25, 2026

@ZT-AIA ZT-AIA commented May 22, 2026

Collaborator

What this PR does / why we need it?

Fix the nightly custom ops test cases, whose failures were mainly caused by changes in vLLM and by inherent defects in the test cases themselves.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Nightly custom op tests.

Signed-off-by: ZT-AIA <1028681969@qq.com>
@github-actions

Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

@github-actions

Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses stability issues in the nightly custom operations CI pipeline. It updates various test configurations to ensure proper environment initialization and aligns test logic with recent changes in the vllm codebase. Additionally, it optimizes test coverage for specific operations to improve execution efficiency.

Highlights

  • CI Stability Improvements: Updated multiple test files to ensure consistent initialization of device properties and custom operations, resolving failures in the nightly CI pipeline.
  • Test Logic Alignment: Adjusted test mocks and logic to align with recent upstream vllm changes, including updates to fused operations and rejection sampling.
  • Test Coverage Optimization: Reduced redundant test parameters in specific operations like RoPE and split QKV to improve CI execution efficiency and stability.
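Stacked `@pytest.mark.parametrize` decorators expand to the Cartesian product of their value lists, so trimming even one list shrinks the number of collected cases multiplicatively. A minimal sketch of that arithmetic using only the standard library; the parameter names and values below are hypothetical placeholders, not the actual RoPE or split-QKV grids touched by this PR.

```python
import itertools
import math


def count_cases(grid: dict) -> int:
    """Number of cases stacked parametrize decorators would generate."""
    return math.prod(len(values) for values in grid.values())


def expand(grid: dict) -> list:
    """Materialize every combination, mirroring how the product is enumerated."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]


# Hypothetical grids for illustration only.
before = {
    "num_tokens": [1, 7, 64, 512, 4096],
    "head_size": [64, 128, 256],
    "dtype": ["float16", "bfloat16", "float32"],
}
after = {
    "num_tokens": [1, 512],
    "head_size": [128],
    "dtype": ["float16", "bfloat16"],
}

print(count_cases(before), count_cases(after))  # prints: 45 4
```

Keeping one small and one large size per dimension, rather than every intermediate value, is a common way to preserve edge-case coverage while cutting CI time.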

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

Suggested PR Title:

[Ops][Misc] Refactor Triton operation tests and simplify logprob computation

Suggested PR Summary:

### What this PR does / why we need it?
This pull request refactors and optimizes several NPU-specific operation tests and simplifies the logprob computation logic. It introduces necessary initialization calls (`init_device_properties_triton`, `enable_custom_op`) across various test files and streamlines the `compute_topk_logprobs` signature by removing unused parameters. Additionally, it adjusts test parametrizations for performance and updates rejection sampling kernels.

Feedback from the review highlights a redundant class redefinition in `test_fused_moe.py` that shadows an import. It also points out that the newly added `num_rejected_tokens` buffer in `test_prepare_inputs_padded.py` is currently unverified, suggesting an assertion against the reference implementation is needed.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested via the updated nightly E2E test suite for single-node operations.

Comment on lines +51 to +55
```python
class SiluAndMul:
    """SwiGLU activation function: silu(x[:d]) * x[d:] where d = x.shape[-1] // 2"""
    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return F.silu(x[..., :d]) * x[..., d:]
```

Contributor

Priority: high

The local redefinition of SiluAndMul shadows the import from vllm.model_executor.layers.activation at line 30. This is redundant and can lead to confusion for maintainers. If the local implementation is required due to changes in vLLM, the unused import should be removed to maintain code clarity. Otherwise, consider using the imported class directly.
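For reference, the docstring's formula, silu(x[:d]) * x[d:] with d = x.shape[-1] // 2 and silu(v) = v * sigmoid(v), can be sketched in plain Python for a single row. The name `silu_and_mul_ref` is hypothetical; giving a local reference implementation a distinct name like this is one way to avoid shadowing the `SiluAndMul` imported from `vllm.model_executor.layers.activation`.

```python
import math


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v), written to avoid overflow for large |v|."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)  # small for very negative v, so no overflow
    return v * e / (1.0 + e)


def silu_and_mul_ref(row: list) -> list:
    """Hypothetical local SwiGLU reference for one row of even length."""
    if len(row) % 2:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    # First half is the gate (passed through SiLU), second half scales it.
    return [silu(g) * u for g, u in zip(gate, up)]
```

For example, `silu_and_mul_ref([0.0, 1.0, 5.0, 7.0])` gates `[5.0, 7.0]` with `silu(0.0)` and `silu(1.0)`, giving `0.0` for the first element.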

```python
# Run Triton kernel
out_tri = torch.empty(num_reqs, dtype=torch.int32, device=device)

num_rejected_tokens = torch.empty(num_reqs, dtype=torch.int32, device=device)
```

Contributor

Priority: high

The num_rejected_tokens buffer is initialized and passed to the kernel, but its output is never verified. Since the reference implementation prepare_inputs_padded_ref already calculates this value (line 24), the test should be updated to assert that the kernel's output matches the reference. This ensures the kernel's logic for calculating rejected tokens is correctly verified.
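The check the review asks for amounts to an exact, per-request comparison of the kernel's `num_rejected_tokens` buffer against the reference value. A minimal sketch, assuming both sides have already been converted to plain Python lists of ints; the helper name `check_num_rejected_tokens` is hypothetical. In the actual test, the tensors would more likely be compared directly with `torch.testing.assert_close` (whose default tolerances are zero for integer dtypes) or `torch.equal` after moving them to the CPU.

```python
def check_num_rejected_tokens(kernel_out, ref_out):
    """Assert exact per-request agreement between kernel output and reference."""
    assert len(kernel_out) == len(ref_out), (
        f"length mismatch: kernel={len(kernel_out)}, ref={len(ref_out)}"
    )
    # Collect (request index, kernel value, reference value) for every mismatch.
    mismatches = [
        (req_idx, got, want)
        for req_idx, (got, want) in enumerate(zip(kernel_out, ref_out))
        if got != want
    ]
    # Report only the first few mismatches to keep CI logs readable.
    assert not mismatches, (
        f"num_rejected_tokens mismatch (req, kernel, ref): {mismatches[:5]}"
    )
```

Reporting the failing request index, rather than a bare boolean, makes a nightly failure easier to trace back to a specific input.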

Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
ZT-AIA and others added 4 commits May 23, 2026 19:13
Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <1028681969@qq.com>
@github-actions

Contributor

This pull request has conflicts; please resolve them before we can evaluate the pull request.

Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
```python
torch.testing.assert_close(y_ref, y_cal, rtol=3e-03, atol=1e-02, equal_nan=True)


@pytest.mark.skip(reason="Tested separately on a 310P machine.")
```

Collaborator

Suggested change
```diff
-@pytest.mark.skip(reason="Tested separately on a 310P machine.")
+@pytest.mark.skipif(not is_310p_hw(), reason="Tested separately on a 310P machine.")
```
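The suggested change matters because `skip` always skips, while `skipif` skips only when its condition is true, so the test still runs on 310P hardware. The sketch below reproduces the same semantics with the standard library's `unittest.skip` and `unittest.skipIf` so it runs without pytest or an NPU. `is_310p_hw` here is a hypothetical stand-in driven by an environment variable (`FAKE_SOC`, also hypothetical), not vllm-ascend's actual helper. As with a boolean `pytest.mark.skipif` condition, the condition is evaluated once, when the decorator is applied at import time.

```python
import os
import unittest


def is_310p_hw() -> bool:
    # Hypothetical stand-in: reads FAKE_SOC instead of querying the device.
    return os.environ.get("FAKE_SOC") == "310P"


class Ops310PTests(unittest.TestCase):
    @unittest.skip("Tested separately on a 310P machine.")
    def test_always_skipped(self):
        self.fail("unconditional skip: never runs, even on 310P")

    # Evaluated once, when the class body executes.
    @unittest.skipIf(not is_310p_hw(), "Tested separately on a 310P machine.")
    def test_runs_only_on_310p(self):
        self.assertTrue(is_310p_hw())


def run_suite() -> unittest.TestResult:
    """Run both tests programmatically and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Ops310PTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

On a non-310P machine both tests are reported as skipped; with `FAKE_SOC=310P` set before import, only the unconditional one is.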

Signed-off-by: ZT-AIA <1028681969@qq.com>

@MengqingCao MengqingCao left a comment

Collaborator

LGTM now, thx!

@MengqingCao MengqingCao merged commit f650855 into vllm-project:main May 25, 2026
54 checks passed
yilunh998 pushed a commit to yilunh998/vllm-ascend that referenced this pull request Jun 2, 2026
### What this PR does / why we need it?
Fix the nightly custom ops test cases, whose failures were mainly caused
by changes in vLLM and by inherent defects in the test cases themselves.

- vLLM version: v0.20.2
- vLLM main: vllm-project/vllm@1ac10f1
---------
Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
Signed-off-by: yilunh <hanyilun1@huawei.com>
LostFox11 pushed a commit to LostFox11/vllm-ascend that referenced this pull request Jun 15, 2026
### What this PR does / why we need it?
Fix the nightly custom ops test cases, whose failures were mainly caused
by changes in vLLM and by inherent defects in the test cases themselves.

- vLLM version: v0.20.2
- vLLM main: vllm-project/vllm@1ac10f1
---------
Signed-off-by: ZT-AIA <1028681969@qq.com>
Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>
2 participants