[CI] Modify nightly test case configuration by chen-commits · Pull Request #9463 · vllm-project/vllm-ascend

chen-commits · 2026-05-22T08:35:02Z

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.20.2
vLLM main: vllm-project/vllm@0d4d334

Signed-off-by: chen-commits <1636718796@qq.com>
Signed-off-by: chen <1636718796@qq.com>

gemini-code-assist · 2026-05-22T08:35:12Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the nightly end-to-end test configurations for the Qwen3-235B model series. By adjusting the compilation settings, specifically targeting CUDA graph modes, the changes aim to improve the stability and performance metrics captured during automated nightly testing cycles.

Highlights

Nightly Test Configuration Update: Updated nightly test configurations for Qwen3-235B models to include specific compilation configurations for CUDA graphs.
CUDA Graph Optimization: Applied 'FULL_DECODE_ONLY' mode for A22B-A2 and Mooncake-Layerwise variants, and 'PIECEWISE' mode for the standard Qwen3-235B-A22B configuration.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

github-actions · 2026-05-22T08:35:26Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request updates the nightly test configurations for the Qwen3-235B model by adding compilation configurations for different execution modes. The reviewer identified that the value "FULL_DECODE_ONLY" is likely incorrect and should be replaced with "FULL" to match the expected enum. Additionally, a conflict was noted in one configuration where the --enforce-eager flag would disable the intended graph capturing. Suggestions were also provided to update the PR title and summary according to the repository's style guide.

gemini-code-assist · 2026-05-22T08:36:48Z

        --trust-remote-code
        --gpu-memory-utilization 0.9
        --async-scheduling
+        --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'


Suggested PR Title:

[CI][Misc] Modify nightly test case configuration for Qwen3-235B

Suggested PR Summary:

### What this PR does / why we need it?
This PR updates the nightly test configurations for the Qwen3-235B model by adding specific compilation configurations (`cudagraph_mode`). This ensures that the nightly tests cover different execution modes (FULL and PIECEWISE) for better performance and stability validation on Ascend NPUs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This is a configuration change for nightly tests.

References

The PR title and summary must follow the specified format and be provided in markdown code blocks.

gemini-code-assist · 2026-05-22T08:36:49Z

        --trust-remote-code
        --gpu-memory-utilization 0.9
        --async-scheduling
+        --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'


The cudagraph_mode value "FULL_DECODE_ONLY" appears to be incorrect. The CUDAGraphMode enum (referenced in vllm_ascend/compilation/acl_graph.py) typically supports "FULL" or "PIECEWISE". Using an unsupported string will likely cause a configuration error or skip graph capture entirely. Please use "FULL" if full graph capture is intended.

--compilation-config '{"cudagraph_mode":"FULL"}'
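
Whether a given mode string is accepted can be checked mechanically before the flag lands in a nightly config. The sketch below parses the JSON passed to --compilation-config and compares cudagraph_mode against a set of allowed names supplied by the caller. The helper name check_cudagraph_mode is hypothetical, and the two-name set in the usage line only mirrors the names mentioned in this review comment; the authoritative list is whatever the CUDAGraphMode enum in the installed vLLM version defines.

```python
import json
from typing import Iterable


def check_cudagraph_mode(compilation_config: str, allowed_modes: Iterable[str]) -> str:
    """Return the cudagraph_mode found in a --compilation-config JSON string.

    Raises ValueError when the key is missing or the value is not one of
    allowed_modes. The caller decides what is allowed, for example by
    collecting the member names of the real CUDAGraphMode enum.
    """
    allowed = set(allowed_modes)
    config = json.loads(compilation_config)
    mode = config.get("cudagraph_mode")
    if mode is None:
        raise ValueError("compilation config has no 'cudagraph_mode' key")
    if mode not in allowed:
        raise ValueError(
            f"unsupported cudagraph_mode {mode!r}; expected one of {sorted(allowed)}"
        )
    return mode


# Assumption: only the two names quoted in the review comment above.
MODES_NAMED_IN_REVIEW = {"FULL", "PIECEWISE"}
accepted = check_cudagraph_mode('{"cudagraph_mode":"FULL"}', MODES_NAMED_IN_REVIEW)
```

Passing the allowed set in, rather than hard-coding it, keeps the check valid if the enum gains or loses members between vLLM versions.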

gemini-code-assist · 2026-05-22T08:36:49Z

        --trust-remote-code
        --gpu-memory-utilization 0.9
        --async-scheduling
+        --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'


The cudagraph_mode value "FULL_DECODE_ONLY" should be changed to "FULL" to match the expected CUDAGraphMode enum values.

--compilation-config '{"cudagraph_mode":"FULL"}'

gemini-code-assist · 2026-05-22T08:36:49Z

        --trust-remote-code
        --no-enable-prefix-caching
        --gpu-memory-utilization 0.9
+        --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'


This server command includes --enforce-eager (line 32), which explicitly disables graph capturing. Adding --compilation-config with a cudagraph_mode here is contradictory and will have no effect. If the goal is to test ACL graphs for this component, --enforce-eager should be removed. Additionally, "FULL_DECODE_ONLY" should be corrected to "FULL".

--compilation-config '{"cudagraph_mode":"FULL"}'
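
The conflict described above can also be caught by a small lint step over the server command. This is a minimal sketch, assuming the command is available as a single string; the function name find_graph_flag_conflict and the example command are hypothetical and are not taken from the test files in this PR.

```python
import json
import shlex
from typing import Optional


def find_graph_flag_conflict(server_cmd: str) -> Optional[str]:
    """Return the cudagraph_mode that --enforce-eager would override, if any.

    Returns None when --enforce-eager is absent or when no
    --compilation-config flag sets cudagraph_mode.
    """
    tokens = shlex.split(server_cmd)
    if "--enforce-eager" not in tokens:
        return None
    for i, tok in enumerate(tokens):
        if tok == "--compilation-config" and i + 1 < len(tokens):
            raw = tokens[i + 1]  # value given as a separate argument
        elif tok.startswith("--compilation-config="):
            raw = tok.split("=", 1)[1]  # value given as --flag=value
        else:
            continue
        mode = json.loads(raw).get("cudagraph_mode")
        if mode is not None:
            return mode
    return None


# Hypothetical command combining both flags.
cmd = (
    "serve --enforce-eager --trust-remote-code "
    "--compilation-config '{\"cudagraph_mode\":\"FULL_DECODE_ONLY\"}'"
)
conflict = find_graph_flag_conflict(cmd)
```

A non-None result means the command asks for a graph mode while also forcing eager execution, which is the situation the comment flags.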

gemini-code-assist · 2026-05-22T08:36:49Z

        --trust-remote-code
        --no-enable-prefix-caching
        --gpu-memory-utilization 0.9
+        --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'


The cudagraph_mode value "FULL_DECODE_ONLY" should be changed to "FULL" to match the expected CUDAGraphMode enum values.

--compilation-config '{"cudagraph_mode":"FULL"}'

github-actions · 2026-05-28T02:12:42Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

[CI] Modify nightly test case configuration

6b395e1

Signed-off-by: chen-commits <1636718796@qq.com>
Signed-off-by: chen <1636718796@qq.com>

chen-commits requested a review from wangxiyuan as a code owner May 22, 2026 08:35

github-actions Bot added the module:tests label May 22, 2026

gemini-code-assist Bot reviewed May 22, 2026

github-actions Bot added the merge-conflicts label May 28, 2026

chen-commits closed this Jun 6, 2026

chen-commits wants to merge 1 commit into vllm-project:main from chen-commits:main-nightly11
