[CI] Modify nightly test case configuration #9463

Closed
chen-commits wants to merge 1 commit into vllm-project:main from chen-commits:main-nightly11

Conversation

chen-commits commented May 22, 2026

Contributor

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: chen-commits <1636718796@qq.com>
Signed-off-by: chen <1636718796@qq.com>
chen-commits requested a review from wangxiyuan as a code owner May 22, 2026 08:35
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the nightly end-to-end test configurations for the Qwen3-235B model series. By adjusting the compilation settings, specifically targeting CUDA graph modes, the changes aim to improve the stability and performance metrics captured during automated nightly testing cycles.

Highlights

  • Nightly Test Configuration Update: Updated nightly test configurations for Qwen3-235B models to include specific compilation configurations for CUDA graphs.
  • CUDA Graph Optimization: Applied 'FULL_DECODE_ONLY' mode for A22B-A2 and Mooncake-Layerwise variants, and 'PIECEWISE' mode for the standard Qwen3-235B-A22B configuration.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the nightly test configurations for the Qwen3-235B model by adding compilation configurations for different execution modes. The reviewer identified that the value "FULL_DECODE_ONLY" is likely incorrect and should be replaced with "FULL" to match the expected enum. Additionally, a conflict was noted in one configuration where the --enforce-eager flag would disable the intended graph capturing. Suggestions were also provided to update the PR title and summary according to the repository's style guide.

--trust-remote-code
--gpu-memory-utilization 0.9
--async-scheduling
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

Contributor

Severity: high

Suggested PR Title:

[CI][Misc] Modify nightly test case configuration for Qwen3-235B

Suggested PR Summary:

### What this PR does / why we need it?
This PR updates the nightly test configurations for the Qwen3-235B model by adding specific compilation configurations (`cudagraph_mode`). This ensures that the nightly tests cover different execution modes (FULL and PIECEWISE) for better performance and stability validation on Ascend NPUs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This is a configuration change for nightly tests.

References
  1. The PR title and summary must follow the specified format and be provided in markdown code blocks. (link)

--trust-remote-code
--gpu-memory-utilization 0.9
--async-scheduling
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

Contributor

Severity: high

The cudagraph_mode value "FULL_DECODE_ONLY" appears to be incorrect. The CUDAGraphMode enum (referenced in vllm_ascend/compilation/acl_graph.py) typically supports "FULL" or "PIECEWISE". Using an unsupported string will likely cause a configuration error or skip graph capture entirely. Please use "FULL" if full graph capture is intended.

        --compilation-config '{"cudagraph_mode":"FULL"}'
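
One way to settle whether a given `cudagraph_mode` string is accepted is to check it against the enum's member names instead of guessing. The sketch below is a minimal illustration, not code from this PR: `cudagraph_mode_from_args`, `is_member_name`, and the `StandInMode` enum are hypothetical names, and the stand-in lists only the two values named in the comment above. The real `CUDAGraphMode` enum in the installed vLLM version may define additional members, so it should be passed in place of the stand-in.

```python
import enum
import json
import shlex
from typing import Optional, Type


def cudagraph_mode_from_args(command: str) -> Optional[str]:
    """Return the cudagraph_mode passed via --compilation-config, or None."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens):
        # Handles only the space-separated form: --compilation-config '<json>'
        if tok == "--compilation-config" and i + 1 < len(tokens):
            return json.loads(tokens[i + 1]).get("cudagraph_mode")
    return None


def is_member_name(name: str, enum_cls: Type[enum.Enum]) -> bool:
    """True if `name` is the name of a member of `enum_cls`."""
    return name in enum_cls.__members__


class StandInMode(enum.Enum):
    """Hypothetical stand-in; replace with the real CUDAGraphMode enum."""
    PIECEWISE = 1
    FULL = 2
```

Against the stand-in, `is_member_name("FULL_DECODE_ONLY", StandInMode)` is false. Running the same check against the real enum shows whether the suggested change is actually needed.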

--trust-remote-code
--gpu-memory-utilization 0.9
--async-scheduling
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

Contributor

Severity: high

The cudagraph_mode value "FULL_DECODE_ONLY" should be changed to "FULL" to match the expected CUDAGraphMode enum values.

        --compilation-config '{"cudagraph_mode":"FULL"}'

--trust-remote-code
--no-enable-prefix-caching
--gpu-memory-utilization 0.9
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

Contributor

Severity: high

This server command includes --enforce-eager (line 32), which explicitly disables graph capturing. Adding --compilation-config with a cudagraph_mode here is contradictory and will have no effect. If the goal is to test ACL graphs for this component, --enforce-eager should be removed. Additionally, "FULL_DECODE_ONLY" should be corrected to "FULL".

        --compilation-config '{"cudagraph_mode":"FULL"}'
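
The conflict described in this comment can be caught mechanically before a nightly run. The following is a rough sketch under stated assumptions: `find_graph_conflicts` is a hypothetical helper, it treats any `cudagraph_mode` alongside `--enforce-eager` as a conflict (per the comment above), and it parses only the space-separated `--compilation-config '<json>'` form used in these test configurations.

```python
import json
import shlex
from typing import List


def find_graph_conflicts(command: str) -> List[str]:
    """Report cases where eager mode and a graph mode are both requested."""
    tokens = shlex.split(command)
    mode = None
    for i, tok in enumerate(tokens):
        # Handles only the space-separated form: --compilation-config '<json>'
        if tok == "--compilation-config" and i + 1 < len(tokens):
            mode = json.loads(tokens[i + 1]).get("cudagraph_mode")

    problems = []
    if "--enforce-eager" in tokens and mode is not None:
        problems.append(
            f"--enforce-eager is set together with cudagraph_mode={mode!r}"
        )
    return problems
```

An empty list means no conflict was found. A check like this could run over each server command in the nightly configuration files, though how those files are loaded is outside this sketch.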

--trust-remote-code
--no-enable-prefix-caching
--gpu-memory-utilization 0.9
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

Contributor

Severity: high

The cudagraph_mode value "FULL_DECODE_ONLY" should be changed to "FULL" to match the expected CUDAGraphMode enum values.

        --compilation-config '{"cudagraph_mode":"FULL"}'

@github-actions

Contributor

This pull request has conflicts; please resolve them before we can evaluate it.
