[vllm] chore: fix mc2 used in vllm_ascend on A2 npu #5560

Open

wucong25 wants to merge 6 commits into verl-project:main from wucong25:wc/fix_a2_mc2

Collaborator

wucong25 commented Mar 11, 2026

What does this PR do?

Fix the use of MC2 in vllm_ascend on Ascend A2 NPUs.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.

wucong25 added 2 commits

March 11, 2026 20:08


          fix mc2 used in vllm_ascend on A2

c309b67


          fix mc2 used in vllm_ascend on A2

a417cb7

wucong25 requested review from PeterSH6, chenhaiq and wuxibin89 as code owners

March 11, 2026 12:10

wucong25 changed the title ~~[fix] chore: fix mc2 used in vllm_ascend on A2 npu~~ [vllm] chore: fix mc2 used in vllm_ascend on A2 npu

wucong25 added 2 commits

March 11, 2026 20:22


          add license

b964f78


          fix pre-commit

c6f17d5

gemini-code-assist bot reviewed

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request introduces NPU (Ascend) specific patches for vLLM, implementing wrappers to adjust MoE communication methods and matmul/reduce behavior based on Ascend SoC versions, particularly for A2, and adding a pre-launch check to prevent unsupported configurations. The review identifies several improvement opportunities: a critical issue with a broad AssertionError catch that could mask legitimate bugs, multiple instances of inefficient and potentially error-prone module imports inside wrapper functions, the need to clarify vague comments regarding AscendSocVersion.A2 limitations, and the unnecessary complexity introduced by a nested function.
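A minimal sketch of the wrapper pattern summarized above, assuming the patch replaces an MC2 selection on A2 during prefill. The enum is a local stand-in for `vllm_ascend`'s `MoECommType`, the `MC2` member is inferred from the PR title, and `avoid_mc2`, `is_a2`, `sp_enabled`, and `select_moe_comm_method` are hypothetical names. Only the prefill branch is visible in the diff excerpts, so the sketch leaves the method unchanged otherwise.

```python
import functools
from enum import Enum


class MoECommType(Enum):
    """Local stand-in for vllm_ascend's MoECommType (values are illustrative)."""

    ALLGATHER = "allgather"
    NAIVE_MULTICAST = "naive_multicast"
    MC2 = "mc2"  # assumed member, inferred from the PR title


def avoid_mc2(is_a2, sp_enabled):
    """Build a decorator that swaps MC2 for another method on A2 during prefill.

    `is_a2` and `sp_enabled` are hypothetical zero-argument callables standing in
    for the SoC-version check and `enable_sp()` referenced in the diff.
    """

    def decorator(select_fn):
        @functools.wraps(select_fn)
        def wrapper(*args, with_prefill=False, **kwargs):
            method = select_fn(*args, with_prefill=with_prefill, **kwargs)
            if is_a2() and with_prefill and method is MoECommType.MC2:
                # Mirrors the visible branch: ALLGATHER with SP, else NAIVE_MULTICAST.
                if sp_enabled():
                    return MoECommType.ALLGATHER
                return MoECommType.NAIVE_MULTICAST
            return method

        return wrapper

    return decorator


@avoid_mc2(is_a2=lambda: True, sp_enabled=lambda: False)
def select_moe_comm_method(num_tokens, with_prefill=False):
    # Hypothetical selector that always prefers MC2.
    return MoECommType.MC2
```

Passing the checks in as callables keeps the sketch runnable without an NPU; the real patch would call into `vllm_ascend` directly.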

verl/utils/vllm/npu_vllm_patch.py

Comment on lines +44 to +48

+                              moe_comm_method = MoECommType.NAIVE_MULTICAST
+                      return moe_comm_method
+                  return wrapper

Contributor

gemini-code-assist bot Mar 11, 2026

Catching a broad AssertionError and silently passing (except AssertionError: pass) is a critical issue. This can mask legitimate bugs or unexpected conditions that should be handled explicitly or logged. If get_forward_context() is expected to raise an AssertionError under specific, non-critical circumstances, those conditions should be checked explicitly, or a more specific exception should be caught, and at minimum, a warning should be logged. Silently passing can lead to difficult-to-debug issues.

            try:
                forward_context = get_forward_context()
                forward_context.mmrs_fusion = False
            except AssertionError as e:
                # Log the error or handle it more specifically if it's an expected condition.
                # For example, if forward_context is not available in certain setups.
                # logging.warning(f"Could not set mmrs_fusion: {e}")
                pass
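One way to act on this suggestion, sketched with stand-ins so it runs without vLLM: keep the `try` body limited to the call that can raise, and log a warning instead of passing silently. `disable_mmrs_fusion` and `FakeForwardContext` are hypothetical names, and the getter is passed in as an argument only so the example is self-contained.

```python
import logging

logger = logging.getLogger(__name__)


class FakeForwardContext:
    """Stand-in for the object returned by vLLM's get_forward_context()."""

    mmrs_fusion = True


def disable_mmrs_fusion(get_forward_context):
    """Turn off mmrs_fusion if a forward context exists; return True on success."""
    try:
        # Only the lookup is guarded, so unrelated assertion failures are not masked.
        forward_context = get_forward_context()
    except AssertionError as exc:
        logger.warning("Forward context unavailable; mmrs_fusion left unchanged: %s", exc)
        return False
    forward_context.mmrs_fusion = False
    return True
```

Returning a boolean lets the caller decide whether a missing context matters in its own setting.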

verl/utils/vllm/npu_vllm_patch.py

+              # Unless required by applicable law or agreed to in writing, software
+              # distributed under the License is distributed on an "AS IS" BASIS,
+              # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+              # See the License for the specific language governing permissions and

Contributor

gemini-code-assist bot Mar 11, 2026

The comment "AscendSocVersion.A2 is not support MC2 in Single-card multi-process scenario now." is vague. "now" implies a temporary state, but it's better to state the current limitation clearly without temporal ambiguity. Please clarify if this is a known, permanent limitation or if there's a specific version or condition under which it might change.

verl/utils/vllm/npu_vllm_patch.py Outdated

Comment on lines +25 to +26

		from vllm_ascend.ascend_forward_context import MoECommType
		from vllm_ascend.utils import get_ascend_soc_version, AscendSocVersion

Contributor

gemini-code-assist bot Mar 11, 2026

Similar to previous comments, importing enable_sp inside the wrapper function is inefficient and can hide import errors. It should be imported at the top of the file.

from vllm_ascend.utils import enable_sp

verl/utils/vllm/npu_vllm_patch.py

Comment on lines +38 to +39


		if with_prefill:

Contributor

gemini-code-assist bot Mar 11, 2026

Importing modules (get_ascend_soc_version, AscendSocVersion) inside this wrapper function is inefficient and can hide import errors. These should be imported at the top of the file.

from vllm_ascend.utils import get_ascend_soc_version, AscendSocVersion

verl/utils/vllm/npu_vllm_patch.py

+                      if with_prefill:
+                          from vllm_ascend.utils import enable_sp
+                          if enable_sp():

Contributor

gemini-code-assist bot Mar 11, 2026

The comment "AscendSocVersion.A2 is not support MC2 in Single-card multi-process scenario now." is repeated and still vague. Please clarify the specific and current limitation without temporal ambiguity.

verl/utils/vllm/npu_vllm_patch.py

+                          from vllm_ascend.utils import enable_sp
+                          if enable_sp():
+                              moe_comm_method = MoECommType.ALLGATHER
+                          else:

Contributor

gemini-code-assist bot Mar 11, 2026

Importing get_forward_context inside the wrapper function is inefficient and can hide import errors. It should be imported at the top of the file.

from vllm.forward_context import get_forward_context

verl/utils/vllm/npu_vllm_patch.py

Comment on lines +57 to +65

+                          from vllm.forward_context import get_forward_context
+                          try:
+                              forward_context = get_forward_context()
+                              forward_context.mmrs_fusion = False
+                          except AssertionError:
+                              # forward_context.mmrs_fusion will be false in matmul_and_reduce func.
+                              pass
+                      return fn(self, *args, **kwargs)

Contributor

gemini-code-assist bot Mar 11, 2026

The nested function get_ascend_soc_version_local() is unnecessary. Its logic can be directly integrated into check_vllm_ascend_before_server_launch() or get_ascend_soc_version from vllm_ascend.utils could be used if it provides the same functionality. Defining functions within functions adds complexity without clear benefit here.

    soc_version_raw = torch_npu.npu.get_soc_version()
    if 220 <= soc_version_raw <= 225:
        soc_version = AscendSocVersion.A2
    elif 250 <= soc_version_raw <= 255:
        soc_version = AscendSocVersion.A3
    else:
        soc_version = AscendSocVersion.UNDEFINED
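The mapping in the suggestion can also be isolated as a small pure function, which makes it testable without an NPU. This is a sketch only: the enum below is a local stand-in for `vllm_ascend.utils.AscendSocVersion`, `classify_soc_version` is a hypothetical name, and the numeric ranges are copied from the suggestion above rather than verified against `torch_npu`.

```python
from enum import Enum


class AscendSocVersion(Enum):
    """Local stand-in for vllm_ascend.utils.AscendSocVersion."""

    A2 = "A2"
    A3 = "A3"
    UNDEFINED = "UNDEFINED"


def classify_soc_version(soc_version_raw: int) -> AscendSocVersion:
    """Map a raw SoC version number to an AscendSocVersion member.

    Ranges are taken from the review suggestion; in the real patch the input
    would come from torch_npu.npu.get_soc_version().
    """
    if 220 <= soc_version_raw <= 225:
        return AscendSocVersion.A2
    if 250 <= soc_version_raw <= 255:
        return AscendSocVersion.A3
    return AscendSocVersion.UNDEFINED
```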

wucong25 added 2 commits

March 14, 2026 16:28


          fix mc2 used in vllm_ascend on A2


          fix precommits

22838f3

Reviewers

wuxibin89 (code owner): awaiting requested review

PeterSH6 (code owner): awaiting requested review

chenhaiq (code owner): awaiting requested review

1 more reviewer

gemini-code-assist[bot] left review comments

At least 1 approving review is required to merge this pull request.

Labels

None yet