
[vllm] chore: fix mc2 used in vllm_ascend on A2 npu#5560

Open
wucong25 wants to merge 6 commits into verl-project:main from wucong25:wc/fix_a2_mc2

Conversation

@wucong25
Collaborator

What does this PR do?

Fix the use of MC2 in vllm_ascend on A2 NPUs.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

wucong25 changed the title from "[fix] chore: fix mc2 used in vllm_ascend on A2 npu" to "[vllm] chore: fix mc2 used in vllm_ascend on A2 npu" on Mar 11, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces NPU (Ascend) specific patches for vLLM: wrappers that adjust the MoE communication method and the matmul/reduce behavior based on the Ascend SoC version (A2 in particular), plus a pre-launch check that prevents unsupported configurations. The review identifies several improvement opportunities: a critical issue with a broad AssertionError catch that could mask legitimate bugs; multiple imports placed inside wrapper functions, which is inefficient and can hide import errors; vague comments about the AscendSocVersion.A2 limitation; and unnecessary complexity introduced by a nested function.

Comment on lines +44 to +48
moe_comm_method = MoECommType.NAIVE_MULTICAST

return moe_comm_method

return wrapper

critical

Catching a broad AssertionError and silently passing (except AssertionError: pass) is a critical issue. This can mask legitimate bugs or unexpected conditions that should be handled explicitly or logged. If get_forward_context() is expected to raise an AssertionError under specific, non-critical circumstances, those conditions should be checked explicitly, or a more specific exception should be caught, and at minimum, a warning should be logged. Silently passing can lead to difficult-to-debug issues.

            try:
                forward_context = get_forward_context()
                forward_context.mmrs_fusion = False
            except AssertionError as e:
                # Log the error or handle it more specifically if it's an expected condition.
                # For example, if forward_context is not available in certain setups.
                # logging.warning(f"Could not set mmrs_fusion: {e}")
                pass

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and

high

The comment "AscendSocVersion.A2 is not support MC2 in Single-card multi-process scenario now." is vague. "now" implies a temporary state, but it's better to state the current limitation clearly without temporal ambiguity. Please clarify if this is a known, permanent limitation or if there's a specific version or condition under which it might change.

Comment on lines +25 to +26
from vllm_ascend.ascend_forward_context import MoECommType
from vllm_ascend.utils import get_ascend_soc_version, AscendSocVersion

high

Similar to previous comments, importing enable_sp inside the wrapper function is inefficient and can hide import errors. It should be imported at the top of the file.

from vllm_ascend.utils import enable_sp

Comment on lines +38 to +39

if with_prefill:

high

Importing modules (get_ascend_soc_version, AscendSocVersion) inside this wrapper function is inefficient and can hide import errors. These should be imported at the top of the file.

from vllm_ascend.utils import get_ascend_soc_version, AscendSocVersion


if with_prefill:
    from vllm_ascend.utils import enable_sp
    if enable_sp():

high

The comment "AscendSocVersion.A2 is not support MC2 in Single-card multi-process scenario now." is repeated and still vague. Please clarify the specific and current limitation without temporal ambiguity.

from vllm_ascend.utils import enable_sp
if enable_sp():
    moe_comm_method = MoECommType.ALLGATHER
else:

high

Importing get_forward_context inside the wrapper function is inefficient and can hide import errors. It should be imported at the top of the file.

from vllm.forward_context import get_forward_context
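A rough sketch of what moving these imports to module level could look like. The `try`/`except ImportError` guard and the `patch_with_sp_check` wrapper are assumptions added for illustration (neither appears in the PR); the guard is one possible way to keep the module importable on machines without vllm_ascend while still resolving the import once, at load time.

```python
import functools

try:
    # Module-level import: a missing or broken package is detected once,
    # when this module is loaded, not on every wrapper call.
    from vllm_ascend.utils import enable_sp
except ImportError:
    # Assumption: fall back gracefully when vllm_ascend is not installed.
    enable_sp = None


def patch_with_sp_check(fn):
    """Hypothetical wrapper that tells fn whether sequence parallelism is on."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        sp_enabled = bool(enable_sp()) if enable_sp is not None else False
        return fn(*args, sp_enabled=sp_enabled, **kwargs)

    return wrapper
```

Whether a guard like this is appropriate depends on why the PR imports lazily in the first place; if the patch module is only ever loaded on Ascend hosts, a plain top-level import would be simpler.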

Comment on lines +57 to +65
from vllm.forward_context import get_forward_context
try:
    forward_context = get_forward_context()
    forward_context.mmrs_fusion = False
except AssertionError:
    # forward_context.mmrs_fusion will be false in matmul_and_reduce func.
    pass
return fn(self, *args, **kwargs)


high

The nested function get_ascend_soc_version_local() is unnecessary. Its logic can be directly integrated into check_vllm_ascend_before_server_launch() or get_ascend_soc_version from vllm_ascend.utils could be used if it provides the same functionality. Defining functions within functions adds complexity without clear benefit here.

    soc_version_raw = torch_npu.npu.get_soc_version()
    if 220 <= soc_version_raw <= 225:
        soc_version = AscendSocVersion.A2
    elif 250 <= soc_version_raw <= 255:
        soc_version = AscendSocVersion.A3
    else:
        soc_version = AscendSocVersion.UNDEFINED
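The range mapping in this suggestion can be isolated as a small pure function, which makes it testable without NPU hardware. In the sketch below, `AscendSocVersion` is a local stand-in enum (the real one lives in `vllm_ascend.utils`) and `classify_soc_version` is a hypothetical name; the numeric ranges are copied from the suggestion above and are not independently verified.

```python
from enum import Enum


class AscendSocVersion(Enum):
    """Local stand-in for vllm_ascend.utils.AscendSocVersion."""

    A2 = "A2"
    A3 = "A3"
    UNDEFINED = "UNDEFINED"


def classify_soc_version(soc_version_raw: int) -> AscendSocVersion:
    """Map a raw SoC version code to a family using the quoted ranges."""
    if 220 <= soc_version_raw <= 225:
        return AscendSocVersion.A2
    if 250 <= soc_version_raw <= 255:
        return AscendSocVersion.A3
    return AscendSocVersion.UNDEFINED
```

In the real check, the argument would presumably be the value returned by `torch_npu.npu.get_soc_version()`, as in the suggestion.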
