[vllm] chore: fix mc2 used in vllm_ascend on A2 npu#5560
[vllm] chore: fix mc2 used in vllm_ascend on A2 npu#5560wucong25 wants to merge 6 commits intoverl-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces NPU (Ascend) specific patches for vLLM, implementing wrappers to adjust MoE communication methods and matmul/reduce behavior based on Ascend SoC versions, particularly for A2, and adding a pre-launch check to prevent unsupported configurations. The review identifies several improvement opportunities: a critical issue with a broad AssertionError catch that could mask legitimate bugs, multiple instances of inefficient and potentially error-prone module imports inside wrapper functions, the need to clarify vague comments regarding AscendSocVersion.A2 limitations, and the unnecessary complexity introduced by a nested function.
| moe_comm_method = MoECommType.NAIVE_MULTICAST | ||
|
|
||
| return moe_comm_method | ||
|
|
||
| return wrapper |
There was a problem hiding this comment.
Catching a broad AssertionError and silently passing (except AssertionError: pass) is a critical issue. This can mask legitimate bugs or unexpected conditions that should be handled explicitly or logged. If get_forward_context() is expected to raise an AssertionError under specific, non-critical circumstances, those conditions should be checked explicitly, or a more specific exception should be caught, and at minimum, a warning should be logged. Silently passing can lead to difficult-to-debug issues.
try:
forward_context = get_forward_context()
forward_context.mmrs_fusion = False
except AssertionError as e:
# Log the error or handle it more specifically if it's an expected condition.
# For example, if forward_context is not available in certain setups.
# logging.warning(f"Could not set mmrs_fusion: {e}")
pass| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and |
There was a problem hiding this comment.
The comment "AscendSocVersion.A2 is not support MC2 in Single-card multi-process scenario now." is vague. "now" implies a temporary state, but it's better to state the current limitation clearly without temporal ambiguity. Please clarify if this is a known, permanent limitation or if there's a specific version or condition under which it might change.
verl/utils/vllm/npu_vllm_patch.py
Outdated
| from vllm_ascend.ascend_forward_context import MoECommType | ||
| from vllm_ascend.utils import get_ascend_soc_version, AscendSocVersion |
|
|
||
| if with_prefill: |
|
|
||
| if with_prefill: | ||
| from vllm_ascend.utils import enable_sp | ||
| if enable_sp(): |
| from vllm_ascend.utils import enable_sp | ||
| if enable_sp(): | ||
| moe_comm_method = MoECommType.ALLGATHER | ||
| else: |
| from vllm.forward_context import get_forward_context | ||
| try: | ||
| forward_context = get_forward_context() | ||
| forward_context.mmrs_fusion = False | ||
| except AssertionError: | ||
| # forward_context.mmrs_fusion will be false in matmul_and_reduce func. | ||
| pass | ||
| return fn(self, *args, **kwargs) | ||
|
|
There was a problem hiding this comment.
The nested function get_ascend_soc_version_local() is unnecessary. Its logic can be directly integrated into check_vllm_ascend_before_server_launch() or get_ascend_soc_version from vllm_ascend.utils could be used if it provides the same functionality. Defining functions within functions adds complexity without clear benefit here.
soc_version_raw = torch_npu.npu.get_soc_version()
if 220 <= soc_version_raw <= 225:
soc_version = AscendSocVersion.A2
elif 250 <= soc_version_raw <= 255:
soc_version = AscendSocVersion.A3
else:
soc_version = AscendSocVersion.UNDEFINED
What does this PR do?
fix mc2 used in vllm_ascend on A2 npu
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,veomni,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,fully_async,one_step_off,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)recipesubmodule, please also update the reference to the submodule commit viagit submodule update --remoteorcd recipe && git pull origin main.