
[rollout]{feat}Ascend 950 hardware mxfp8 rollout quantization #5569

Open
zhijie-os wants to merge 3 commits into verl-project:main from zhijie-os:A5-MXFP8

Conversation

@zhijie-os

What does this PR do?

Supports the latest Ascend hardware, DV100 and DV120, for MXFP8 quantization.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
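The title convention above can be checked mechanically. A minimal sketch, assuming the rule is exactly as stated in the checklist; the regex and the `is_valid_title` helper are hypothetical illustrations, not the actual CI check:

```python
import re

# Module and type lists copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "veomni", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data", "cfg", "reward", "fully_async", "one_step_off",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# "[{modules}] {type}: {description}" with an optional [BREAKING] prefix.
TITLE_RE = re.compile(r"^(\[BREAKING\])?\[([^\]]+)\] (\w+): (.+)$")

def is_valid_title(title: str) -> bool:
    m = TITLE_RE.match(title)
    if m is None:
        return False
    modules = [s.strip() for s in m.group(2).split(",")]
    return all(mod in MODULES for mod in modules) and m.group(3) in TYPES

print(is_valid_title("[BREAKING][fsdp, megatron] feat: dynamic batching"))  # True
print(is_valid_title("[rollout] feat: Ascend MXFP8 rollout quantization"))  # True
```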

Test

[TODO] Tests will be added.

API and Usage Example

Specify the quantization option as ascend to enable MXFP8 on Ascend hardware.
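A minimal sketch of what enabling this could look like. The dict-style config fragment and the `resolve_quant_backend` helper are illustrative assumptions based only on the sentence above ("specify the quantization to ascend"), not on verl's actual config schema:

```python
# Hypothetical rollout config fragment; the key name "quantization" comes from
# the PR description, everything else here is an assumption.
rollout_config = {
    "quantization": "ascend",  # enables MXFP8 on Ascend DV100/DV120
}

def resolve_quant_backend(cfg: dict) -> str:
    # Sketch: map the user-facing setting to a backend label.
    return "mxfp8-npu" if cfg.get("quantization") == "ascend" else "default"

print(resolve_quant_backend(rollout_config))  # mxfp8-npu
```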

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@CLAassistant

CLAassistant commented Mar 12, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for MXFP8 quantization on Ascend hardware. While the new functionality for Ascend devices appears sound, the implementation introduces several critical regressions that break existing FP8 quantization on non-Ascend hardware. Key issues include flawed logic for handling weight_block_size, scale parameter naming that disregards vLLM version differences, and the disabling of an environment variable needed to apply the FP8 patches in worker subprocesses. Furthermore, a change forces all users onto a less performant shared memory data transfer method even when faster IPC is available. These issues must be resolved to maintain backward compatibility and avoid performance degradation for other users.

Comment on lines +241 to +249
is_mxfp8_npu = is_mxfp8_vllm_ascend(quant_config)

weight_block_size = None
# if quant_config.weight_block_size is None:
#     raise ValueError("Currently only support blockwise quantization, please set weight_block_size in quant_config")
if hasattr(quant_config, "weight_block_size"):
    weight_block_size = quant_config.weight_block_size
elif is_mxfp8_npu:
    weight_block_size = MXFP8_BLOCK_QUANT_KWARGS["weight_block_size"]

critical

The logic for determining weight_block_size is flawed. By commenting out the check for quant_config.weight_block_size, the non-NPU FP8 path will raise an AttributeError if quant_config does not have this attribute, as it's accessed directly later in the quantization loop. This breaks existing functionality. The check must be preserved for the non-NPU path.

Suggested change

is_mxfp8_npu = is_mxfp8_vllm_ascend(quant_config)
weight_block_size = None
if is_mxfp8_npu:
    weight_block_size = MXFP8_BLOCK_QUANT_KWARGS["weight_block_size"]
else:
    if not hasattr(quant_config, "weight_block_size") or quant_config.weight_block_size is None:
        raise ValueError("Currently only support blockwise quantization, please set weight_block_size in quant_config")
    weight_block_size = quant_config.weight_block_size
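The corrected branching can be exercised in isolation. A minimal sketch with a stand-in quant_config; SimpleNamespace and the [32, 32] block size are illustrative assumptions, since the real MXFP8_BLOCK_QUANT_KWARGS value is not shown in this diff:

```python
from types import SimpleNamespace

# Illustrative stand-in; the actual value in the PR is not shown here.
MXFP8_BLOCK_QUANT_KWARGS = {"weight_block_size": [32, 32]}

def resolve_weight_block_size(quant_config, is_mxfp8_npu: bool):
    # NPU MXFP8 path: use the fixed MXFP8 block size.
    if is_mxfp8_npu:
        return MXFP8_BLOCK_QUANT_KWARGS["weight_block_size"]
    # Non-NPU FP8 path: blockwise quantization is required, so a missing
    # weight_block_size must fail loudly here instead of surfacing as an
    # AttributeError later in the quantization loop.
    if getattr(quant_config, "weight_block_size", None) is None:
        raise ValueError(
            "Currently only support blockwise quantization, "
            "please set weight_block_size in quant_config"
        )
    return quant_config.weight_block_size

print(resolve_weight_block_size(SimpleNamespace(weight_block_size=[128, 128]), False))  # [128, 128]
print(resolve_weight_block_size(SimpleNamespace(), True))  # [32, 32]
```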

Comment on lines +276 to +284
# Yield the scale with appropriate naming based on vLLM version
yield (k + "_scale", param_scale)
# if is_vllm_11_or_later:
#     if "expert" in k:
#         yield (k + "_scale_inv", param_scale)
#     else:
#         yield (k + "_scale", param_scale)
# else:
#     yield (k + "_scale_inv", param_scale)

critical

The logic for yielding the scale parameter has been simplified to always yield k + "_scale". This removes the previous logic that handled different vLLM versions and yielded k + "_scale_inv" when appropriate. This change breaks FP8 quantization for non-Ascend (NVIDIA) hardware, which may expect _scale_inv. The change should be made conditional to apply only for the new Ascend MXFP8 path.

Suggested change

# Yield the scale with appropriate naming based on vLLM version
if is_mxfp8_npu:
    yield (k + "_scale", param_scale)
elif is_vllm_11_or_later:
    if "expert" in k:
        yield (k + "_scale_inv", param_scale)
    else:
        yield (k + "_scale", param_scale)
else:
    yield (k + "_scale_inv", param_scale)
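The version-aware naming in the suggestion above can be factored into a small pure function for testing. A sketch only; is_vllm_11_or_later is passed in as a plain flag here, whereas in the PR it is presumably derived from the installed vLLM version:

```python
def scale_param_name(k: str, is_mxfp8_npu: bool, is_vllm_11_or_later: bool) -> str:
    # Ascend MXFP8 always uses the plain "_scale" suffix.
    if is_mxfp8_npu:
        return k + "_scale"
    # vLLM >= 0.11: only expert weights keep the inverse-scale name.
    if is_vllm_11_or_later:
        return k + "_scale_inv" if "expert" in k else k + "_scale"
    # Older vLLM expects "_scale_inv" everywhere.
    return k + "_scale_inv"

print(scale_param_name("model.layers.0.mlp.experts.w13_weight", False, True))   # ..._scale_inv
print(scale_param_name("model.layers.0.self_attn.qkv_proj.weight", False, True))  # ..._scale
```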

apply_vllm_fp8_patches()
# for subprocesses patching
# os.environ["VERL_VLLM_FP8_QUANT_ENABLED"] = "1"

critical

The line os.environ["VERL_VLLM_FP8_QUANT_ENABLED"] = "1" has been commented out. This environment variable is necessary to enable the vLLM FP8 patches in worker subprocesses. Without it, the FP8 quantization for non-Ascend hardware will not work correctly. This line should be restored to ensure existing FP8 functionality is not broken.

Suggested change

os.environ["VERL_VLLM_FP8_QUANT_ENABLED"] = "1"
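The environment variable works as a flag that worker subprocesses inherit, so spawned vLLM workers can decide whether to re-apply the patches at startup. A minimal sketch of the gate; apply_vllm_fp8_patches is stubbed out here and fp8_patches_enabled is a hypothetical helper, not code from the PR:

```python
import os

def apply_vllm_fp8_patches():
    # Stub for illustration; the real function lives in verl's vLLM utilities.
    pass

def enable_fp8_patching():
    apply_vllm_fp8_patches()
    # Child processes inherit os.environ, so subprocesses see this flag.
    os.environ["VERL_VLLM_FP8_QUANT_ENABLED"] = "1"

def fp8_patches_enabled() -> bool:
    # Hypothetical helper a subprocess could call before patching.
    return os.environ.get("VERL_VLLM_FP8_QUANT_ENABLED") == "1"

enable_fp8_patching()
print(fp8_patches_enabled())  # True
```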

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
