
Conversation

@SlightwindSec

What does this PR do?

This PR aims to support MXFP8 (Microscaling FP8) rollout on Ascend NPU hardware.
Note: This is a Draft PR for early feedback and collaboration. The core online MXFP8 quantization logic is currently under development (see TODOs in the code).
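
For context, MXFP8 follows the OCP Microscaling convention: elements are stored in FP8 (E4M3) and every 32-element block shares a single power-of-two scale. The snippet below is only a reference-level sketch of that online quantization step, written in plain PyTorch for illustration; it is not the Ascend NPU kernel this PR will eventually add, and the function names are placeholders.

```python
# Reference sketch of MX-style block quantization -- illustration only,
# not the NPU kernel under development in this PR.
import torch

BLOCK = 32          # block size from the MX spec
E4M3_MAX = 448.0    # largest finite float8_e4m3fn value
E4M3_EMAX = 8       # exponent of the largest E4M3 power of two

def mxfp8_quantize(x: torch.Tensor):
    """Quantize along the last dim (assumed a multiple of 32).
    Returns (fp8 payload, per-block fp32 power-of-two scales)."""
    blocks = x.float().reshape(-1, BLOCK)
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp_min(2.0 ** -126)
    # Shared per-block scale: a power of two placing the block max near the
    # top of the E4M3 range (anything beyond 448 saturates, per the MX spec).
    scale = torch.exp2(torch.floor(torch.log2(amax)) - E4M3_EMAX)
    q = (blocks / scale).clamp(-E4M3_MAX, E4M3_MAX).to(torch.float8_e4m3fn)
    return q.reshape(x.shape), scale.reshape(*x.shape[:-1], -1)

def mxfp8_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Inverse of mxfp8_quantize, useful for checking round-trip error."""
    return (q.float().reshape(-1, BLOCK) * scale.reshape(-1, 1)).reshape(q.shape)
```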

Checklist Before Starting

  • Search for similar PRs.
  • Format the PR title as [{modules}] {type}: {description}

Test

Since this is a Draft PR and core quantization is pending, full end-to-end results (training curves) are not yet available.

  • Functional validation on Ascend NPU (Planned)
  • Performance benchmark comparing BF16 and MXFP8 (Planned)

API and Usage Example

  • actor_rollout_ref.rollout.quantization=mxfp8 (an example launch command is shown below)
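
A hypothetical launch command showing where this override would sit; the entrypoint and the other override are just the usual verl PPO example values, not something added by this PR.

```bash
# Hypothetical launch command -- only the quantization key is new in this PR.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.quantization=mxfp8
```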

Design & Code Changes

The high-level design focuses on integrating MXFP8 scaling logic into the rollout worker.
Specific Changes:

  • Modified verl/workers/rollout to recognize mxfp8 as a valid precision type.
  • Added hardware-specific checks for Ascend NPU in the rollout initialization (see the sketch after this list).
  • [TODO] Implement core online MXFP8 quantization kernels/wrappers.
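
A rough sketch of what the recognition and hardware check described above could look like; the function and attribute names are placeholders, not the actual code in verl/workers/rollout.

```python
# Illustrative sketch only -- names are placeholders, not verl's actual code.
from typing import Optional

SUPPORTED_QUANTIZATION = ("fp8", "mxfp8")

def check_rollout_quantization(quantization: Optional[str]) -> None:
    """Validate the rollout quantization setting at worker init time."""
    if quantization is None:
        return
    if quantization not in SUPPORTED_QUANTIZATION:
        raise ValueError(f"Unsupported quantization type: {quantization!r}")
    if quantization == "mxfp8":
        # In this PR, MXFP8 rollout targets Ascend NPU only.
        try:
            import torch
            import torch_npu  # noqa: F401  -- Ascend PyTorch plugin
            npu_available = torch.npu.is_available()
        except (ImportError, AttributeError):
            npu_available = False
        if not npu_available:
            raise RuntimeError("mxfp8 rollout currently requires Ascend NPU hardware")
```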

[TODO] Checklist Before Submitting

Important

Please check all the following items before requesting a review; otherwise, the reviewer may deprioritize this PR.

@CLAassistant

CLAassistant commented Dec 26, 2025

CLA assistant check
All committers have signed the CLA.

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for MXFP8 quantization on Ascend NPUs. The changes correctly add 'mxfp8' as a supported quantization type and include logic to handle its configuration. My review focuses on ensuring the correctness and maintainability of the new code paths.

I've identified a few critical and high-severity issues:

  • fp8-specific patches appear to be applied on the new mxfp8 implementation path in two different files, which could lead to incorrect behavior.
  • The quantization setup logic is duplicated across two files, which will make future maintenance difficult.
  • A hardcoded value should be replaced with a defined constant to improve maintainability.

My suggestions aim to fix these issues by removing the risky patch calls, using constants, and highlighting the need to refactor duplicated code.
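
To make these suggestions concrete, one possible shape for the shared setup is sketched below; every identifier is hypothetical and not taken from the PR's actual code.

```python
# Hypothetical refactor sketch for the review points above (placeholder names only).

MXFP8_BLOCK_SIZE = 32  # named constant instead of a hardcoded literal

def setup_rollout_quantization(quantization: str) -> None:
    """Single shared helper, so the two call sites stop duplicating this logic."""
    if quantization == "fp8":
        _apply_fp8_patches()            # fp8-only patches stay on the fp8 path
    elif quantization == "mxfp8":
        _setup_mxfp8(MXFP8_BLOCK_SIZE)  # mxfp8 gets its own setup, no fp8 patches
    else:
        raise ValueError(f"Unknown quantization type: {quantization!r}")

def _apply_fp8_patches() -> None:
    ...  # existing fp8 behavior (placeholder)

def _setup_mxfp8(block_size: int) -> None:
    ...  # future MXFP8 kernels/wrappers (placeholder, per the PR's TODOs)
```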
