[Feature] Support FP4 communication quantization by lizexu123 · Pull Request #7488 · PaddlePaddle/FastDeploy

lizexu123 · 2026-04-19T15:25:38Z

Motivation

Supports FP4 communication quantization. Using hidden_size = 7168 as an example:

| Path | Data | Scale | Total (per token) |
|------|------|-------|-------------------|
| BF16 | 7168 × 2B = 14336B | none | 14336B |
| FP4 | 3584 × 1B = 3584B | 112 × 4B = 448B | 4032B |

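FP4 communication volume is about 28% of BF16's (4032B / 14336B), roughly a 3.5× reduction. In EP scenarios (8-GPU all-to-all), communication is often the bottleneck.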
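The per-token figures in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes FP4 values are packed two per byte and takes the scale payload (112 words × 4B) directly from the table; the helper names are illustrative only and do not correspond to FastDeploy code.

```python
HIDDEN_SIZE = 7168  # hidden_size used in the table


def bf16_bytes_per_token(hidden_size: int) -> int:
    # BF16: 2 bytes per element, no scale tensor is sent.
    return hidden_size * 2


def fp4_bytes_per_token(hidden_size: int, scale_words: int, bytes_per_word: int = 4) -> int:
    # FP4: two 4-bit values packed into each byte, plus the scale payload.
    data_bytes = hidden_size // 2
    scale_bytes = scale_words * bytes_per_word
    return data_bytes + scale_bytes


bf16 = bf16_bytes_per_token(HIDDEN_SIZE)
fp4 = fp4_bytes_per_token(HIDDEN_SIZE, scale_words=112)  # 112 x 4B, taken from the table
print(bf16, fp4, f"{fp4 / bf16:.1%}", f"{bf16 / fp4:.2f}x")
# 14336 4032 28.1% 3.56x
```

The exact ratio is 28.1% (a 3.56× reduction), which the description rounds to 28% and about 3.5×.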

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If no unit tests are added, explain why in this PR.
Provide accuracy results.
If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot · 2026-04-19T15:25:45Z

Thanks for your contribution!

CLAassistant · 2026-04-19T15:25:46Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ lizexu123
❌ root

root does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

PaddlePaddle-bot

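🤖 AI Code Review | 2026-04-19 23:37 CST

## 📋 Review Summary

PR overview: Adds FP4 communication quantization, reducing communication volume in the EP prefill dispatch path to about 28% of BF16 and significantly easing the all-to-all communication bottleneck.
Scope of changes: custom_ops/gpu_ops/moe/ (C++ operators), model_executor/layers/quantization/nvfp4.py (Python quantization logic), envs.py (environment variables), forward_meta.py, utils.py
Impact tags: OP, Quantization

### 📝 PR Conventions Check

The Modifications, Usage or Command, and Accuracy Tests sections of the PR description are all empty; please fill them in. This PR affects model forward computation (the MoE FFN path), so accuracy test results should be provided.

Description template (ready to copy):

## Modifications
1. Add the environment variable FD_USE_NVFP4_COMM_QUANT to toggle FP4 communication quantization
2. Add an FP4 pre-quantized dispatch path to apply_ep_prefill in nvfp4.py
3. Add UINT8 data type support to the C++ operator PrefillPermuteToMaskedGemm
4. Add a defensive check for uninitialized weights in utils.py

## Usage or Command
FD_USE_NVFP4_COMM_QUANT=1 python -m fastdeploy.entrypoints.openai.api_server ...

## Accuracy Tests
(Please add accuracy comparison results for the EP scenario with FP4 communication quantization enabled vs. disabled)

### Issues

| Level | File | Summary |
|------|------|------|
| 🟡 Suggestion | utils.py:136 | Debug log uses informal Chinese, unsuitable for production code |
| ❓ Question | forward_meta.py:156 | The audio_token_num field is unrelated to this PR and is unused |

### Overall Assessment

The core FP4 communication quantization logic (pre-quantization → dispatch → scale swizzle → masked GEMM) is implemented soundly with clear comments, and the FP4 and BF16 branches are cleanly separated. Please add the accuracy test results to the PR description and clean up the two minor issues.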
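The review describes the FP4 path as pre-quantization → dispatch → scale swizzle → masked GEMM. The sketch below illustrates only the first step in plain Python: block-scaled FP4 (E2M1) quantization with two 4-bit codes packed per byte, plus a gate on FD_USE_NVFP4_COMM_QUANT. It is not the PR's kernel. The block size of 16, storing scales as Python floats, the "1 means enabled" parsing, and the names `quantize_fp4`, `dequantize_fp4` and `use_fp4_comm_quant` are all assumptions; there is no second-level scale, swizzle or GEMM.

```python
import os

# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # assumption: one scale per 16 elements (NVFP4-style blocking)


def use_fp4_comm_quant() -> bool:
    # Hypothetical helper; assumes "1" enables the path, as in FD_USE_NVFP4_COMM_QUANT=1.
    return os.environ.get("FD_USE_NVFP4_COMM_QUANT", "0") == "1"


def _encode(v: float) -> int:
    """Round to the nearest E2M1 value; bit 3 is the sign, bits 0-2 index the magnitude."""
    sign = 0x8 if v < 0 else 0x0
    mag = abs(v)
    idx = min(range(len(E2M1_MAGNITUDES)), key=lambda i: abs(E2M1_MAGNITUDES[i] - mag))
    return sign | idx


def _decode(code: int) -> float:
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def quantize_fp4(x: list[float]) -> tuple[bytes, list[float]]:
    """Block-scaled FP4 quantization: returns packed codes and one scale per block."""
    if len(x) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(x), BLOCK_SIZE):
        block = x[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest E2M1 value (6.0).
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        scales.append(scale)
        codes.extend(_encode(v / scale) for v in block)
    # Pack two 4-bit codes per byte, low nibble first.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, scales


def dequantize_fp4(packed: bytes, scales: list[float]) -> list[float]:
    out: list[float] = []
    for i, byte in enumerate(packed):
        scale = scales[(2 * i) // BLOCK_SIZE]  # both nibbles share a block (BLOCK_SIZE is even)
        out.append(_decode(byte & 0xF) * scale)
        out.append(_decode(byte >> 4) * scale)
    return out


if __name__ == "__main__":
    row = [0.01 * (i % 97) - 0.4 for i in range(7168)]  # synthetic row, hidden_size = 7168
    packed, scales = quantize_fp4(row)
    print(len(packed), len(scales))  # 3584 448
```

With a block size of 16, a 7168-element row yields 3584 packed bytes and 448 scales. That would line up with the 448B of scale data in the Motivation table if each scale occupied one byte, but the PR does not spell out its scale layout. Giving each block its own scale keeps an outlier in one block from eroding the resolution of the others.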

PaddlePaddle-bot · 2026-04-19T15:37:25Z


 def process_weight_transpose(layer, weight_name):
    weight = getattr(layer, weight_name)
+    if not weight._is_initialized():


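🟡 Suggestion: The debug log uses the informal Chinese message "权重没初始化啊！" ("The weights aren't initialized!"), which is not suitable for production code.

Log messages should match the language used elsewhere in the project (English) and use a formal tone.

Use logger.warning rather than logger.info: skipping the transpose because a weight is uninitialized may mask an upstream problem, and a warning makes it easier to track down.

Suggested change: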

if not weight._is_initialized():
    logger.warning("Weight '%s' is not initialized, skipping transpose.", weight_name)
    return
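To show the suggested guard in context, here is a self-contained sketch. `StubWeight` and `StubLayer` are hypothetical stand-ins for a framework tensor and layer (only `_is_initialized()` is modelled), the actual transpose is omitted, and the function returns a bool purely so the skip can be checked; the original returns nothing. Passing `weight_name` as a `%s` argument lets the logging module defer string formatting until the record is actually emitted.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)


class StubWeight:
    """Hypothetical stand-in for a framework tensor; models only initialization state."""

    def __init__(self, initialized: bool) -> None:
        self._initialized = initialized

    def _is_initialized(self) -> bool:
        return self._initialized


class StubLayer:
    """Hypothetical layer that owns a single weight attribute."""

    def __init__(self, weight: StubWeight) -> None:
        self.weight = weight


def process_weight_transpose(layer, weight_name: str) -> bool:
    """Return True if the transpose would run, False if it was skipped."""
    weight = getattr(layer, weight_name)
    if not weight._is_initialized():
        # warning rather than info: a silently skipped transpose can hide an upstream loading bug
        logger.warning("Weight '%s' is not initialized, skipping transpose.", weight_name)
        return False
    # ... the actual transpose is not shown in the review excerpt and is omitted here ...
    return True


print(process_weight_transpose(StubLayer(StubWeight(False)), "weight"))  # False (and logs a warning)
print(process_weight_transpose(StubLayer(StubWeight(True)), "weight"))   # True
```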

PaddlePaddle-bot · 2026-04-19T15:37:25Z

    moe_num_chunk: int = 1
    max_moe_num_chunk: int = 1

+    audio_token_num: int = 0


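❓ Question: The audio_token_num field appears unrelated to this PR's FP4 communication quantization feature, and a search of the entire codebase found no code that reads or writes it.

Please confirm: should this field be submitted in a separate PR? If it is groundwork for an upcoming feature, please add a comment explaining its purpose.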

root and others added 3 commits April 18, 2026 21:18

update (fc26012)

Merge branch 'develop' of https://github.com/PaddlePaddle/FastDeploy into kc (f8e98b6)

support NVFP4_comm_quant (cd293b0)

lizexu123 had a problem deploying to Metax_ci on April 19, 2026 15:25 with GitHub Actions (Failure)

PaddlePaddle-bot reviewed Apr 19, 2026

lizexu123 wants to merge 3 commits into PaddlePaddle:develop from lizexu123:kc


3 participants
