[FDConfig] Unify num_experts_per_tok to moe_k in ModelConfig for MoE model compatibility #7509
Open
xyxinyang wants to merge 1 commit into PaddlePaddle:develop from
Conversation
PaddlePaddle-bot left a comment:
Thanks for your contribution!

🤖 AI Code Review | 2026-04-20 14:57 CST
📋 Review Summary
PR overview: adds a num_experts_per_tok → moe_k field mapping in ModelConfig.override_name_from_config() to support MoE models such as Qwen3VLMOE and DeepSeek V3, which use num_experts_per_tok in their configs.
Scope of change: fastdeploy/config.py — config field mapping
Impact tag: FDConfig
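The mapping itself is a small change. Below is a minimal sketch of how such a field alias could work; the method name override_name_from_config comes from the PR description, but the class body is an illustrative assumption, not FastDeploy's actual implementation:

```python
class ModelConfig:
    """Illustrative stand-in for FastDeploy's ModelConfig (not the real class)."""

    # External-name -> canonical-name aliases; this PR adds the
    # num_experts_per_tok -> moe_k entry.
    _NAME_ALIASES = {"num_experts_per_tok": "moe_k"}

    def __init__(self, config_dict):
        self.moe_k = None
        self.override_name_from_config(config_dict)

    def override_name_from_config(self, config_dict):
        # Copy every key, rewriting aliased names to canonical ones so MoE
        # models that ship num_experts_per_tok still populate moe_k.
        for key, value in config_dict.items():
            setattr(self, self._NAME_ALIASES.get(key, key), value)


# A DeepSeek-V3-style config that uses num_experts_per_tok:
cfg = ModelConfig({"num_experts_per_tok": 8, "hidden_size": 4096})
print(cfg.moe_k)  # 8
```

With this shape of mapping, downstream code can read fd_config.model_config.moe_k regardless of which name the upstream model config used.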
Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | multiple model files | the unified mapping is incomplete; downstream consumers still use num_experts_per_tok directly |

🟡 Suggestion: after unifying the mapping, downstream consumers were not switched to moe_k
This PR maps num_experts_per_tok to moe_k in override_name_from_config(), and the mapping logic itself is correct. However, several places in the codebase still use fd_config.model_config.num_experts_per_tok directly instead of the unified moe_k:
| File | Line | Current usage |
|---|---|---|
| model_executor/models/qwen3moe.py | 69 | top_k=fd_config.model_config.num_experts_per_tok |
| model_executor/models/deepseek_v3.py | 167 | top_k=fd_config.model_config.num_experts_per_tok |
| model_executor/models/gpt_oss.py | 120 | top_k=fd_config.model_config.num_experts_per_tok |
| model_executor/models/glm4_moe.py | 176 | top_k=fd_config.model_config.num_experts_per_tok |
| model_executor/layers/moe/routing_indices_cache.py | 170 | special-case branch for Glm4MoeForCausalLM uses num_experts_per_tok |
Suggestions:
- Replace fd_config.model_config.num_experts_per_tok with fd_config.model_config.moe_k in the files above, so that the "Unify" goal is actually achieved.
- The hard-coded special-case branch for Glm4MoeForCausalLM at routing_indices_cache.py:169-172 can be removed once the mapping is unified; using moe_k directly simplifies the logic:
```python
# Current (can be simplified):
if fd_config.model_config.architectures[0] == "Glm4MoeForCausalLM":
    self.moe_top_k = fd_config.model_config.num_experts_per_tok
else:
    self.moe_top_k = fd_config.model_config.moe_k

# Suggested:
self.moe_top_k = fd_config.model_config.moe_k
```

Overall assessment
The mapping logic is correct and will not introduce bugs. But for a PR titled "Unify", the downstream consumers should be updated as well to complete the unification; otherwise future maintainers may be unsure whether to use moe_k or num_experts_per_tok.
Motivation
Unify num_experts_per_tok to moe_k in ModelConfig for MoE model compatibility. This enables R3 support for models like Qwen3VLMOE, DeepSeek V3, and other MoE models that use num_experts_per_tok instead of moe_k in their config.
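To illustrate the compatibility goal, here is a hedged sketch (normalize_moe_config is a hypothetical helper for illustration, not FastDeploy's real API) showing that configs naming the top-k field either way resolve to the same moe_k:

```python
def normalize_moe_config(raw: dict) -> dict:
    """Sketch: alias num_experts_per_tok to moe_k; configs already
    using moe_k pass through unchanged."""
    out = dict(raw)
    if "moe_k" not in out and "num_experts_per_tok" in out:
        out["moe_k"] = out.pop("num_experts_per_tok")
    return out


# DeepSeek-V3-style config (num_experts_per_tok) vs. one already using moe_k:
print(normalize_moe_config({"num_experts_per_tok": 8}))  # {'moe_k': 8}
print(normalize_moe_config({"moe_k": 4}))                # {'moe_k': 4}
```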
Modifications
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
- Add at least one module tag to the PR title: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.