【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling#7506
Open
bobby-cloudforge wants to merge 1 commit into PaddlePaddle:develop from
Conversation
Thanks for your contribution!
PaddlePaddle-bot left a comment

🤖 AI Code Review | 2026-04-20 15:33 CST
📋 Review Summary

PR overview: adds inference support for the MiniCPM4 / MiniCPM4.1-8B models, implementing the three μP (Maximal Update Parametrization) scalings (embedding, residual, logits).
Scope of changes: model_executor/models/, docs/, tests/
Impact tags: Models, Docs
📝 PR 规范检查
PR 标题缺少官方要求的 Tag 前缀。当前标题为 【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling,需要添加 [Models] 标签。
标题建议(可直接复制):
[Models]【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling
Code Review Results

After an item-by-item comparison of the model implementation minicpm4.py against the reference model qwen2.py:
| Check | Result |
|---|---|
| μP embedding scaling (scale_emb) | ✅ Correct; embeddings are multiplied by scale_emb |
| μP residual scaling (scale_depth / √N) | ✅ Correct; attention and MLP outputs are each scaled once |
| μP logit scaling (hidden_size / dim_model_base) | ✅ Correct; compute_logits divides by this factor before the lm_head, matching the HuggingFace reference |
| Component reuse (QKVParallelLinear, RMSNorm, etc. from layers/) | ✅ Fully reuses framework components |
| Weight mapping (HF gate_proj/up_proj → FD up_gate_proj) | ✅ Correct; consistent with the Qwen2 pattern |
| Model registration (ModelRegistry + auto_models_registry) | ✅ Registered via auto-scan; no manual edits to __init__.py needed |
| TP parallel partition mapping | ✅ Column/Row split directions are correct |
| tie_word_embeddings handling | ✅ Correctly handled at the end of load_weights |
| Unit test coverage | ✅ 24 tests covering all components |
Issues

No blocking issues found.

Overall Assessment

The implementation quality is high: it closely follows the project's existing Qwen2 model pattern, the three μP scalings are implemented correctly, and weight loading and TP mapping are complete. The only suggestion is to add the [Models] tag to the PR title.
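For reference, the three μP scalings verified above reduce to simple arithmetic. The sketch below illustrates them with placeholder hyperparameter values (the real values come from the model's config.json, not from this example):

```python
import math

# Illustrative μP hyperparameters (placeholders, not the actual
# MiniCPM4.1-8B config values).
scale_emb = 12.0
scale_depth = 1.4
num_hidden_layers = 32
hidden_size = 4096
dim_model_base = 256

def scale_embedding(x):
    # 1) Embedding scaling: token embeddings are multiplied by scale_emb.
    return x * scale_emb

RESIDUAL_SCALE = scale_depth / math.sqrt(num_hidden_layers)

def add_residual(residual, sublayer_out):
    # 2) Residual scaling: each sublayer output (attention and MLP) is
    #    scaled by scale_depth / sqrt(num_hidden_layers) before the add.
    return residual + sublayer_out * RESIDUAL_SCALE

LOGIT_DIVISOR = hidden_size / dim_model_base

def scale_for_logits(hidden):
    # 3) Logit scaling: hidden states are divided by
    #    hidden_size / dim_model_base before the lm_head matmul.
    return hidden / LOGIT_DIVISOR
```

Each scaling is a single elementwise multiply or divide, which is why the review could verify all three against the HuggingFace reference by inspection.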
Motivation
Add inference support for MiniCPM4 / MiniCPM4.1-8B (openbmb) to FastDeploy. The MiniCPM4 series features μP (Maximal Update Parametrization) scaling, which requires special handling of embedding scaling, residual connections, and LM head logit computation. This PR implements the full model architecture within FastDeploy's existing framework patterns.
Upstream issue: https://github.com/PaddlePaddle/FastDeploy/issues/74773
Modifications

New Files

- fastdeploy/model_executor/models/minicpm4.py — Full model implementation:
  - MiniCPM4MLP: feed-forward with merged gate/up projection
  - MiniCPM4Attention: GQA attention with QKV parallel linear
  - MiniCPM4DecoderLayer: decoder layer with μP residual scaling (scale_depth / √num_hidden_layers)
  - MiniCPM4Model: backbone with μP embedding scaling (scale_emb)
  - MiniCPM4ForCausalLM: LM head with μP logit scaling (hidden_size / dim_model_base), weight mapping from HF format
- tests/model_executor/test_minicpm4.py — 24 unit tests covering all components
- docs/best_practices/MiniCPM4-8B.md — Best practices guide

Modified Files

- docs/supported_models.md — Added MiniCPM4 entry

Key Design Decisions

- QKVParallelLinear — consistent with Qwen2/Qwen3 patterns in FastDeploy
- HF gate_proj/up_proj → FastDeploy merged up_gate_proj format
- MiniCPMForCausalLM — matches the HF architecture string in the model config

Usage or Command

```shell
python -m fastdeploy.entrypoints.openai.api_server \
  --model openbmb/MiniCPM4.1-8B \
  --tensor-parallel-size 1 \
  --quantization wint4 \
  --max-model-len 32768 \
  --max-num-seqs 128
```

Accuracy Tests
24/24 unit tests pass (monkeypatch-based, no GPU required).
Integration tests validated on A800-80GB (6/6 pass).
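As an illustration of the gate_proj/up_proj → up_gate_proj weight fusion noted in the review and design decisions, here is a minimal sketch. The key names, the row-wise concatenation order, and the use of plain lists in place of tensors are all assumptions for illustration; the actual mapping lives in minicpm4.py's load_weights:

```python
def merge_gate_up(state_dict, num_layers):
    """Sketch: fuse each layer's HF gate_proj and up_proj weights into one
    up_gate_proj tensor so a single matmul serves both projections.
    Weights are plain lists of rows standing in for real tensors."""
    merged = dict(state_dict)
    for i in range(num_layers):
        gate = merged.pop(f"model.layers.{i}.mlp.gate_proj.weight")
        up = merged.pop(f"model.layers.{i}.mlp.up_proj.weight")
        # Concatenate along the output dimension: gate rows, then up rows.
        merged[f"model.layers.{i}.mlp.up_gate_proj.weight"] = gate + up
    return merged
```

Fusing the two projections halves the number of GEMM launches in the MLP, which is why FastDeploy (like the Qwen2 implementation this PR follows) stores them merged.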
Checklist

- Tests use monkeypatch.setattr (no MagicMock)
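The monkeypatch-based testing style referenced in the checklist might look like the following sketch. The class under test is a stand-in, not the real FastDeploy layer; only the pattern (pytest's monkeypatch.setattr, no MagicMock) is meant to match the PR:

```python
import math

class FakeDecoderLayer:
    """Stand-in for a decoder layer exposing the μP residual scale."""
    scale_depth = 1.4
    num_hidden_layers = 32

    def residual_scale(self):
        return self.scale_depth / math.sqrt(self.num_hidden_layers)

def test_residual_scale_override(monkeypatch):
    # Patch a class attribute via monkeypatch.setattr (no MagicMock),
    # matching the PR's testing convention; pytest undoes the patch
    # automatically after the test.
    monkeypatch.setattr(FakeDecoderLayer, "num_hidden_layers", 16)
    layer = FakeDecoderLayer()
    assert abs(layer.residual_scale() - 1.4 / math.sqrt(16)) < 1e-9
```

Because the patch is scoped to the test, such tests run without a GPU and without mutating shared state, which matches the "24/24 pass, no GPU required" claim above.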