
【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling#7506

Open
bobby-cloudforge wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/050-minicpm41-model-1

Conversation

@bobby-cloudforge

Motivation

Add inference support for MiniCPM4 / MiniCPM4.1-8B (openbmb) to FastDeploy. The MiniCPM4 series features μP (Maximal Update Parametrization) scaling, which requires special handling of embedding scaling, residual connections, and LM head logit computation. This PR implements the full model architecture within FastDeploy's existing framework patterns.

Upstream issue: https://github.com/PaddlePaddle/FastDeploy/issues/74773

Modifications

New Files

  • fastdeploy/model_executor/models/minicpm4.py — Full model implementation:
    • MiniCPM4MLP: Feed-forward with merged gate/up projection
    • MiniCPM4Attention: GQA attention with QKV parallel linear
    • MiniCPM4DecoderLayer: Decoder layer with μP residual scaling (scale_depth / √num_hidden_layers)
    • MiniCPM4Model: Backbone with μP embedding scaling (scale_emb)
    • MiniCPM4ForCausalLM: LM head with μP logit scaling (hidden_size / dim_model_base), weight mapping from HF format
  • tests/model_executor/test_minicpm4.py — 24 unit tests covering all components
  • docs/best_practices/MiniCPM4-8B.md — Best practices guide
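The three μP scaling sites described above (embedding, residual, logits) can be sketched as a standalone computation. This is an illustrative sketch, not FastDeploy's actual code; the config field names (scale_emb, scale_depth, dim_model_base) follow the MiniCPM HuggingFace config convention, and the numeric values in the usage below are illustrative only.

```python
import math


class MuPScalingSketch:
    """Illustrative sketch of the three muP scaling sites listed above.

    Config field names (scale_emb, scale_depth, dim_model_base) follow the
    MiniCPM HF config convention; this is not FastDeploy's actual API.
    """

    def __init__(self, scale_emb, scale_depth, num_hidden_layers,
                 hidden_size, dim_model_base):
        self.scale_emb = scale_emb
        # each residual branch output is scaled by scale_depth / sqrt(N)
        self.residual_scale = scale_depth / math.sqrt(num_hidden_layers)
        # hidden states are divided by this factor before the lm_head
        self.logit_divisor = hidden_size / dim_model_base

    def embed(self, raw_embedding):
        # muP embedding scaling: multiply token embeddings by scale_emb
        return raw_embedding * self.scale_emb

    def residual_add(self, residual, branch_out):
        # muP residual scaling: applied to attention and MLP outputs alike
        return residual + branch_out * self.residual_scale

    def pre_logits(self, hidden):
        # muP logit scaling: divide before the lm_head projection
        return hidden / self.logit_divisor
```

With illustrative values `MuPScalingSketch(12.0, 1.4, 49, 4096, 256)`, the residual scale works out to 1.4 / 7 = 0.2 and the logit divisor to 4096 / 256 = 16.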

Modified Files

  • docs/supported_models.md — Added MiniCPM4 entry

Key Design Decisions

  1. μP scaling at three sites (embedding, residual, logits) — matches HuggingFace reference implementation
  2. GQA attention via QKVParallelLinear — consistent with Qwen2/Qwen3 patterns in FastDeploy
  3. Weight mapping converts HF gate_proj/up_proj → FastDeploy merged up_gate_proj format
  4. Registration as MiniCPMForCausalLM — matches HF architecture string in model config
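Design decision 3 (merging the HF gate/up projections) can be sketched as a simple concatenation along the output dimension. The checkpoint key names below follow the HF MiniCPM convention, but the function itself is hypothetical, and the concatenation order (gate rows before up rows, as in common merged-projection layouts) is an assumption rather than a statement about FastDeploy's actual layout.

```python
import numpy as np


def merge_gate_up(hf_weights: dict, layer: int) -> np.ndarray:
    """Hypothetical sketch of the HF gate_proj/up_proj -> merged
    up_gate_proj conversion. Key names follow the HF MiniCPM checkpoint
    convention; the concatenation order is an assumption."""
    gate = hf_weights[f"model.layers.{layer}.mlp.gate_proj.weight"]
    up = hf_weights[f"model.layers.{layer}.mlp.up_proj.weight"]
    # stack along the output dimension so one matmul computes both halves
    return np.concatenate([gate, up], axis=0)
```

Merging the two projections lets the MLP run a single fused matmul and split the result afterward, which is the same trick the Qwen2-style models in the codebase use.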

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
       --model openbmb/MiniCPM4.1-8B \
       --tensor-parallel-size 1 \
       --quantization wint4 \
       --max-model-len 32768 \
       --max-num-seqs 128
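Once the server above is running, it exposes an OpenAI-compatible endpoint. A hedged query example follows; port 8000 is an assumption (no --port was given in the command), so adjust it to match your deployment.

```shell
# Query the OpenAI-compatible chat endpoint of the server started above.
# The port is an assumption; adjust to match your deployment.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openbmb/MiniCPM4.1-8B",
        "messages": [{"role": "user", "content": "Briefly introduce MiniCPM4.1."}]
      }'
```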

Accuracy Tests

24/24 unit tests pass (monkeypatch-based, no GPU required):

  • MLP: forward, load_state_dict
  • Attention: forward, load_state_dict
  • DecoderLayer: residual scaling correctness, forward, load
  • Model: embedding scaling, layer iteration, load_state_dict
  • ForCausalLM: μP logit scaling, compute_logits, weight mapping, tensor parallel splits
  • Registration: architecture string lookup

Integration tests validated on A800-80GB (6/6 pass).

Checklist

  • Code follows FastDeploy model patterns (Qwen2, Qwen3, DeepSeek)
  • μP scaling matches HuggingFace MiniCPM4 reference
  • Unit tests use real objects with monkeypatch.setattr (no MagicMock)
  • Pre-commit hooks pass (black, isort, flake8, ruff, pymarkdown)
  • Best practices documentation included

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 15:33 CST

📋 Review Summary

PR overview: Adds inference support for the MiniCPM4/4.1-8B models, implementing the three μP (Maximal Update Parametrization) scaling sites (embedding, residual, logits).
Scope of changes: model_executor/models/, docs/, tests/
Impact tags: Models, Docs

📝 PR Convention Check

The PR title is missing the officially required tag prefix. The current title is 【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling; the [Models] tag needs to be added.

Suggested title (copy-paste ready):

  • [Models]【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling

Code Review Results

Item-by-item comparison of the model implementation file minicpm4.py against the reference model qwen2.py:

  • μP embedding scaling (scale_emb): ✅ correct, embeddings are multiplied by scale_emb
  • μP residual scaling (scale_depth / √N): ✅ correct, applied once each to the attention and MLP outputs
  • μP logit scaling (hidden_size / dim_model_base): ✅ correct, compute_logits divides by this factor before the lm_head, matching the HuggingFace reference
  • Component reuse (QKVParallelLinear, RMSNorm, etc. from layers/): ✅ fully reuses framework components
  • Weight mapping (HF gate_proj/up_proj → FD up_gate_proj): ✅ correct, consistent with the Qwen2 pattern
  • Model registration (ModelRegistry + auto_models_registry): ✅ registered via automatic scanning, no manual edits to __init__.py needed
  • TP parallel split mapping: ✅ Column/Row split directions are correct
  • tie_word_embeddings handling: ✅ handled correctly at the end of load_weights
  • Unit test coverage: ✅ 24 tests cover all components
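The TP split check can be illustrated with a toy column/row split. The helper names below are illustrative; the direction convention (column-parallel splits the output dimension, row-parallel splits the input dimension) follows the standard Megatron-style layout that QKVParallelLinear-type layers use.

```python
import numpy as np


def column_split(w: np.ndarray, tp_size: int):
    # column-parallel: split the output dimension (rows of the weight
    # matrix in the usual [out_features, in_features] layout)
    return np.split(w, tp_size, axis=0)


def row_split(w: np.ndarray, tp_size: int):
    # row-parallel: split the input dimension (columns of the weight)
    return np.split(w, tp_size, axis=1)
```

Getting these directions right is what the "TP parallel split mapping" check verifies: a column-parallel layer's shards each produce a slice of the output, while a row-parallel layer's shards consume slices of the input and are summed via all-reduce.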

Issues

No blocking issues found.

Overall Assessment

The implementation quality is high: it closely follows the project's existing Qwen2 model pattern, the three μP scaling sites are implemented correctly, and the weight loading and TP mappings are complete. Recommend adding the [Models] tag to the PR title.


Labels

contributor External developers
