
【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling#7506

Open
bobby-cloudforge wants to merge 1 commit into PaddlePaddle:develop from CloudForge-Solutions:task/050-minicpm41-model-1

Conversation

@bobby-cloudforge

Motivation

Add inference support for MiniCPM4 / MiniCPM4.1-8B (openbmb) to FastDeploy. The MiniCPM4 series features μP (Maximal Update Parametrization) scaling, which requires special handling of embedding scaling, residual connections, and LM head logit computation. This PR implements the full model architecture within FastDeploy's existing framework patterns.

Upstream issue: https://github.com/PaddlePaddle/FastDeploy/issues/74773

Modifications

New Files

  • fastdeploy/model_executor/models/minicpm4.py — Full model implementation:
    • MiniCPM4MLP: Feed-forward with merged gate/up projection
    • MiniCPM4Attention: GQA attention with QKV parallel linear
    • MiniCPM4DecoderLayer: Decoder layer with μP residual scaling (scale_depth / √num_hidden_layers)
    • MiniCPM4Model: Backbone with μP embedding scaling (scale_emb)
    • MiniCPM4ForCausalLM: LM head with μP logit scaling (hidden_size / dim_model_base), weight mapping from HF format
  • tests/model_executor/test_minicpm4.py — 24 unit tests covering all components
  • docs/best_practices/MiniCPM4-8B.md — Best practices guide
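The three μP scaling sites described above (embedding, residual, logits) can be sketched as a standalone computation. This is an illustrative sketch, not FastDeploy's actual code; the config field names (scale_emb, scale_depth, dim_model_base) follow the MiniCPM HuggingFace config convention, and the numeric values in the usage below are illustrative only.

```python
import math


class MuPScalingSketch:
    """Illustrative sketch of the three muP scaling sites listed above.

    Config field names (scale_emb, scale_depth, dim_model_base) follow the
    MiniCPM HF config convention; this is not FastDeploy's actual API.
    """

    def __init__(self, scale_emb, scale_depth, num_hidden_layers,
                 hidden_size, dim_model_base):
        self.scale_emb = scale_emb
        # each residual branch output is scaled by scale_depth / sqrt(N)
        self.residual_scale = scale_depth / math.sqrt(num_hidden_layers)
        # hidden states are divided by this factor before the lm_head
        self.logit_divisor = hidden_size / dim_model_base

    def embed(self, raw_embedding):
        # muP embedding scaling: multiply token embeddings by scale_emb
        return raw_embedding * self.scale_emb

    def residual_add(self, residual, branch_out):
        # muP residual scaling: applied to attention and MLP outputs alike
        return residual + branch_out * self.residual_scale

    def pre_logits(self, hidden):
        # muP logit scaling: divide before the lm_head projection
        return hidden / self.logit_divisor
```

With illustrative values `MuPScalingSketch(12.0, 1.4, 49, 4096, 256)`, the residual scale works out to 1.4 / 7 = 0.2 and the logit divisor to 4096 / 256 = 16.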

Modified Files

  • docs/supported_models.md — Added MiniCPM4 entry

Key Design Decisions

  1. μP scaling at three sites (embedding, residual, logits) — matches HuggingFace reference implementation
  2. GQA attention via QKVParallelLinear — consistent with Qwen2/Qwen3 patterns in FastDeploy
  3. Weight mapping converts HF gate_proj/up_proj → FastDeploy merged up_gate_proj format
  4. Registration as MiniCPMForCausalLM — matches HF architecture string in model config
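Design decision 3 (merging the HF gate/up projections) can be sketched as a simple concatenation along the output dimension. The checkpoint key names below follow the HF MiniCPM convention, but the function itself is hypothetical, and the concatenation order (gate rows before up rows, as in common merged-projection layouts) is an assumption rather than a statement about FastDeploy's actual layout.

```python
import numpy as np


def merge_gate_up(hf_weights: dict, layer: int) -> np.ndarray:
    """Hypothetical sketch of the HF gate_proj/up_proj -> merged
    up_gate_proj conversion. Key names follow the HF MiniCPM checkpoint
    convention; the concatenation order is an assumption."""
    gate = hf_weights[f"model.layers.{layer}.mlp.gate_proj.weight"]
    up = hf_weights[f"model.layers.{layer}.mlp.up_proj.weight"]
    # stack along the output dimension so one matmul computes both halves
    return np.concatenate([gate, up], axis=0)
```

Merging the two projections lets the MLP run a single fused matmul and split the result afterward, which is the same trick the Qwen2-style models in the codebase use.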

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
       --model openbmb/MiniCPM4.1-8B \
       --tensor-parallel-size 1 \
       --quantization wint4 \
       --max-model-len 32768 \
       --max-num-seqs 128
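Once the server above is running, it exposes an OpenAI-compatible endpoint. A hedged query example follows; port 8000 is an assumption (no --port was given in the command), so adjust it to match your deployment.

```shell
# Query the OpenAI-compatible chat endpoint of the server started above.
# The port is an assumption; adjust to match your deployment.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openbmb/MiniCPM4.1-8B",
        "messages": [{"role": "user", "content": "Briefly introduce MiniCPM4.1."}]
      }'
```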

Accuracy Tests

24/24 unit tests pass (monkeypatch-based, no GPU required):

  • MLP: forward, load_state_dict
  • Attention: forward, load_state_dict
  • DecoderLayer: residual scaling correctness, forward, load
  • Model: embedding scaling, layer iteration, load_state_dict
  • ForCausalLM: μP logit scaling, compute_logits, weight mapping, tensor parallel splits
  • Registration: architecture string lookup

Integration tests validated on A800-80GB (6/6 pass).

Checklist

  • Code follows FastDeploy model patterns (Qwen2, Qwen3, DeepSeek)
  • μP scaling matches HuggingFace MiniCPM4 reference
  • Unit tests use real objects with monkeypatch.setattr (no MagicMock)
  • Pre-commit hooks pass (black, isort, flake8, ruff, pymarkdown)
  • Best practices documentation included

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 15:33 CST

📋 Review Summary

PR overview: Adds inference support for the MiniCPM4/4.1-8B models, implementing the three μP (Maximal Update Parametrization) scaling sites (embedding, residual, logits).
Scope of changes: model_executor/models/, docs/, tests/
Impact tags: Models, Docs

📝 PR Convention Check

The PR title is missing the officially required tag prefix. The current title is 【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling; the [Models] tag needs to be added.

Suggested title (copy-paste ready):

  • [Models]【Hackathon 10th Spring No.50】Add MiniCPM4.1-8B model support with μP scaling

Code Review Results

Item-by-item comparison of the model implementation file minicpm4.py against the reference model qwen2.py:

  • μP embedding scaling (scale_emb): ✅ correct, embeddings are multiplied by scale_emb
  • μP residual scaling (scale_depth / √N): ✅ correct, applied once each to the attention and MLP outputs
  • μP logit scaling (hidden_size / dim_model_base): ✅ correct, compute_logits divides by this factor before the lm_head, matching the HuggingFace reference
  • Component reuse (QKVParallelLinear, RMSNorm, etc. from layers/): ✅ fully reuses framework components
  • Weight mapping (HF gate_proj/up_proj → FD up_gate_proj): ✅ correct, consistent with the Qwen2 pattern
  • Model registration (ModelRegistry + auto_models_registry): ✅ registered via automatic scanning, no manual edits to __init__.py needed
  • TP parallel split mapping: ✅ Column/Row split directions are correct
  • tie_word_embeddings handling: ✅ handled correctly at the end of load_weights
  • Unit test coverage: ✅ 24 tests cover all components
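The TP split check can be illustrated with a toy column/row split. The helper names below are illustrative; the direction convention (column-parallel splits the output dimension, row-parallel splits the input dimension) follows the standard Megatron-style layout that QKVParallelLinear-type layers use.

```python
import numpy as np


def column_split(w: np.ndarray, tp_size: int):
    # column-parallel: split the output dimension (rows of the weight
    # matrix in the usual [out_features, in_features] layout)
    return np.split(w, tp_size, axis=0)


def row_split(w: np.ndarray, tp_size: int):
    # row-parallel: split the input dimension (columns of the weight)
    return np.split(w, tp_size, axis=1)
```

Getting these directions right is what the "TP parallel split mapping" check verifies: a column-parallel layer's shards each produce a slice of the output, while a row-parallel layer's shards consume slices of the input and are summed via all-reduce.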

Issues

No blocking issues found.

Overall Assessment

The implementation quality is high: it closely follows the project's existing Qwen2 model pattern, the three μP scaling sites are implemented correctly, and the weight loading and TP mappings are complete. Recommend adding the [Models] tag to the PR title.


Labels

contributor External developers
