Add MiniCPM-1B model support by Belugaaaa · Pull Request #4601 · PaddlePaddle/PaddleFormers

Belugaaaa · 2026-06-03T11:25:21Z

Before submitting

Lint code. If there are lint issues, please format the code first.

Validation passed locally:

pre-commit run --files ...
bash -n scripts/minicpm/run_ms_swift_gsm8k.sh scripts/minicpm/run_sft_gsm8k.sh tests/integration_test/minicpm_sft_single_card.sh
python3 -m py_compile scripts/minicpm/*.py paddleformers/nn/norm.py paddleformers/transformers/minicpm/*.py tests/transformers/minicpm/*.py
Add test cases into tests folder. If there are codecov issues, please add test cases first.

Added test coverage:

tests/transformers/minicpm/test_modeling.py
tests/config/ci/minicpm_sft_single.yaml
tests/integration_test/minicpm_sft_single_card.sh

PR types

New features

PR changes

Models, Docs

Description

This PR adds PaddleFormers support for openbmb/MiniCPM-1B-sft-bf16, including:

MiniCPM config and model implementation.
AutoConfig and AutoModelForCausalLM registration.
HF-to-Paddle native checkpoint conversion helper.
Forward/generation alignment helpers.
GSM8K SFT and ms-swift baseline helper scripts.
Inference compiler benchmark helper.
Tiny random checkpoint generation helper for CI/CE.
Unit tests and single-card SFT CI smoke config.
PR-facing acceptance status document.

Local validation uses the current best path: full-depth, full-width MiniCPM-1B. The HF pytorch_model.bin checkpoint is converted to Paddle native model_state.pdparams, then loaded through PaddleFormers for alignment and training validation.
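The conversion step described above can be pictured with a small sketch. This is not the helper shipped in this PR. It only illustrates the usual layout difference between the two frameworks: PyTorch `nn.Linear` stores weights as `[out_features, in_features]`, while Paddle `nn.Linear` stores them as `[in_features, out_features]`. The function name `convert_hf_state_dict` and the suffix list are hypothetical, and the sketch works on NumPy arrays rather than real checkpoint files.

```python
import numpy as np

# Assumption: these suffixes identify linear-layer weights in a
# LLaMA-style checkpoint. The real helper may use a different rule.
LINEAR_WEIGHT_SUFFIXES = (
    "q_proj.weight",
    "k_proj.weight",
    "v_proj.weight",
    "o_proj.weight",
    "gate_proj.weight",
    "up_proj.weight",
    "down_proj.weight",
    "lm_head.weight",
)


def convert_hf_state_dict(hf_state):
    """Return a copy of hf_state with 2-D linear weights transposed.

    hf_state maps parameter names to array-like values. Embeddings,
    norm weights, and anything else not matched by the suffix list
    are copied unchanged.
    """
    paddle_state = {}
    for name, tensor in hf_state.items():
        array = np.asarray(tensor)
        if array.ndim == 2 and name.endswith(LINEAR_WEIGHT_SUFFIXES):
            # [out_features, in_features] -> [in_features, out_features]
            array = array.T
        paddle_state[name] = np.ascontiguousarray(array)
    return paddle_state
```

Embedding tables are also 2-D but are deliberately left untransposed, which is why the sketch matches on parameter names instead of on shape alone.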

Validation results:

Full MiniCPM-1B FP32 logits alignment:
- max_diff=4.57763671875e-05
- mean_diff=8.224584234994836e-06
- last_max_diff=3.0517578125e-05
- last_mean_diff=7.700375135755166e-06
Generation alignment:
- first_10_tokens_match=True
- HF and Paddle first 10 generated token ids: [11225, 72, 5, 2219, 8107, 1379, 8360, 1410, 11225, 72]
GSM8K 300-step PaddleFormers SFT completed:
- Eval losses: step 50 0.5329, step 100 0.5421, step 150 0.5408, step 200 0.5424, step 250 0.5483, step 300 0.5435
- Final train metrics: train_loss=0.1506, train_steps_per_second=0.151
GSM8K 300-step ms-swift baseline completed:
- Eval losses: step 50 0.58973086, step 100 0.56501245, step 150 0.55724013, step 200 0.55477822, step 250 0.55431318, step 300 0.55419004
- PaddleFormers and ms-swift eval curves converge into the same range; step-300 eval loss delta is about 0.0107 (ms-swift 0.5542 vs PaddleFormers 0.5435)
Unit tests passed locally:
- Ran 33 tests ... OK (skipped=5)
Tiny SFT smoke completed locally:
- train_loss=11.2081
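
For readers who want to reproduce the alignment figures above, the comparison can be sketched as follows. This is a minimal illustration, not the alignment helper from this PR. It assumes both models' logits have been exported as arrays of shape `[batch, seq_len, vocab]`, and that the `last_*` metrics refer to the final sequence position. The function names are hypothetical.

```python
import numpy as np


def logits_alignment(ref_logits, test_logits):
    """Element-wise absolute difference between two logit arrays."""
    ref = np.asarray(ref_logits, dtype=np.float32)
    test = np.asarray(test_logits, dtype=np.float32)
    diff = np.abs(ref - test)
    # Assumption: "last_*" means the final position along the sequence axis.
    last = diff[:, -1, :]
    return {
        "max_diff": float(diff.max()),
        "mean_diff": float(diff.mean()),
        "last_max_diff": float(last.max()),
        "last_mean_diff": float(last.mean()),
    }


def first_n_tokens_match(ref_ids, test_ids, n=10):
    """True if the first n generated token ids are identical."""
    return list(ref_ids[:n]) == list(test_ids[:n])
```

With greedy decoding on both sides, `first_n_tokens_match(hf_ids, paddle_ids)` corresponds to the `first_10_tokens_match` flag reported above.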

Known remaining acceptance items:

CE tiny checkpoint still needs to be uploaded to the official tiny model location used by tests/integration_test/minicpm_sft_single_card.sh.
Local compiler inference benchmark did not reach the 20% speedup target. Logs indicate missing symbolic shape support for fused_rms_norm_ext; this should be rerun in the official compiler/runtime acceptance environment.

Add MiniCPM-1B model support

9a0dc02

Belugaaaa wants to merge 1 commit into PaddlePaddle:develop from Belugaaaa:feat/minicpm-1b
