Add MiniCPM-1B model support #4601

Open
Belugaaaa wants to merge 1 commit into PaddlePaddle:develop from Belugaaaa:feat/minicpm-1b

@Belugaaaa

Before submitting

  • Lint code. If there are lint issues, please format the code first.

Validation passed locally:

  • pre-commit run --files ...

  • bash -n scripts/minicpm/run_ms_swift_gsm8k.sh scripts/minicpm/run_sft_gsm8k.sh tests/integration_test/minicpm_sft_single_card.sh

  • python3 -m py_compile scripts/minicpm/*.py paddleformers/nn/norm.py paddleformers/transformers/minicpm/*.py tests/transformers/minicpm/*.py

  • Add test cases into tests folder. If there are codecov issues, please add test cases first.

Added test coverage:

  • tests/transformers/minicpm/test_modeling.py
  • tests/config/ci/minicpm_sft_single.yaml
  • tests/integration_test/minicpm_sft_single_card.sh

PR types

New features

PR changes

Models, Docs

Description

This PR adds PaddleFormers support for openbmb/MiniCPM-1B-sft-bf16, including:

  • MiniCPM config and model implementation.
  • AutoConfig and AutoModelForCausalLM registration.
  • HF-to-Paddle native checkpoint conversion helper.
  • Forward/generation alignment helpers.
  • GSM8K SFT and ms-swift baseline helper scripts.
  • Inference compiler benchmark helper.
  • Tiny random checkpoint generation helper for CI/CE.
  • Unit tests and single-card SFT CI smoke config.
  • PR-facing acceptance status document.

Local validation uses the full-depth, full-width MiniCPM-1B model. The HF pytorch_model.bin checkpoint is converted to the Paddle-native model_state.pdparams format, then loaded through PaddleFormers for alignment and training validation.
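One layout detail that HF-to-Paddle conversions generally have to handle: PyTorch `nn.Linear` stores its weight as `[out_features, in_features]`, while Paddle `nn.Linear` stores it as `[in_features, out_features]`, so linear weights are usually transposed while embeddings and norm weights are copied as-is. The sketch below illustrates that step on plain nested lists. It is not the conversion helper from this PR: `convert_state_dict`, `LINEAR_SUFFIXES`, and the Llama-style key names are assumptions made for illustration, and the real MiniCPM key set may differ.

```python
from typing import Any, Dict, List

# Assumed Llama-style projection names (hypothetical; check the real checkpoint keys).
LINEAR_SUFFIXES = (
    "q_proj.weight", "k_proj.weight", "v_proj.weight", "o_proj.weight",
    "gate_proj.weight", "up_proj.weight", "down_proj.weight",
)


def transpose(matrix: List[List[float]]) -> List[List[float]]:
    """Swap the two axes of a 2-D nested list."""
    return [list(column) for column in zip(*matrix)]


def convert_state_dict(hf_state: Dict[str, Any]) -> Dict[str, Any]:
    """Transpose linear weights ([out, in] -> [in, out]); copy everything else."""
    converted = {}
    for name, tensor in hf_state.items():
        if name.endswith(LINEAR_SUFFIXES):
            converted[name] = transpose(tensor)
        else:
            # Embeddings, norm weights, and biases keep their layout.
            converted[name] = tensor
    return converted
```

In a real script the values would be framework tensors rather than nested lists, and the result would be saved in the Paddle-native format; those steps are omitted here.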

Validation results:

  • Full MiniCPM-1B FP32 logits alignment:
    • max_diff=4.57763671875e-05
    • mean_diff=8.224584234994836e-06
    • last_max_diff=3.0517578125e-05
    • last_mean_diff=7.700375135755166e-06
  • Generation alignment:
    • first_10_tokens_match=True
    • HF and Paddle first 10 generated token ids: [11225, 72, 5, 2219, 8107, 1379, 8360, 1410, 11225, 72]
  • GSM8K 300-step PaddleFormers SFT completed:
    • Eval losses: step 50 0.5329, step 100 0.5421, step 150 0.5408, step 200 0.5424, step 250 0.5483, step 300 0.5435
    • Final train metrics: train_loss=0.1506, train_steps_per_second=0.151
  • GSM8K 300-step ms-swift baseline completed:
    • Eval losses: step 50 0.58973086, step 100 0.56501245, step 150 0.55724013, step 200 0.55477822, step 250 0.55431318, step 300 0.55419004
    • The PaddleFormers and ms-swift eval curves converge to the same range; the step-300 eval loss delta is about 0.0107 (0.5435 vs. 0.55419004)
  • Unit tests passed locally:
    • Ran 33 tests ... OK (skipped=5)
  • Tiny SFT smoke completed locally:
    • train_loss=11.2081
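For readers who want to reproduce the alignment numbers above, here is a minimal sketch of how such metrics could be computed. It assumes that `max_diff`/`mean_diff` are absolute differences over all positions and that the `last_*` values cover only the final sequence position; the PR does not state these definitions, and the function names `logits_diff` and `first_n_tokens_match` are hypothetical, not the PR's helpers.

```python
from typing import Dict, List, Sequence


def logits_diff(ref: List[List[float]], test: List[List[float]]) -> Dict[str, float]:
    """Compare two [seq_len][vocab_size] logit arrays element-wise."""
    if not ref or len(ref) != len(test):
        raise ValueError("inputs must be non-empty and have the same length")
    if any(len(r) != len(t) or not r for r, t in zip(ref, test)):
        raise ValueError("rows must be non-empty and have matching sizes")

    # Absolute difference per position and vocabulary entry.
    per_position = [[abs(a - b) for a, b in zip(r, t)] for r, t in zip(ref, test)]
    flat = [d for row in per_position for d in row]
    last = per_position[-1]  # assumed meaning of the "last_" prefix

    return {
        "max_diff": max(flat),
        "mean_diff": sum(flat) / len(flat),
        "last_max_diff": max(last),
        "last_mean_diff": sum(last) / len(last),
    }


def first_n_tokens_match(a: Sequence[int], b: Sequence[int], n: int = 10) -> bool:
    """True when both sequences have at least n tokens and the first n agree."""
    return len(a) >= n and len(b) >= n and list(a[:n]) == list(b[:n])
```

With real models, `ref` and `test` would come from running the HF and PaddleFormers checkpoints on the same prompt in FP32 and converting the logits to lists or arrays.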

Known remaining acceptance items:

  • CE tiny checkpoint still needs to be uploaded to the official tiny model location used by tests/integration_test/minicpm_sft_single_card.sh.
  • Local compiler inference benchmark did not reach the 20% speedup target. Logs indicate missing symbolic shape support for fused_rms_norm_ext; this should be rerun in the official compiler/runtime acceptance environment.
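As a rough illustration of the speedup check, the sketch below treats "20% speedup" as a throughput-style ratio: baseline latency divided by compiled latency, minus one. That definition is an assumption (the acceptance criteria may instead measure latency reduction), and `speedup_percent` and `meets_target` are hypothetical names, not part of the benchmark helper in this PR.

```python
def speedup_percent(baseline_latency_ms: float, compiled_latency_ms: float) -> float:
    """Relative speedup of the compiled run over the baseline, in percent."""
    if baseline_latency_ms <= 0 or compiled_latency_ms <= 0:
        raise ValueError("latencies must be positive")
    return (baseline_latency_ms / compiled_latency_ms - 1.0) * 100.0


def meets_target(
    baseline_latency_ms: float,
    compiled_latency_ms: float,
    target_percent: float = 20.0,
) -> bool:
    """True when the measured speedup reaches the target (20% by default)."""
    return speedup_percent(baseline_latency_ms, compiled_latency_ms) >= target_percent
```

Latencies should be averaged over several warm runs before being passed in, since the first compiled invocation typically includes compilation time.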
