add muon #4231

Merged
risemeup1 merged 2 commits into PaddlePaddle:develop from xxyux:add_muon on Apr 25, 2026

Conversation

@xxyux xxyux commented Apr 7, 2026

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

We are adapting the Muon optimizer.
Work in progress.

Description

paddle-bot Bot commented Apr 7, 2026

Thanks for your contribution!

@xxyux xxyux force-pushed the add_muon branch 2 times, most recently from 78430a6 to f6698f4 Compare April 7, 2026 15:25

CLAassistant commented Apr 24, 2026

CLA assistant check
All committers have signed the CLA.

xxyux commented Apr 24, 2026

/re-run all-failed

xxyux commented Apr 25, 2026

/re-run all-failed

@codecov-commenter

Codecov Report

❌ Patch coverage is 4.91228% with 271 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@73a1b60). Learn more about missing BASE report.

Files with missing lines                             Patch %   Lines
paddleformers/transformers/minimax_m2/modeling.py    0.00%     136 Missing ⚠️
paddleformers/trainer/utils/offload_optimizer.py     0.00%     70 Missing ⚠️
paddleformers/trainer/trainer_utils.py               4.76%     40 Missing ⚠️
paddleformers/trainer/trainer.py                     0.00%     18 Missing ⚠️
paddleformers/trainer/training_args.py               73.33%    4 Missing ⚠️
paddleformers/trainer/utils/reshard/common.py        25.00%    3 Missing ⚠️

❌ Your patch status has failed because the patch coverage (4.91%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
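
The per-file figures in this report can be cross-checked against the headline patch coverage. The sketch below is only a sanity check: the report lists missing lines and a rounded percentage per file, so the total changed lines per file are inferred here (an assumption, not a number Codecov printed), and the helper name `inferred_total` is made up for illustration.

```python
# (file, patch coverage %, missing lines) copied from the Codecov table.
rows = [
    ("paddleformers/transformers/minimax_m2/modeling.py", 0.00, 136),
    ("paddleformers/trainer/utils/offload_optimizer.py", 0.00, 70),
    ("paddleformers/trainer/trainer_utils.py", 4.76, 40),
    ("paddleformers/trainer/trainer.py", 0.00, 18),
    ("paddleformers/trainer/training_args.py", 73.33, 4),
    ("paddleformers/trainer/utils/reshard/common.py", 25.00, 3),
]


def inferred_total(pct: float, missing: int) -> int:
    """Infer changed lines in a file from its rounded coverage and missing count."""
    return round(missing / (1.0 - pct / 100.0))


total_missing = sum(missing for _, _, missing in rows)
total_changed = sum(inferred_total(pct, missing) for _, pct, missing in rows)
total_covered = total_changed - total_missing
patch_pct = 100.0 * total_covered / total_changed

print(total_missing, total_changed, total_covered, f"{patch_pct:.5f}%")
```

Under that inference the missing lines sum to the 271 quoted above, and the resulting ratio agrees with the reported 4.91228% patch coverage, far below the 75.00% target.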

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4231   +/-   ##
==========================================
  Coverage           ?   38.89%           
==========================================
  Files              ?      474           
  Lines              ?    90061           
  Branches           ?        0           
==========================================
  Hits               ?    35029           
  Misses             ?    55032           
  Partials           ?        0           

@risemeup1 risemeup1 merged commit c0b7e21 into PaddlePaddle:develop Apr 25, 2026
26 of 31 checks passed

@GuoxiaWang GuoxiaWang left a comment

See the review comments: the existing issues need to be fixed in the next version. Merging this version for now.


optimizer._create_accumulators(paddle.base.framework.default_main_program().global_block(), parameter_list)
return

The next version needs to remove the MoE vs. non-MoE assumption.

default="adamw",
metadata={"help": "The optimizer to use."},
)
muon_exclude_patterns: Optional[List[str]] = field(

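The next version needs to support regular expressions rather than only a substring `in` check; otherwise it is hard to target parameters precisely in multimodal hybrid models.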
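
To illustrate the concern, here is a minimal sketch contrasting substring matching with regex matching for an exclusion list. It is not the PR's implementation: the helper names and the parameter names (`vision_tower...`, `language_model...`) are hypothetical, chosen only to mimic a multimodal model with similarly named sublayers.

```python
import re
from typing import Iterable, List


def excluded_by_substring(name: str, patterns: Iterable[str]) -> bool:
    """Substring semantics: a pattern matches if it occurs anywhere in the name."""
    return any(p in name for p in patterns)


def excluded_by_regex(name: str, patterns: Iterable[str]) -> bool:
    """Regex semantics: a pattern matches if re.search finds it in the name."""
    return any(re.search(p, name) is not None for p in patterns)


# Hypothetical parameter names from a vision + language model.
names: List[str] = [
    "vision_tower.layers.0.mlp.fc1.weight",
    "language_model.layers.0.mlp.fc1.weight",
    "language_model.embed_tokens.weight",
]

# Goal: exclude only the vision tower's fc1 weights.
substring_hits = [n for n in names if excluded_by_substring(n, ["mlp.fc1"])]
regex_hits = [
    n for n in names
    if excluded_by_regex(n, [r"^vision_tower\..*\.mlp\.fc1\.weight$"])
]

print(substring_hits)  # both fc1 weights match the substring
print(regex_hits)      # only the vision tower's fc1 weight matches
```

With a substring check, `"mlp.fc1"` also catches the language model's layer, while `"vision_tower"` alone would catch every vision parameter. An anchored regex can express "fc1 inside the vision tower" in a single pattern.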


# Step 4: mock Muon._muon_update and Muon._apply_optimize
# Muon's _muon_update is pure Python (paddle.lerp + paddle.assign),
# so it bypasses the _C_ops.adamw_ patch above. We need explicit

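The version currently merged into Paddle has already been adapted for adamw_; a muon_ kernel still needs to be written later. The master-weight reload and offload written here cannot offload tensor by tensor. The current implementation reloads all master weights up front, so as soon as GPU memory usage gets large it simply runs out of memory, which defeats the purpose. This has to be changed in the next version.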
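
The memory argument can be made concrete with a toy accounting model. The sketch below does not use any Paddle API and is not the code under review: `FakeDevice`, `step_reload_all`, and `step_per_tensor` are hypothetical names, and tensor sizes are plain integers standing in for bytes of master weights.

```python
from typing import List


class FakeDevice:
    """Tracks how many bytes of master weights are resident at once."""

    def __init__(self) -> None:
        self.resident = 0
        self.peak = 0

    def reload(self, nbytes: int) -> None:
        self.resident += nbytes
        self.peak = max(self.peak, self.resident)

    def offload(self, nbytes: int) -> None:
        self.resident -= nbytes


def step_reload_all(sizes: List[int], device: FakeDevice) -> None:
    """Reload every master weight first, update, then offload everything."""
    for n in sizes:
        device.reload(n)
    # ... the optimizer update for all tensors would run here ...
    for n in sizes:
        device.offload(n)


def step_per_tensor(sizes: List[int], device: FakeDevice) -> None:
    """Reload, update, and offload one tensor at a time."""
    for n in sizes:
        device.reload(n)
        # ... the optimizer update for this single tensor would run here ...
        device.offload(n)


sizes = [4, 8, 2]  # hypothetical master-weight sizes

all_at_once = FakeDevice()
step_reload_all(sizes, all_at_once)

one_by_one = FakeDevice()
step_per_tensor(sizes, one_by_one)

print(all_at_once.peak, one_by_one.peak)
```

In this model the up-front strategy peaks at the sum of all master weights, while the per-tensor strategy peaks at the size of the largest single tensor, which is the behavior the reviewer asks for.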

config: MiniMaxM2Config

@classmethod
def _build_muon_slice_config(cls, model, config) -> dict:

Why is a default function written here? It has not been polished yet; isn't a default implementation likely to cause problems?

6 participants