feat[Muon]: migrate slice logic to caller, support fused storage, and generalize color-group management (#78716)
Merged
GuoxiaWang merged 1 commit into PaddlePaddle:develop, Apr 21, 2026
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open source project!
GuoxiaWang requested changes, Apr 18, 2026
Codecov Report ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## develop #78716 +/- ##
==========================================
Coverage ? 93.54%
==========================================
Files ? 3
Lines ? 62
Branches ? 0
==========================================
Hits ? 58
Misses ? 4
Partials ? 0
…mpatibility
- muon_sharding_optimizer: replace hardcoded None/moe_expert color paths with generic _rank2params_2d_by_color iteration in step() and __init__ Step4; replace static _build_color_to_group_info(hcg) with dynamic _build_color_to_group_info_from_params(parameter_list, default_group) that scans param.color dicts at runtime; generalize reduce_gradients and _sharding_sync_parameters similarly; clean up comments (fix errors, translate Chinese to English, remove dead code and debug prints)
- muon: remove built-in QKV/FFN split logic (QKVInfo, qkv_info, intermediate_size, muon_qkv_update_mode, muon_ffn_split) from the optimizer; callers now pass slice strategies via MuonParamInfo.slice_func, keeping model-specific split logic out of the optimizer core; add ns_matmul_dtype parameter to Muon.__init__ and _zeropower_via_newtonschulz5 with auto-detect (bfloat16 on Ampere+, float32 on V100 and older) to enable CI on V100
- optimizer: allow Muon class to skip incompatible base-class checks
- test: update hybrid_parallel_sharding_muon_model and test_parallel_dygraph_muon to use current MuonParamInfo API (slice_func instead of deprecated qkv_info/intermediate_size); remove GPU capability >= 8 skipIf guard so tests run on V100
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor (Author)
/re-run all-failed
GuoxiaWang approved these changes, Apr 21, 2026
PR Category: Execute Infrastructure
PR Types: Improvements
Description
This PR makes two sets of changes to the Muon distributed optimizer:
1. Generalize color-group management in MuonShardingOptimizer
Previously, MuonShardingOptimizer hardcoded two color paths (None for the default sharding group and moe_expert for MoE experts), so adding any new parameter group required modifying the optimizer internals.
- Replace the static, hardcoded _build_color_to_group_info(hcg) with _build_color_to_group_info_from_params(parameter_list, default_group), which scans param.color dicts at runtime, so any new color is picked up automatically without code changes
- Generalize step(), __init__ (local_opt_params), reduce_gradients, and _sharding_sync_parameters to iterate over _rank2params_2d_by_color instead of separate hardcoded loops
- Clean up comments: fix incorrect descriptions, translate Chinese to English, and remove dead code and debug prints
2. Move QKV/FFN split logic out of the optimizer; add V100 fp32 matmul fallback
- Remove built-in QKV/FFN split logic (QKVInfo, qkv_info, intermediate_size, muon_qkv_update_mode, muon_ffn_split) from the optimizer core; model-specific slice strategies are now passed by the caller via MuonParamInfo.slice_func
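The caller-provided split can be sketched as follows. This is a hypothetical, dependency-free mock-up: the real MuonParamInfo class carries more fields, and `split_fused_qkv` and `apply_muon_update` are invented names standing in for a caller-side strategy and the optimizer's per-parameter update path. Real tensors would be sliced along the fused output axis; flat lists keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MuonParamInfo:
    # Hypothetical minimal stand-in for the MuonParamInfo described above.
    name: str
    slice_func: Optional[Callable] = None  # caller-supplied split strategy


def split_fused_qkv(weight, num_slices=3):
    # Example caller-side strategy: split a fused QKV parameter into equal
    # Q/K/V blocks. Model-specific layout knowledge lives here, with the
    # caller, not inside the optimizer.
    n = len(weight) // num_slices
    return [weight[i * n:(i + 1) * n] for i in range(num_slices)]


def apply_muon_update(weight, info, orthogonalize):
    # The optimizer core no longer knows about QKV/FFN layouts: if the
    # caller attached a slice_func, process each slice separately,
    # otherwise treat the parameter as a single matrix.
    if info.slice_func is not None:
        return [orthogonalize(s) for s in info.slice_func(weight)]
    return orthogonalize(weight)
```

The design point is inversion of control: new fused layouts need only a new slice_func at the call site, never a change to the optimizer itself.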
- Add an ns_matmul_dtype parameter to Muon.__init__ and _zeropower_via_newtonschulz5: it auto-detects bfloat16 on Ampere+ (compute capability ≥ 8.0) and falls back to float32 on V100 and older, enabling CI on V100
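The auto-detect rule can be sketched in isolation. This is an illustration, not Paddle's code: the function name is invented, and the compute capability is passed in explicitly so the sketch needs no GPU (a real implementation would query the device).

```python
def resolve_ns_matmul_dtype(ns_matmul_dtype=None, device_capability=(8, 0)):
    """Sketch of the auto-detect described above: run the Newton-Schulz
    matmuls in bfloat16 on Ampere+ (compute capability >= 8.0) and fall
    back to float32 on older GPUs such as V100 (capability 7.0), which
    lack native bfloat16 support.
    """
    if ns_matmul_dtype is not None:
        return ns_matmul_dtype  # explicit caller override wins
    major, _minor = device_capability
    return "bfloat16" if major >= 8 else "float32"
```

An explicit ns_matmul_dtype always takes precedence, so CI on V100 works out of the box while Ampere+ users keep the faster bfloat16 path.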
Does this change affect numerical precision?
No