Commit c20b962
feat: add Muon optimizer with distributed sharding support
Add Muon optimizer implementation with Newton-Schulz orthogonalization
for distributed training:
- Muon optimizer (python/paddle/optimizer/muon.py):
  - Newton-Schulz iteration for orthogonal gradient updates
  - QKV split modes: per_head, qkv_sep, full
  - FFN gate_up split support
  - Multiple NS coefficient types: simple, quintic, polar_express, aol
- MuonShardingOptimizer:
  - Whole-tensor assignment for 2D parameters (Muon)
  - Element-wise sharding for non-2D parameters (AdamW)
  - Hybrid memory balancing across ranks
- Test coverage:
  - All 24 parameter combinations tested
  - 2-GPU sharding validation against single-GPU reference
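The Newton-Schulz orthogonalization and the split modes listed above can be illustrated with a small pure-Python sketch. It is not the code in muon.py. Assumptions: `simple` is taken to be the classic cubic Newton-Schulz step, `quintic` uses the coefficients published with the public reference Muon implementation, the `polar_express` and `aol` coefficients are omitted, and the helpers `newton_schulz`, `qkv_num_blocks`, and `orthogonalize_blocks` are hypothetical names. The mapping of `per_head`, `qkv_sep`, and `full` to column blocks assumes a fused weight laid out as Q | K | V with each head's columns contiguous.

```python
import math

Matrix = list[list[float]]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def lincomb(alpha: float, x: Matrix, beta: float, y: Matrix) -> Matrix:
    """Elementwise alpha * x + beta * y."""
    return [[alpha * a + beta * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


# Assumed coefficients (a, b, c) for X <- a*X + b*(X X^T) X + c*(X X^T)^2 X.
# "simple": classic cubic Newton-Schulz step; "quintic": reference Muon values.
# The commit's polar_express and aol coefficients are not reproduced here.
NS_COEFFS = {"simple": (1.5, -0.5, 0.0), "quintic": (3.4445, -4.7750, 2.0315)}


def newton_schulz(g: Matrix, steps: int = 5, coeff_type: str = "quintic",
                  eps: float = 1e-7) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g (hypothetical helper)."""
    a, b, c = NS_COEFFS[coeff_type]
    tall = len(g) > len(g[0])
    # Iterate on the wide orientation so the Gram matrix X X^T is the smaller one.
    x = transpose(g) if tall else [row[:] for row in g]
    # The Frobenius norm bounds the spectral norm, so all singular values end up <= 1.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))                  # A = X X^T
        poly = lincomb(b, gram, c, matmul(gram, gram))  # B = b*A + c*A^2
        x = lincomb(a, x, 1.0, matmul(poly, x))         # X = a*X + B X
    return transpose(x) if tall else x


# Hypothetical split-mode mapping for a fused [hidden, 3 * hidden] QKV weight laid
# out as Q | K | V along the columns, with each head's columns contiguous.
def qkv_num_blocks(mode: str, num_heads: int) -> int:
    return {"full": 1, "qkv_sep": 3, "per_head": 3 * num_heads}[mode]


def orthogonalize_blocks(w: Matrix, num_blocks: int, **ns_kwargs) -> Matrix:
    """Run Newton-Schulz on each equal-width column block of w, then re-concatenate."""
    width = len(w[0]) // num_blocks
    blocks = [[row[i * width:(i + 1) * width] for row in w] for i in range(num_blocks)]
    done = [newton_schulz(blk, **ns_kwargs) for blk in blocks]
    return [[v for blk in done for v in blk[r]] for r in range(len(w))]
```

With the cubic step and enough iterations (e.g. `steps=30`), a full-rank input converges to its orthogonal polar factor; a symmetric positive-definite input such as `[[3, 1], [1, 2]]` converges to the identity. The tuned quintic coefficients trade exactness for speed: after five steps the singular values sit near 1 rather than exactly on it. Under the same layout assumption, a gate_up split would correspond to `orthogonalize_blocks(w, 2)`.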
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 0b2356c
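The MuonShardingOptimizer placement rules (whole tensors for 2D parameters, element-wise slices for the rest) can be sketched as a planning function. This is a hypothetical illustration, not the commit's code: `plan_sharding` and `RankPlan` are invented names, load is measured in parameter elements (an assumption), 2D tensors are placed largest-first on the least-loaded rank, and non-2D tensors are cut into near-equal contiguous flat slices. The commit's hybrid balancing may account for memory differently.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RankPlan:
    """Hypothetical per-rank placement record."""
    whole: list[str] = field(default_factory=list)  # 2D params owned in full (Muon)
    slices: list[tuple[str, int, int]] = field(default_factory=list)  # flat [start, end) (AdamW)
    numel: int = 0  # elements owned, used as the load measure (an assumption)


def plan_sharding(shapes: dict[str, tuple[int, ...]], world_size: int) -> list[RankPlan]:
    plans = [RankPlan() for _ in range(world_size)]
    # 2D params: Newton-Schulz needs the whole matrix, so each goes intact to one rank,
    # largest first, onto whichever rank currently owns the fewest elements.
    matrices = sorted((n for n, s in shapes.items() if len(s) == 2),
                      key=lambda n: math.prod(shapes[n]), reverse=True)
    for name in matrices:
        rank = min(range(world_size), key=lambda r: plans[r].numel)
        plans[rank].whole.append(name)
        plans[rank].numel += math.prod(shapes[name])
    # Non-2D params: AdamW is element-wise, so flatten and cut into near-equal slices.
    for name, shape in shapes.items():
        if len(shape) == 2:
            continue
        numel = math.prod(shape)
        for r in range(world_size):
            start, end = r * numel // world_size, (r + 1) * numel // world_size
            if end > start:
                plans[r].slices.append((name, start, end))
                plans[r].numel += end - start
    return plans
```

For example, with `world_size=2` and shapes `{"w1": (4, 4), "w2": (4, 2), "w3": (2, 2), "bias": (10,)}`, rank 0 owns `w1` plus `bias[0:5]` and rank 1 owns `w2`, `w3` plus `bias[5:10]`. The split rule follows from the update math: Newton-Schulz couples every entry of a matrix, so a 2D parameter cannot be cut across ranks, while an AdamW update touches each element independently and can be sliced anywhere.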
File tree
10 files changed, +2427 -7 lines changed
- ci
- python/paddle
  - distributed/fleet
    - base
    - meta_optimizers
      - dygraph_optimizer
  - optimizer
- test/collective/fleet
Lines changed: 2 additions & 1 deletion

| Hunk | Original lines removed | New lines added |
|---|---|---|
| `@@ -165,7 +165,8 @@` | 168 | 168-169 |
Lines changed: 2 additions & 0 deletions

| Hunk | Original lines removed | New lines added |
|---|---|---|
| `@@ -337,6 +337,8 @@` | none | 340-341 |
Lines changed: 1 addition & 0 deletions

| Hunk | Original lines removed | New lines added |
|---|---|---|
| `@@ -31,6 +31,7 @@` | none | 34 |
Lines changed: 16 additions & 6 deletions

| Hunk | Original lines removed | New lines added |
|---|---|---|
| `@@ -23,6 +23,9 @@` | none | 26-28 |
| `@@ -284,11 +287,13 @@` | 287-291 | 290-296 |
| `@@ -335,6 +340,7 @@` | none | 343 |
| `@@ -628,7 +634,11 @@` | 631 | 637-641 |