Commit 50549b3
feat: add Muon optimizer with distributed sharding support
Add Muon optimizer implementation with Newton-Schulz orthogonalization
for distributed training:
- Muon optimizer (python/paddle/optimizer/muon.py):
  - Newton-Schulz iteration for orthogonal gradient updates
  - QKV split modes: per_head, qkv_sep, full
  - FFN gate_up split support
  - Multiple NS coefficient types: simple, quintic, polar_express, aol
- MuonShardingOptimizer:
  - Whole-tensor assignment for 2D parameters (Muon)
  - Element-wise sharding for non-2D parameters (AdamW)
  - Hybrid memory balancing across ranks
- Test coverage:
  - All 24 parameter combinations tested
  - 2-GPU sharding validation against single-GPU reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 87bf071
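For readers unfamiliar with the Newton-Schulz step named in the message, here is a minimal pure-Python sketch. It assumes the odd-polynomial form X <- aX + b(XX^T)X + c(XX^T)^2 X used in the public Muon reference code, after Frobenius normalization. The names `newton_schulz`, `NS_COEFFS`, `matmul`, `transpose`, and `lincomb` are hypothetical and not taken from muon.py. Only two of the commit's coefficient types are modeled: "simple" as the classic cubic (1.5, -0.5, 0) and "quintic" as the tuned (3.4445, -4.7750, 2.0315) set. The polar_express and aol variants are omitted.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Coefficients (a, b, c) for one step X <- a*X + b*(X X^T) X + c*(X X^T)^2 X.
# Keys reuse two coefficient-type names from the commit; the values are
# assumptions, not read from muon.py.
NS_COEFFS: Dict[str, Tuple[float, float, float]] = {
    "simple": (1.5, -0.5, 0.0),            # classic cubic Newton-Schulz
    "quintic": (3.4445, -4.7750, 2.0315),  # tuned quintic from public Muon reference code
}


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def lincomb(*terms: Tuple[float, Matrix]) -> Matrix:
    """Return sum(coef * m) over (coef, m) pairs; all m share one shape."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(cols)]
            for i in range(rows)]


def newton_schulz(g: Matrix, steps: int = 5, kind: str = "quintic",
                  eps: float = 1e-7) -> Matrix:
    """Push the singular values of g toward 1, approximating U V^T."""
    a, b, c = NS_COEFFS[kind]
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [row[:] for row in g]  # keep rows <= cols
    norm = math.sqrt(sum(v * v for row in x for v in row))
    x = [[v / (norm + eps) for v in row] for row in x]   # spectral norm now <= 1
    for _ in range(steps):
        gram = matmul(x, transpose(x))                      # A = X X^T
        poly = lincomb((b, gram), (c, matmul(gram, gram)))  # B = bA + cA^2
        x = lincomb((a, x), (1.0, matmul(poly, x)))         # X = aX + BX
    return transpose(x) if tall else x
```

With enough steps the cubic "simple" set drives every singular value to exactly 1, so `newton_schulz(g, steps=40, kind="simple")` returns a matrix whose rows are orthonormal. The quintic set converges faster but only leaves singular values spread around 1. For an optimizer update that approximation is usually considered acceptable.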
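The message does not say how MuonShardingOptimizer balances memory, so the following is only one plausible reading, written in plain Python. It assumes a greedy largest-first placement of whole 2D matrices, because Newton-Schulz needs the full matrix on one rank. Every non-2D parameter is then packed into one flat buffer, which is split by water-filling so that the lighter ranks receive more elements. `plan_sharding` and `water_fill` are hypothetical names. Loads are counted in elements and ignore dtype and optimizer-state size.

```python
import math
from typing import Dict, List, Tuple


def water_fill(loads: List[int], amount: int) -> List[int]:
    """Split `amount` elements across ranks, topping up the lightest first."""
    lo, hi = min(loads), max(loads) + amount
    while lo < hi:  # smallest level whose total top-up covers `amount`
        mid = (lo + hi) // 2
        if sum(max(0, mid - load) for load in loads) >= amount:
            hi = mid
        else:
            lo = mid + 1
    extra = [max(0, lo - load) for load in loads]
    overshoot = sum(extra) - amount
    for k in range(len(extra)):  # trim the rounding excess, one element per rank
        if overshoot == 0:
            break
        if extra[k] > 0:
            extra[k] -= 1
            overshoot -= 1
    return extra


def plan_sharding(shapes: Dict[str, Tuple[int, ...]], world_size: int):
    """Hypothetical planner: owner rank per 2D param, plus [start, end)
    slices of one flat buffer holding every non-2D param."""
    loads = [0] * world_size
    owner: Dict[str, int] = {}
    # Whole-tensor assignment: each 2D matrix goes, largest first,
    # to the currently least-loaded rank.
    matrices = sorted((n for n, s in shapes.items() if len(s) == 2),
                      key=lambda n: math.prod(shapes[n]), reverse=True)
    for name in matrices:
        rank = min(range(world_size), key=lambda k: loads[k])
        owner[name] = rank
        loads[rank] += math.prod(shapes[name])
    # Element-wise sharding: AdamW state is per element, so any split works.
    flat = sum(math.prod(s) for s in shapes.values() if len(s) != 2)
    extra = water_fill(loads, flat)
    slices, start = [], 0
    for k in range(world_size):
        slices.append((start, start + extra[k]))
        start += extra[k]
        loads[k] += extra[k]
    return owner, slices, loads
```

For example, consider two ranks with matrices of 16, 8 and 4 elements plus 12 non-2D elements. The matrices land as 16 on rank 0 and 8 + 4 on rank 1. The flat buffer is then split 4 / 8, so both ranks end at 20 elements.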
11 files changed (+767, -1107 lines) in:
- python/paddle/distributed/fleet/base
- python/paddle/distributed/fleet/meta_optimizers/dygraph_optimizer
- python/paddle/distributed/fleet/utils
- python/paddle/optimizer
- test/collective/fleet

Lines changed: 2 additions & 0 deletions
@@ -337,6 +337,8 @@
Lines changed: 1 addition & 0 deletions
@@ -38,3 +38,4 @@
Lines changed: 0 additions & 26 deletions
@@ -1252,17 +1252,8 @@
@@ -1280,25 +1271,8 @@
Lines changed: 7 additions & 7 deletions
@@ -23,8 +23,8 @@
@@ -287,9 +287,9 @@
@@ -340,7 +340,7 @@
@@ -637,7 +637,7 @@
Lines changed: 0 additions & 102 deletions
This file was deleted.