Conversation
|
Thanks for your contribution! |
78430a6 to f6698f4
|
/re-run all-failed |
Codecov Report
❌ Your patch status has failed because the patch coverage (4.91%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #4231 +/- ##
==========================================
Coverage ? 38.89%
==========================================
Files ? 474
Lines ? 90061
Branches ? 0
==========================================
Hits ? 35029
Misses ? 55032
Partials ? 0
|
GuoxiaWang
left a comment
See the review comments: the existing issues need to be fixed in the next version. This version can be merged first.
|
|
||
| optimizer._create_accumulators(paddle.base.framework.default_main_program().global_block(), parameter_list) | ||
| return | ||
|
|
The next version needs to remove the MoE vs. non-MoE assumption.
| default="adamw", | ||
| metadata={"help": "The optimizer to use."}, | ||
| ) | ||
| muon_exclude_patterns: Optional[List[str]] = field( |
The next version needs to support regular expressions rather than only a string `in` check; otherwise it is hard to target parameters precisely in multimodal hybrid models.
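A minimal sketch of the difference this comment points at, not the PR's implementation. The helper names `is_excluded_substring` and `is_excluded_regex` and the parameter names are hypothetical; only the standard-library `re` module is used.

```python
import re
from typing import List


def is_excluded_substring(name: str, patterns: List[str]) -> bool:
    # Plain substring test: any pattern appearing anywhere in the name matches.
    return any(p in name for p in patterns)


def is_excluded_regex(name: str, patterns: List[str]) -> bool:
    # Regex test: patterns can anchor to a prefix and constrain the rest of the name.
    return any(re.search(p, name) for p in patterns)


# Hypothetical parameter names from a multimodal model.
names = [
    "vision_model.layers.0.mlp.up_proj.weight",
    "language_model.layers.0.mlp.up_proj.weight",
    "language_model.embed_tokens.weight",
]

# The substring pattern cannot tell the vision tower from the language tower.
substring_hits = [n for n in names if is_excluded_substring(n, ["mlp.up_proj"])]

# The regex pattern selects only the vision tower's up_proj weights.
regex_hits = [
    n for n in names if is_excluded_regex(n, [r"^vision_model\..*\.mlp\.up_proj\."])
]
```

With a single substring pattern, "in the vision tower" and "is an up_proj weight" cannot be required at the same time; one anchored regex can express both conditions.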
|
|
||
| # Step 4: mock Muon._muon_update and Muon._apply_optimize | ||
| # Muon's _muon_update is pure Python (paddle.lerp + paddle.assign), | ||
| # so it bypasses the _C_ops.adamw_ patch above. We need explicit |
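The version currently merged into Paddle has already been adapted for adamw_; a muon_ kernel still needs to be written. The master-weight reload and offload written here cannot offload tensor by tensor. The current implementation reloads all master weights up front, so once GPU memory usage is high it simply runs out of memory, which defeats the purpose. This has to change in the next version.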
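The memory concern can be illustrated with a small accounting sketch. It is not Paddle code and calls no Paddle API; the function names and the sizes are hypothetical, and it only tracks how much master-weight memory is resident on the device at once under the two strategies.

```python
from typing import List


def peak_resident_bulk(sizes: List[int]) -> int:
    """Reload every master weight first, then update: all are resident at once."""
    return sum(sizes)


def peak_resident_per_tensor(sizes: List[int]) -> int:
    """Reload, update, and offload one master weight at a time."""
    peak = 0
    resident = 0
    for size in sizes:
        resident += size  # reload this tensor's master weight
        peak = max(peak, resident)
        # ... the optimizer update for this tensor would run here ...
        resident -= size  # offload it before touching the next tensor
    return peak


sizes = [400, 100, 250]  # hypothetical master-weight sizes, in MB
bulk_peak = peak_resident_bulk(sizes)
per_tensor_peak = peak_resident_per_tensor(sizes)
```

Under these assumptions the bulk strategy peaks at the sum of all master weights, while the per-tensor strategy peaks at the largest single tensor, which is the behavior the comment asks for.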
| config: MiniMaxM2Config | ||
|
|
||
| @classmethod | ||
| def _build_muon_slice_config(cls, model, config) -> dict: |
Why put a default function here? It is not polished yet; wouldn't a default be likely to cause problems?
Before submitting
tests folder. If there are codecov issues, please add test cases first.
PR types
New features
PR changes
We are adapting Muon Optimizer.
Description