The `OrthogonalizedOptimizer` class in Emerging Optimizers uses

```python
def __init__(
    ...
    *,
    nesterov: bool,
    ...
):
```
but the `muon.py` file currently uses `use_nesterov` instead of the `nesterov` keyword that `OrthogonalizedOptimizer` expects.
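A minimal sketch of the resulting failure, assuming the subclass forwards its keyword arguments to the base class (the base class here is a stand-in modeling only the `nesterov` parameter, not the real implementation):

```python
class OrthogonalizedOptimizer:
    # Stand-in for the real base class; only the keyword-only
    # `nesterov` parameter from the signature above is modeled.
    def __init__(self, *, nesterov: bool):
        self.nesterov = nesterov


# The keyword the base class expects works:
opt = OrthogonalizedOptimizer(nesterov=True)

# The mismatched keyword currently used in muon.py fails:
error = None
try:
    OrthogonalizedOptimizer(use_nesterov=True)
except TypeError as exc:
    error = exc
print(error)  # e.g. "... got an unexpected keyword argument 'use_nesterov'"
```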
In a similar vein, I also noticed that the Emerging Optimizers package uses `tp_mode`, whereas MegatronLM uses `mode`.
I understand this integration is on the dev branch and the Emerging Optimizers API is experimental and subject to change, so this may already be on your radar.