
[QUESTION] Muon args mismatch between Emerging Optimizers and Megatron-LM #3870

@ShiftyBlock

Description

The OrthogonalizedOptimizer class in Emerging Optimizers uses

```python
def __init__(
        ...
        *,
        nesterov: bool,
        ...
    ):
```

but muon.py currently passes `use_nesterov` instead of the `nesterov` keyword that `OrthogonalizedOptimizer` expects.
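A minimal, self-contained sketch of the failure mode (hypothetical, heavily simplified signatures, not the real classes — just to illustrate why the stale keyword name raises at construction time):

```python
class OrthogonalizedOptimizer:
    # Simplified stand-in for the Emerging Optimizers base class:
    # it declares the keyword-only argument `nesterov`.
    def __init__(self, *, nesterov: bool):
        self.nesterov = nesterov


def make_muon(**kwargs):
    # Stand-in for muon.py forwarding its kwargs to the base class.
    return OrthogonalizedOptimizer(**kwargs)


# Passing the old name fails, because `use_nesterov` is not a declared
# keyword of the (simplified) base-class __init__:
try:
    make_muon(use_nesterov=True)
except TypeError as e:
    print("TypeError:", e)

# Passing the keyword the base class actually declares works:
opt = make_muon(nesterov=True)
print(opt.nesterov)
```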

Similarly, the Emerging Optimizers package currently uses `tp_mode` whereas Megatron-LM uses `mode`.
I understand this integration lives on the dev branch and the Emerging Optimizers API is experimental and subject to change, so this may already be on your radar.
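Until the names converge, one possible stopgap is a small keyword-renaming shim on the caller side. This is purely a hypothetical workaround sketch (the rename table and helper are mine, not part of either library):

```python
# Hypothetical mapping from the stale keyword names to the ones the
# current Emerging Optimizers API expects (per the mismatches above).
_KWARG_RENAMES = {
    "use_nesterov": "nesterov",  # muon.py name -> OrthogonalizedOptimizer name
    "mode": "tp_mode",           # Megatron-LM name -> Emerging Optimizers name
}


def remap_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with stale keyword names translated."""
    return {_KWARG_RENAMES.get(k, k): v for k, v in kwargs.items()}


# Unrecognized keys pass through untouched:
print(remap_kwargs({"use_nesterov": True, "lr": 0.02}))
# -> {'nesterov': True, 'lr': 0.02}
```

The forwarded call would then be `OrthogonalizedOptimizer(**remap_kwargs(kwargs))`; of course the proper fix is aligning the names in muon.py itself.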
