The `OrthogonalizedOptimizer` class in Emerging Optimizers uses

```python
def __init__(
    ...
    *,
    nesterov: bool,
    ...
):
```
but the `muon.py` file currently uses `use_nesterov` instead of the `nesterov` keyword that `OrthogonalizedOptimizer` expects.
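A minimal sketch of the resulting failure, assuming the subclass forwards its keyword arguments to the base class (the base class here is a stand-in modeling only the `nesterov` parameter, not the real implementation):

```python
class OrthogonalizedOptimizer:
    # Stand-in for the real base class; only the keyword-only
    # `nesterov` parameter from the signature above is modeled.
    def __init__(self, *, nesterov: bool):
        self.nesterov = nesterov


# The keyword the base class expects works:
opt = OrthogonalizedOptimizer(nesterov=True)

# The mismatched keyword currently used in muon.py fails:
error = None
try:
    OrthogonalizedOptimizer(use_nesterov=True)
except TypeError as exc:
    error = exc
print(error)  # e.g. "... got an unexpected keyword argument 'use_nesterov'"
```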
In a similar vein, I also noticed that the Emerging Optimizers package uses `tp_mode`, whereas MegatronLM uses `mode`.
I understand this integration is on the dev branch and the Emerging Optimizers API is experimental and subject to change, so this may already be on your radar.