Implement LiMuon Optimizer #213

@DoktorMike

Motivation and description

LiMuon, introduced in "LiMuon: Light and Fast Muon Optimizer for Large Models", is a resource-efficient variant of the Muon optimizer. It's getting a lot of attention today due to its application to training LLMs. I think it would be a nice addition to this package.

The reference can be found here: https://arxiv.org/pdf/2509.14562

In the paper, the authors show that the optimizer outperforms AdamW in training error, test error, and convergence speed.

From their abstract:

Large models recently are widely applied in artificial intelligence, so efficient training of
large models has received widespread attention. More recently, a useful Muon optimizer is
specifically designed for matrix-structured parameters of large models. Although some works
have begun to studying Muon optimizer, the existing Muon and its variants still suffer from high
sample complexity or high memory for large models. To fill this gap, we propose a light and
fast Muon (LiMuon) optimizer for training large models, which builds on the momentum-based
variance reduced technique and randomized Singular Value Decomposition (SVD)

Possible Implementation

No response
