Description
Motivation and description
LiMuon: Light and Fast Muon Optimizer for Large Models is a resource-efficient variant of the Muon optimizer. It is getting a lot of attention at the moment because of its application to training LLMs. I think it would be a nice addition to this package.
The reference can be found here: https://arxiv.org/pdf/2509.14562
In the paper, the authors report that the optimizer outperforms AdamW in training error, test error, and convergence speed.
From their abstract:
Large models recently are widely applied in artificial intelligence, so efficient training of
large models has received widespread attention. More recently, a useful Muon optimizer is
specifically designed for matrix-structured parameters of large models. Although some works
have begun to studying Muon optimizer, the existing Muon and its variants still suffer from high
sample complexity or high memory for large models. To fill this gap, we propose a light and
fast Muon (LiMuon) optimizer for training large models, which builds on the momentum-based
variance reduced technique and randomized Singular Value Decomposition (SVD)
Possible Implementation
No response
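As a possible starting point, here is a minimal sketch of the randomized SVD building block that the abstract mentions. It is a generic randomized range finder followed by a small exact SVD, written with NumPy. It is not the LiMuon update rule itself, and the function name `randomized_svd` and the parameters `oversample` and `n_iter` are my own assumptions, not names taken from the paper or from this package.

```python
import numpy as np


def randomized_svd(a, rank, oversample=5, n_iter=2, seed=0):
    """Approximate truncated SVD of a 2-D array `a` (hypothetical helper).

    Returns (u, s, vt) with u: (m, rank), s: (rank,), vt: (rank, n).
    """
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Sketch size: target rank plus a few extra columns for accuracy.
    k = min(rank + oversample, m, n)

    # Project onto a random subspace to capture the dominant column space.
    omega = rng.standard_normal((n, k))
    q, _ = np.linalg.qr(a @ omega)

    # Optional power iterations sharpen the subspace when the singular
    # values decay slowly; QR between steps keeps things well conditioned.
    for _ in range(n_iter):
        z, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ z)

    # Exact SVD of the small (k x n) projected matrix.
    b = q.T @ a
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_b
    return u[:, :rank], s[:rank], vt[:rank, :]
```

The appeal for an optimizer is that only the small `k`-column sketch needs a full decomposition, which is cheaper than a full SVD of a large weight-shaped matrix. How LiMuon combines this with momentum-based variance reduction is specified in the paper and is not reproduced here.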