
Conversation

@NewBornRustacean commented Oct 25, 2025

Add Muon optimizer to burn-optim

What's new?

  • Adds a new Muon optimizer implementation (crates/burn-optim/src/optim/muon.rs).
  • Implements Newton–Schulz orthogonalization, momentum integration, optional weight decay, and shape-based learning-rate adjustment (Original and MatchRmsAdamW modes); a rough sketch of the Newton–Schulz step follows this list.
  • Includes comprehensive unit tests.
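
For context, here is a minimal stand-alone sketch of the Newton–Schulz quintic iteration that Muon relies on. The structure and the coefficients (3.4445, -4.7750, 2.0315) follow the reference muon.py linked further down; the plain-Vec matrix type and the `matmul`/`transpose`/`scale_add` helpers are purely illustrative and are not the burn-optim API.

```rust
// Illustrative sketch only: plain-Vec matrices stand in for burn tensors.
type Mat = Vec<Vec<f32>>;

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let (m, k, n) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0f32; n]; m];
    for i in 0..m {
        for p in 0..k {
            for j in 0..n {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn transpose(a: &Mat) -> Mat {
    let (m, n) = (a.len(), a[0].len());
    (0..n).map(|j| (0..m).map(|i| a[i][j]).collect()).collect()
}

// Element-wise alpha * a + beta * b.
fn scale_add(alpha: f32, a: &Mat, beta: f32, b: &Mat) -> Mat {
    a.iter()
        .zip(b)
        .map(|(ra, rb)| ra.iter().zip(rb).map(|(x, y)| alpha * x + beta * y).collect())
        .collect()
}

/// Approximately replaces `g` with the nearest semi-orthogonal matrix
/// (U V^T from its SVD) using a fixed number of Newton-Schulz quintic steps.
fn newton_schulz(g: &Mat, steps: usize, eps: f32) -> Mat {
    // Quintic coefficients from the reference implementation.
    let (a, b, c) = (3.4445f32, -4.7750f32, 2.0315f32);

    // Normalize by the Frobenius norm so the spectral norm is at most 1.
    let norm = g.iter().flatten().map(|x| x * x).sum::<f32>().sqrt() + eps;
    let mut x: Mat = g.iter().map(|row| row.iter().map(|v| v / norm).collect()).collect();

    // Iterate in the wide orientation; transpose tall matrices first.
    let tall = g.len() > g[0].len();
    if tall {
        x = transpose(&x);
    }

    for _ in 0..steps {
        let xxt = matmul(&x, &transpose(&x));                   // A = X X^T
        let b_mat = scale_add(b, &xxt, c, &matmul(&xxt, &xxt)); // B = bA + cA^2
        x = scale_add(a, &x, 1.0, &matmul(&b_mat, &x));         // X = aX + BX
    }

    if tall {
        x = transpose(&x);
    }
    x
}
```

On any full-rank 2D input, a handful of steps produce an approximately semi-orthogonal matrix, i.e. X * X^T ≈ I in the wide orientation, which is the property test_newton_schulz_orthogonalization below checks.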

Notes:

Test summary

  • test_adjust_lr_fn_original - Verifies the Original learning-rate adjustment ratios for square, tall, and wide matrices (both adjustment modes are sketched after this list).
  • test_adjust_lr_fn_match_rms_adamw - Verifies the MatchRmsAdamW learning-rate adjustment ratios for example shapes.
  • test_1d_tensor_panics - Ensures Newton–Schulz orthogonalization panics for 1D tensors (requires 2D).
  • test_muon_optimizer_save_load_state - Verifies optimizer state can be saved and loaded for a Linear layer without bias.
  • test_muon_with_weight_decay - Ensures weight decay is applied (weights are reduced) for a Linear layer without bias.
  • test_newton_schulz_orthogonalization - Checks Newton–Schulz produces approximately orthogonal output (A * A^T ≈ I).
  • test_tall_matrix_transpose - Ensures tall matrices are transposed internally and their shape is preserved; also verifies that orthogonalization changes the values and that wide matrices are handled correctly.
  • test_zero_gradient - Confirms Muon handles zero gradients without producing NaNs, still creates optimizer state, and still applies weight decay (values are reduced) when gradients are zero.
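
For concreteness, here is my reading of the two adjustment modes exercised by the first two tests. The helper name `adjust_lr` and the exact constants are assumptions based on the reference implementations, not this PR's actual API:

```rust
/// Hypothetical helper: scales the base learning rate for a 2D weight of
/// shape [rows, cols]. Not the PR's API; constants follow the references.
fn adjust_lr(lr: f64, rows: usize, cols: usize, match_rms_adamw: bool) -> f64 {
    if match_rms_adamw {
        // MatchRmsAdamW (Moonlight-style): roughly match AdamW's update RMS.
        lr * 0.2 * (rows.max(cols) as f64).sqrt()
    } else {
        // Original Muon scaling: sqrt(max(1, rows / cols)).
        lr * (rows as f64 / cols as f64).max(1.0).sqrt()
    }
}

fn main() {
    let lr = 0.02;
    // Square matrix: the Original mode leaves the rate unchanged.
    assert!((adjust_lr(lr, 128, 128, false) - 0.02).abs() < 1e-12);
    // Tall matrix (256 x 64): scaled by sqrt(256 / 64) = 2.
    assert!((adjust_lr(lr, 256, 64, false) - 0.04).abs() < 1e-12);
    // Wide matrix (64 x 256): ratio clamped to 1, so unchanged.
    assert!((adjust_lr(lr, 64, 256, false) - 0.02).abs() < 1e-12);
}
```

As I understand it, the MatchRmsAdamW mode keeps the update magnitude comparable to AdamW so a shared base learning rate can be reused, while the Original mode only compensates for non-square shapes.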

Related issue, readings, etc.

@NewBornRustacean marked this pull request as ready for review November 2, 2025 05:15
Comment on lines +95 to +96
/// hidden layers (weight matrices). Other parameters such as biases and embeddings
/// should be optimized using a standard method such as AdamW.
Member

Are those parameters ignored during training if you use only a single optimizer?

/// - Original: https://github.com/KellerJordan/Muon/blob/master/muon.py
/// - PyTorch: https://github.com/pytorch/pytorch/blob/main/torch/optim/muon.py
fn zeropower_via_newtonschulz<const D: usize>(&self, g: Tensor<B, D>) -> Tensor<B, D> {
assert!(
Member

Unsure if this should be the default behavior. There isn't a great way yet to define multiple optimizers for a single burn module (e.g. a linear layer with a bias vector). Do you have an idea, @laggui?

@laggui self-requested a review November 3, 2025 20:53