
Add Optimization Cookbook #5117

Open

samanklesaria wants to merge 4 commits into google:main from samanklesaria:opt_cookbook

Conversation

@samanklesaria
Collaborator

@samanklesaria commented Nov 28, 2025

What does this PR do?

This PR adds a guide that shows some common techniques for working with Flax models during optimization. These include:

  • Calculating exponential moving averages of parameters
  • Optimizing only a low-rank addition to certain weights (LoRA)
  • Using different learning rates for different parameters to implement the maximal update parameterization (muP)
  • Using second-order optimizers like L-BFGS
  • Specifying sharding for optimizer state that differs from the sharding of the parameters
  • Gradient accumulation

The document emphasizes a style as close to pure JAX as possible: to that end, it shows how the Flax version of each technique requires only a minor deviation from the often more intuitive pure-JAX version. A sketch of one such recipe (gradient accumulation) is included below.
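As a taste of the style, gradient accumulation looks roughly like this (a minimal sketch, not code taken from the notebook; the toy model and the use of optax.MultiSteps are illustrative assumptions):

import jax.numpy as jnp
import optax
from flax import nnx

rngs = nnx.Rngs(0)
model = nnx.Sequential(
  nnx.Linear(2, 8, rngs=rngs),
  nnx.relu,
  nnx.Linear(8, 8, rngs=rngs),
)

# Accumulate gradients over 4 micro-batches before each Adam update.
tx = optax.MultiSteps(optax.adam(1e-3), every_k_schedule=4)
optimizer = nnx.Optimizer(model, tx=tx, wrt=nnx.Param)

@nnx.jit
def train_step(model, optimizer, x, y):
  loss_fn = lambda m, x, y: jnp.sum((m(x) - y) ** 2)
  loss, grads = nnx.value_and_grad(loss_fn)(model, x, y)
  optimizer.update(model, grads)  # parameters only change on every 4th call
  return loss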

Warnings:

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@samanklesaria force-pushed the opt_cookbook branch 3 times, most recently from c495dc1 to b929529, on December 1, 2025 at 23:53
@samanklesaria force-pushed the opt_cookbook branch 5 times, most recently from 34d7c20 to 444c6b6, on December 9, 2025 at 22:27
@samanklesaria marked this pull request as ready for review on January 6, 2026 at 20:37
@samanklesaria force-pushed the opt_cookbook branch 2 times, most recently from f894a0d to b37c527, on January 20, 2026 at 19:56
@cgarciae
Collaborator

I feel we could simplify the intro by doing the following:

  1. Define a single model at the beginning and simply reuse it in all the examples (it's just a guide).
  2. Inline the loss function.
from flax import nnx
import jax.numpy as jnp
import optax

rngs = nnx.Rngs(0)

# One shared toy model, reused across all the examples.
model = nnx.Sequential(
  nnx.Linear(2, 8, rngs=rngs),
  nnx.relu,
  nnx.Linear(8, 8, rngs=rngs),
)

optimizer = nnx.Optimizer(
  model,
  tx=optax.adam(1e-3),
  wrt=nnx.Param)

...

@nnx.jit
def train_step(model, optimizer, ema, x, y):
  loss_fn = lambda m, x, y: jnp.sum((m(x) - y) ** 2)
  loss, grads = nnx.value_and_grad(loss_fn)(model, x, y)
  optimizer.update(model, grads)
  ema.update(model)  # track an exponential moving average of the parameters
  return loss
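The ema argument above isn't defined in the snippet; a minimal sketch of what such a helper could look like (the EMA class name, decay value, and use of nnx.clone are assumptions, not anything from the notebook):

import jax
from flax import nnx

class EMA(nnx.Module):
  """Hypothetical helper behind `ema.update(model)` above (sketch only)."""

  def __init__(self, model, decay=0.999):
    self.decay = decay
    self.average = nnx.clone(model)  # copy whose params hold the moving average

  def update(self, model):
    avg = nnx.state(self.average, nnx.Param)
    new = nnx.state(model, nnx.Param)
    nnx.update(
      self.average,
      jax.tree.map(lambda a, p: self.decay * a + (1 - self.decay) * p, avg, new),
    )

ema = EMA(model)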

model = nnx_model(rngs)
state = nnx.state(model, nnx.Param)
rates = {'kernel': optax.adam(1e-3), 'bias': optax.adam(1e-2)}
param_tys = nnx.map_state(lambda p, v: list(p)[-1], state)
Collaborator

I think this is enough:

Suggested change
- param_tys = nnx.map_state(lambda p, v: list(p)[-1], state)
+ param_tys = nnx.map_state(lambda p, v: p[-1], state)
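For context, param_tys can then route each parameter to its optimizer via optax.multi_transform (a sketch reusing the rates and param_tys from the snippet above; not necessarily the notebook's exact code):

import optax
from flax import nnx

# Send 'kernel' leaves to adam(1e-3) and 'bias' leaves to adam(1e-2),
# according to the per-leaf labels in param_tys.
tx = optax.multi_transform(rates, param_tys)
optimizer = nnx.Optimizer(model, tx=tx, wrt=nnx.Param)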

Collaborator

We could also use jax.tree.map_with_path, as in the JAX example.
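For instance, on a plain pytree of parameters like the guide's JAX example (a sketch; the parameter names and shapes are made up, and jax.tree.map_with_path requires a recent JAX release):

import jax
import jax.numpy as jnp

# Illustrative pytree of parameters (stand-in for the JAX example's params).
params = {'dense': {'kernel': jnp.zeros((2, 8)), 'bias': jnp.zeros((8,))}}

# Label every leaf with the last key on its path: 'kernel' or 'bias'.
# (On older JAX, use jax.tree_util.tree_map_with_path instead.)
labels = jax.tree.map_with_path(lambda path, leaf: path[-1].key, params)
# labels == {'dense': {'kernel': 'kernel', 'bias': 'bias'}}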

axis_types=(AxisType.Explicit, AxisType.Explicit))
jax.set_mesh(mesh)

ghost_model = jax.eval_shape(lambda: nnx_model(nnx.Rngs(0), out_sharding=P('x', 'y')))
Collaborator

Instead of creating this fake model, it would be a good opportunity to create the optimizer_sharding API on Variable before finishing this guide.

@cgarciae
Collaborator

After fully reading the guide, I'm getting the sense that having the JAX versions makes the explanations a bit longer and slightly harder to understand (because you have to mentally filter for the version you are interested in), and that having the JAX version doesn't necessarily make the NNX version easier to understand.

@samanklesaria
Collaborator Author

After fully reading the guide, I'm getting the sense that having the JAX versions makes the explanations a bit longer and slightly harder to understand (because you have to mentally filter for the version you are interested in), and that having the JAX version doesn't necessarily make the NNX version easier to understand.

Fair enough! I'll convert it to nnx-only.
