Feat : Add DP-SGD Transformer example using Flax NNX API | Issue #120 #126

debanganghosh08 · 2026-01-24T13:50:44Z

This PR introduces a comprehensive example of training a Transformer model with Differential Privacy using the new Flax NNX API. While JAX Privacy provides robust support for Linen and Haiku, this addition provides a template for users moving toward the functional-object paradigm of NNX.

Key Technical Implementations:

✔️ Exhaustive State Partitioning: Utilizes nnx.split(model, nnx.Param, ...) to strictly separate trainable parameters from non-trainable state (RNG counts, etc.), ensuring the JAX tracer maintains leaf parity across functional boundaries.

✔️ Rank-Normalized Loss: Implements a rank-injection strategy within the pure loss function to account for vmap dimension-stripping. By forcing a singleton batch dimension during the forward pass, the model correctly generates 4D causal masks required by the attention mechanism.

✔️ Privacy-Safe State Reconstruction: Uses an internal nnx.merge pattern to ensure that mutations to RNG states during training remain local to the functional trace, preventing TraceContextError regressions.

✅ Verification: The script was validated on the Tiny Shakespeare dataset for 20 steps, achieving stable convergence under DP constraints (Default: CLIP_NORM=1.0).

Screenshot of output attached 👇


amyssnippet · 2026-01-24T17:01:07Z

examples/user_level_transformer_example.py

+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law- agreed to in writing, software


amyssnippet · 2026-01-24T17:02:59Z

examples/dp_sgd_transformer_nnx.py

+  Returns:
+    The content of the downloaded file as a string.
+  """
+  with urllib.request.urlopen(url) as response:


add timeout to prevent indefinite blocking

That's a good catch, brother. I have now added a timeout, which is definitely best practice for avoiding hangs in CI/CD: download_data now uses a 10-second timeout. I'm also moving the flax dependency into a proper requirements file, as you suggested.
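A minimal sketch of what the timeout change could look like. The function name download_data and the 10-second value come from the comment above; the signature and the UTF-8 decoding are assumptions, not the PR's actual code.

```python
import urllib.request


def download_data(url: str, timeout: float = 10.0) -> str:
  """Returns the content at `url` as a string.

  Args:
    url: Location of the file to fetch.
    timeout: Seconds to wait on blocking socket operations before raising.
  """
  # Without `timeout`, urlopen can block indefinitely on a stalled connection.
  with urllib.request.urlopen(url, timeout=timeout) as response:
    return response.read().decode("utf-8")
```

Note that the timeout bounds each blocking network operation rather than the total download time.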

amyssnippet · 2026-01-25T07:17:56Z

examples/dp_sgd_transformer_nnx.py

 import urllib.request

-from flax import nnx
+from flax import nnx  # pytype: disable=import-error


this line is unusual

No, it's not. The CI/CD checks do not install flax, so the code fails when the pytype check runs. Hence this line is needed to pass all the CI/CD checks.
As a longer-term note, we can ask @RamSaw or @ryan112358 to add a flax install step to the CI/CD checks to avoid further issues.

So try adding it to the requirements.txt located in the docs folder.

The requirements.txt in docs folder is intended to only contain requirements needed for documentation. The ones listed in pyproject.toml are only those needed by the core library. Probably the best thing to do is add an additional requirements.txt to the examples/ directory that includes flax, and update .github/workflows/ci.yml to install these.

Or you can add it to the "dev" requirements in pyproject.toml

amyssnippet · 2026-01-25T07:18:16Z

examples/user_level_transformer_example.py

 from absl import app
 from absl import flags
-import flax.linen as nn
+import flax.linen as nn  # pytype: disable=import-error


same here too

No, it's not. The CI/CD checks do not install flax, so the code fails when the pytype check runs. Hence this line is needed to pass all the CI/CD checks.
As a longer-term note, we can ask @RamSaw or @ryan112358 to add a flax install step to the CI/CD checks to avoid further issues.

ryan112358

Looks great, very clean - nice work! Left some comments.

ryan112358 · 2026-01-25T17:17:36Z

examples/dp_sgd_transformer_nnx.py

 import urllib.request

-from flax import nnx
+from flax import nnx  # pytype: disable=import-error


The requirements.txt in docs folder is intended to only contain requirements needed for documentation. The ones listed in pyproject.toml are only those needed by the core library. Probably the best thing to do is add an additional requirements.txt to the examples/ directory that includes flax, and update .github/workflows/ci.yml to install these.

ryan112358 · 2026-01-25T17:19:46Z

examples/dp_sgd_transformer_nnx.py

+    x: Input batch (single example or microbatch).
+    y: Target batch (single example or microbatch).
+    graphdef: The static graph definition of the NNX model.
+    other: Non-trainable state (e.g., RNG counts).


What else other than the rng counts is captured here? Is it possible to call this argument prng and have it typed as a jax.Array, then somehow wire it through to flax? I ask because when you call clipped_grad, if the loss function contains a prng key it needs special handling.

ryan112358 · 2026-01-25T17:20:10Z

examples/dp_sgd_transformer_nnx.py

+  Returns:
+    The scalar loss value.
+  """
+  m = nnx.merge(graphdef, params, other)


Give this a descriptive name like model

ryan112358 · 2026-01-25T17:22:27Z

examples/dp_sgd_transformer_nnx.py

+      l2_clip_norm=CLIP_NORM,
+      batch_argnums=(1, 2),  # x and y are batched
+      keep_batch_dim=False,  # Process per-example
+      return_values=True     # Return loss values for logging


You might need to pass prng_argnum here as well to ensure the random key is handled appropriately. But it might require slight refactoring of your loss function
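For context, a rough sketch of why a random key inside a per-example loss needs care. This is plain JAX and does not call clipped_grad; the stochastic loss_fn and the helper per_example_losses are hypothetical, and how prng_argnum handles the key internally is not shown here. The idea illustrated is only that an explicit key argument can be split so each example draws its own randomness.

```python
import jax
import jax.numpy as jnp


def loss_fn(w, x, key):
  # Hypothetical stochastic loss: a dropout-like mask drawn from `key`.
  keep = jax.random.bernoulli(key, 0.5, x.shape)
  return jnp.sum(w * x * keep)


def per_example_losses(w, xs, key):
  # One independent key per example, mapped alongside the inputs.
  keys = jax.random.split(key, xs.shape[0])
  return jax.vmap(loss_fn, in_axes=(None, 0, 0))(w, xs, keys)


w = jnp.ones(3)
xs = jnp.ones((4, 3))
losses = per_example_losses(w, xs, jax.random.PRNGKey(0))
```

Reusing a single key across all examples would instead apply the same mask to every example, which is usually not what a dropout-style layer intends.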

ryan112358 · 2026-01-25T17:23:36Z

examples/dp_sgd_transformer_nnx.py

+      functools.partial(pure_loss_fn, graphdef=graphdef, other=other),
+      l2_clip_norm=CLIP_NORM,
+      batch_argnums=(1, 2),  # x and y are batched
+      keep_batch_dim=False,  # Process per-example


Usually we want to keep this to the default (True), unless we're doing user-level DP. If you set this to True (or remove it), can you remove the line that adds an extra batch axis in pure_loss_fn?
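The batch-axis point can be illustrated with plain jax.vmap, without jax_privacy. The sketch below assumes that keep_batch_dim=False behaves like the first call (the mapped axis is stripped) and keep_batch_dim=True like the second (each example keeps a leading axis of size 1), as the comment above suggests; per_example_loss is a hypothetical stand-in for pure_loss_fn.

```python
import jax
import jax.numpy as jnp

seen_shapes = []


def per_example_loss(x):
  # Shapes are static under tracing, so recording them here is safe.
  seen_shapes.append(x.shape)
  return jnp.sum(x)


tokens = jnp.zeros((4, 8))  # (batch, sequence)

# Plain vmap strips the mapped axis: the function sees shape (8,).
jax.vmap(per_example_loss)(tokens)

# Re-inserting a singleton axis by hand restores a (1, 8) "batch of one".
jax.vmap(lambda x: per_example_loss(x[None, ...]))(tokens)

print(seen_shapes)  # [(8,), (1, 8)]
```

If the library already supplies the size-1 batch axis, the manual `x[None, ...]` step inside the loss becomes redundant.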

ryan112358 · 2026-01-25T17:24:28Z

examples/dp_sgd_transformer_nnx.py

+    grads, loss = grad_fn(params, x, y)
+
+    # Aggregate gradients (mean across batch)
+    mean_grads = jax.tree.map(lambda g: jnp.mean(g, axis=0), grads)


grad_fn already aggregates gradients across the batch dimension, so I think this is a bug
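A hand-rolled sketch of why the extra mean is a problem. It does not use jax_privacy; clipped_grad_sum is a hypothetical stand-in that clips per-example gradients and sums them, so the exact aggregation (sum versus mean) is an assumption. The point is that the aggregated gradient already has the parameter's shape, and a second reduction over axis 0 collapses the parameter's own leading axis.

```python
import jax
import jax.numpy as jnp


def loss_fn(w, x, y):
  return (jnp.dot(x, w) - y) ** 2


def clipped_grad_sum(w, xs, ys, clip_norm):
  # Per-example gradients, shape (batch, dim).
  grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(w, xs, ys)
  norms = jnp.linalg.norm(grads, axis=1, keepdims=True)
  # Scale each gradient so its L2 norm is at most clip_norm.
  scale = jnp.minimum(1.0, clip_norm / jnp.maximum(norms, 1e-12))
  # Aggregate across the batch: the result has shape (dim,).
  return jnp.sum(grads * scale, axis=0)


w = jnp.ones(3)
xs = jnp.ones((4, 3))
ys = jnp.zeros(4)

aggregated = clipped_grad_sum(w, xs, ys, clip_norm=1.0)
# A second reduction averages over the parameter axis, not the batch.
double_reduced = jnp.mean(aggregated, axis=0)
```

Here `aggregated` has shape `(3,)`, matching `w`, while `double_reduced` is a scalar that no longer lines up with the parameters.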

ryan112358 · 2026-01-25T17:24:59Z

examples/dp_sgd_transformer_nnx.py

+    # Aggregate gradients (mean across batch)
+    mean_grads = jax.tree.map(lambda g: jnp.mean(g, axis=0), grads)
+
+    # Add Privacy Noise


I'll leave it up to your discretion, but I think these inline comments can be removed.

ryan112358 · 2026-01-25T17:27:07Z

examples/dp_sgd_transformer_nnx.py

+  # Training loop
+  print(f"Training for {NUM_STEPS} steps...")
+  for step in range(NUM_STEPS):
+    batch = get_batch(data, BATCH_SIZE, CONTEXT_LENGTH)


In an ideal world this would use Poisson sampling / jax_privacy.batch_selection. It's fine to leave a TODO for now and add it in a follow-up.
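For reference, a minimal sketch of Poisson sampling in plain JAX. It does not use jax_privacy.batch_selection, whose API is not shown in this thread; the helper name poisson_sample_indices and its arguments are hypothetical. Each example is included independently with a fixed probability, so the batch size varies from step to step.

```python
import jax
import jax.numpy as jnp


def poisson_sample_indices(key, num_examples, sampling_prob):
  """Includes each example independently with probability sampling_prob."""
  mask = jax.random.bernoulli(key, sampling_prob, (num_examples,))
  # Indices of the selected examples; the count differs between calls.
  return jnp.nonzero(mask)[0]


key = jax.random.PRNGKey(0)
batch_indices = poisson_sample_indices(key, num_examples=1000, sampling_prob=0.05)
```

Because the returned array has a data-dependent length, a jitted training step would typically need padding or masking to a fixed maximum batch size.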

ryan112358 · 2026-01-25T17:28:48Z

examples/dp_sgd_transformer_nnx.py

+  )
+
+  privatizer = noise_addition.gaussian_privatizer(
+      stddev=CLIP_NORM,


The stddev should be grad_fn.sensitivity() * noise_multiplier. Can you add NOISE_MULTIPLIER to the list of constants above?
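A sketch of the scaling being asked for, written in plain JAX rather than with noise_addition.gaussian_privatizer. The helper add_gaussian_noise and the NOISE_MULTIPLIER value are hypothetical, and the sensitivity is assumed to equal CLIP_NORM, which holds for a sum of clipped per-example gradients under add-or-remove-one adjacency.

```python
import jax
import jax.numpy as jnp

CLIP_NORM = 1.0
NOISE_MULTIPLIER = 1.1  # Hypothetical value, for illustration only.


def add_gaussian_noise(key, grads, sensitivity, noise_multiplier):
  """Adds N(0, (sensitivity * noise_multiplier)^2) noise to each leaf."""
  stddev = sensitivity * noise_multiplier
  leaves, treedef = jax.tree_util.tree_flatten(grads)
  # Use a separate key per leaf so the noise is independent across leaves.
  keys = jax.random.split(key, len(leaves))
  noisy = [
      g + stddev * jax.random.normal(k, g.shape, g.dtype)
      for g, k in zip(leaves, keys)
  ]
  return jax.tree_util.tree_unflatten(treedef, noisy)


grads = {"w": jnp.zeros((2, 3)), "b": jnp.zeros(3)}
noisy_grads = add_gaussian_noise(
    jax.random.PRNGKey(0), grads, CLIP_NORM, NOISE_MULTIPLIER
)
```

Setting the stddev to CLIP_NORM alone corresponds to a noise multiplier fixed at 1, which hides the privacy/utility knob from the reader.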

debanganghosh08 · 2026-01-26T11:42:26Z

Hi @ryan112358 ,

I've pushed an update addressing all your feedback. Here is a summary of the changes I made:

CI/CD Infrastructure: Moved the flax dependency to examples/requirements.txt and updated .github/workflows/ci.yml. This ensures all examples pass pytype without manual disable comments.
NNX Causal Masking: Refactored TransformerBlock to use nnx.make_causal_mask(x[..., 0]).
I explored the is_causal keyword, but as noted, it isn't currently supported in the nnx.MultiHeadAttention version we are using. This new approach handles the rank requirements cleanly.
Gradient Aggregation Fix: Set keep_batch_dim=True in clipped_grad and removed the manual jnp.mean aggregation in the training step to prevent double-averaging.
Privacy Parameters: Integrated the NOISE_MULTIPLIER constant and updated the privatizer to scale based on grad_fn.sensitivity().
Refinement: I renamed internal variables for clarity (e.g., model instead of m), added a timeout to the data loader, and included a TODO for moving to Poisson sampling.
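To make the masking change concrete, here is a shape-only illustration in plain jax.numpy. It does not call nnx.make_causal_mask; causal_mask is a hypothetical stand-in, and the (batch, 1, length, length) layout is an assumption about what the attention layer expects. Slicing `x[..., 0]` drops the feature axis so the mask is built from a (batch, length) array.

```python
import jax.numpy as jnp


def causal_mask(tokens):
  """Builds a (batch..., 1, length, length) lower-triangular boolean mask."""
  length = tokens.shape[-1]
  # Position i may attend to positions j <= i.
  lower = jnp.tril(jnp.ones((length, length), dtype=bool))
  return jnp.broadcast_to(lower, tokens.shape[:-1] + (1, length, length))


x = jnp.zeros((2, 5, 16))  # (batch, length, features)
mask = causal_mask(x[..., 0])  # x[..., 0] has shape (batch, length)
```

The resulting mask has shape `(2, 1, 5, 5)`, with the singleton axis broadcasting over attention heads.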

✅ Verification: The script was verified for 10 steps locally, achieving a stable loss and passing a 10.00/10 pylint check.

Remind me if new changes are required!

amyssnippet · 2026-01-26T11:55:52Z

#128 might fix the ci failures easy to debug

debanganghosh08 · 2026-01-26T13:58:57Z

#128 might fix the ci failures easy to debug

That's a good approach for moving the current CI/CD to a modular DAG architecture. It should also improve DX.

amyssnippet · 2026-01-27T04:06:40Z

@debanganghosh08, now that the new CI pipeline and dependency flow have been introduced, there will be CI failures from now on. The library you added in examples/req...txt will no longer be picked up. Kindly pull the latest changes from upstream main first, then delete the examples/req..txt file and add the deps to pyproject.toml. There is an optional-dependencies section with a space for [examples]; kindly add it there.

Optional deps are now managed centrally in the root pyproject.toml file.


debanganghosh08 · 2026-01-27T11:05:33Z

@debanganghosh08, now that the new CI pipeline and dependency flow have been introduced, there will be CI failures from now on. The library you added in examples/req...txt will no longer be picked up. Kindly pull the latest changes from upstream main first, then delete the examples/req..txt file and add the deps to pyproject.toml. There is an optional-dependencies section with a space for [examples]; kindly add it there.

Optional deps are now managed centrally in the root pyproject.toml file.

Thanks for the heads-up and the clear guidance on the new dependency flow, @amyssnippet! I've just pushed an update aligning with the new modular CI. I pulled the latest upstream changes, migrated flax to the [project.optional-dependencies] section in pyproject.toml, and cleaned up the temporary requirements file. Everything should be in sync now!

Implemented Deplayed Preconditioners with alternating-phase protocol as per Paper 4

e7b5538

debanganghosh08 force-pushed the feat/nnx-transformer-dp-sgd branch from 7cbfbb1 to 944df7c Compare January 24, 2026 14:49

amyssnippet suggested changes Jan 24, 2026


amyssnippet suggested changes Jan 25, 2026


ryan112358 requested changes Jan 25, 2026


debanganghosh08 force-pushed the feat/nnx-transformer-dp-sgd branch from 1d03537 to 9eac33d Compare January 26, 2026 11:35

debanganghosh08 mentioned this pull request Jan 26, 2026

fixed ci workflows #128

Merged

debanganghosh08 added 8 commits January 27, 2026 16:28

Implement User-Level Sampling (ULS) for Transformers with per-user averaging

a2acc2f

Refactor: Use UserSelectionStrategy and clipped_grad(keep_batch_dim=False) per maintainer review

50c04ad

style: fix remaining pylint R0917 and whitespace violations

92b5771

style: fix line length in user_level_transformer and sync nnx example

b9f3a81

style: fix pytype import errors and finalize production standards

d2f8831

Refactor transformer DP-SGD example and update CI workflow

f5beecf

Add requirements file for transformer examples to fix CI dependencies

23e7eb7

chore: align dependencies with new modular CI in pyproject.toml

d5a7943

debanganghosh08 force-pushed the feat/nnx-transformer-dp-sgd branch from b6d6d66 to d5a7943 Compare January 27, 2026 11:02
