[Jax API update] Remove jax_spmd_mode #1136


Open
Steboss wants to merge 22 commits into main from sbosisio/tree_util
Conversation


@Steboss Steboss commented Apr 28, 2025

This PR is a continuation of #1106 and refers to issue #1126.
In particular, the major change here is in axlearn/common/utils_spmd.py, where the config no longer needs to be updated, as jax_spmd_mode has been removed in JAX 0.6.0 (here).
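For context, a minimal sketch of the kind of config update this change makes unnecessary. This is not the exact axlearn code; the helper name and the "allow_all" value are assumptions for illustration only:

import jax

def _maybe_set_spmd_mode():
    # Hypothetical helper: before JAX 0.6.0 the option could be set via
    # jax.config; on 0.6.0+ it no longer exists, so the update is skipped
    # (and in this PR, removed entirely from utils_spmd.py).
    if hasattr(jax.config, "jax_spmd_mode"):
        jax.config.update("jax_spmd_mode", "allow_all")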

@Steboss Steboss force-pushed the sbosisio/tree_util branch from 52c92e4 to 8de926a on April 29, 2025, 11:22

Steboss commented Apr 29, 2025

@apghml I've rebased the PR onto the current main :)


Steboss commented May 2, 2025

Hey @apghml,
I investigated the pytree_children function so that it matches the previous implementation exactly. For simplicity, I'll copy and paste only part of the initial output here. This is the current implementation:

I0502 06:21:33.000848 140737350427392 utils.py:1878] ((GetAttrKey(name='output_collection'), OutputCollection(summaries={}, state_updates={}, module_outputs={})), (GetAttrKey(name='parent'), None), (GetAttrKey(name='prng_key'), Traced<ShapedArray(uint32[4])>with<DynamicJaxprTrace>), (GetAttrKey(name='state'), {'optimizer': (EmptyState(), (ScaleByAdamState(count=Traced<ShapedArray(int32[])>with<DynamicJaxprTrace>, mu={'decoder': {'emb': {'dropout': {}, 'token_emb': {'weight': Traced<ShapedArray(float32[131072,3072])>with<DynamicJaxprTrace>}}, 'output_dropout': {}, 'output_norm': {'scale': Traced<ShapedArray(float32[3072])>with<DynamicJaxprTrace>}, 'transformer': {'repeat': VDict({'layer': {'feed_forward': {'dropout1': {}, 'dropout2': {}, 'linear1_0': {'weight': Traced<ShapedArray(float32[28,3072,8192])>with<DynamicJaxprTrace>}, 'linear1_1': {'weight': Traced<ShapedArray(float32[28,3072,8192])>with<DynamicJaxprTrace>}, 'linear2': {'weight': Traced<ShapedArray(float32[28,8192,3072])>with<DynamicJaxprTrace>}, 'norm': {'scale': Traced<ShapedArray(float32[28,3072])>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}, 'self_attention': {'attention': {'dropout': {}, 'i_proj': {'i_proj': {'qkv_proj': {'weight': Traced<ShapedArray(float32[28,3072,40,128])>with<DynamicJaxprTrace>}}, 'rope_pos_emb_layer': {}}, 'kv_cache': {}, 'o_proj': {'weight': Traced<ShapedArray(float32[28,3072,24,128])>with<DynamicJaxprTrace>}, 'scale_key': {}, 'scale_query': {}}, 'dropout': {}, 'norm': {'scale': Traced<ShapedArray(float32[28,3072])>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}}})}}, 'metrics': {'aux': {}, 'lm': {}}}, nu={'decoder': {'emb': {'dropout': {}, 'token_emb': {'weight': Traced<ShapedArray(float32[131072,3072])>with<DynamicJaxprTrace>}}, 'output_dropout': {}, 'output_norm': {'scale': Traced<ShapedArray(float32[3072])>with<DynamicJaxprTrace>}, 'transformer': {'repeat': VDict({'layer': {'feed_forward': {'dropout1': {}, 'dropout2': {}, 'linear1_0': {'weight': Traced<ShapedArray(float32[28,3072,8192])>with<DynamicJaxprTrace>}, 'linear1_1': {'weight': Traced<ShapedArray(float32[28,3072,8192])>with<DynamicJaxprTrace>}, 'linear2': {'weight': Traced<ShapedArray(float32[28,8192,3072])>with<DynamicJaxprTrace>}, 'norm': {'scale': Traced<ShapedArray(float32[28,3072])>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}, 'self_attention': {'attention': {'dropout': {}, 'i_proj': {'i_proj': {'qkv_proj': {'weight': Traced<ShapedArray(float32[28,3072,40,128])>with<DynamicJaxprTrace>}}, 'rope_pos_emb_layer': {}}, 'kv_cache': {}, 'o_proj': {'weight': Traced<ShapedArray(float32[28,3072,24,128])>with<DynamicJaxprTrace>}, 'scale_key': {}, 'scale_query': {}}, 'dropout': {}, 'norm': {'scale': Traced<ShapedArray(float32[28,3072])>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}}})}}, 'metrics': {'aux': {}, 'lm': {}}}), ScaleByScheduleState(count=Traced<ShapedArray(int32[])>with<DynamicJaxprTrace>), AddDecayedWeightsState(count=None), ScaleByScheduleState(count=Traced<ShapedArray(int32[])>with<DynamicJaxprTrace>), EmptyState()))}))


I0502 06:21:33.000930 140737350427392 utils.py:1888] [(GetAttrKey(name='summaries'), {}), (GetAttrKey(name='state_updates'), {}), (GetAttrKey(name='module_outputs'), {})]

and this is the output of the new implementation

I0502 09:17:33.186500 140737350427392 utils.py:1893] [(GetAttrKey(name='output_collection'), OutputCollection(summaries={}, state_updates={}, module_outputs={})), (GetAttrKey(name='parent'), None), (GetAttrKey(name='prng_key'), Traced<uint32[4]>with<DynamicJaxprTrace>), (GetAttrKey(name='state'), {'optimizer': (EmptyState(), (ScaleByAdamState(count=Traced<int32[]>with<DynamicJaxprTrace>, mu={'decoder': {'emb': {'dropout': {}, 'token_emb': {'weight': Traced<float32[131072,3072]>with<DynamicJaxprTrace>}}, 'output_dropout': {}, 'output_norm': {'scale': Traced<float32[3072]>with<DynamicJaxprTrace>}, 'transformer': {'repeat': VDict({'layer': {'feed_forward': {'dropout1': {}, 'dropout2': {}, 'linear1_0': {'weight': Traced<float32[28,3072,8192]>with<DynamicJaxprTrace>}, 'linear1_1': {'weight': Traced<float32[28,3072,8192]>with<DynamicJaxprTrace>}, 'linear2': {'weight': Traced<float32[28,8192,3072]>with<DynamicJaxprTrace>}, 'norm': {'scale': Traced<float32[28,3072]>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}, 'self_attention': {'attention': {'dropout': {}, 'i_proj': {'i_proj': {'qkv_proj': {'weight': Traced<float32[28,3072,40,128]>with<DynamicJaxprTrace>}}, 'rope_pos_emb_layer': {}}, 'kv_cache': {}, 'o_proj': {'weight': Traced<float32[28,3072,24,128]>with<DynamicJaxprTrace>}, 'scale_key': {}, 'scale_query': {}}, 'dropout': {}, 'norm': {'scale': Traced<float32[28,3072]>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}}})}}, 'metrics': {'aux': {}, 'lm': {}}}, nu={'decoder': {'emb': {'dropout': {}, 'token_emb': {'weight': Traced<float32[131072,3072]>with<DynamicJaxprTrace>}}, 'output_dropout': {}, 'output_norm': {'scale': Traced<float32[3072]>with<DynamicJaxprTrace>}, 'transformer': {'repeat': VDict({'layer': {'feed_forward': {'dropout1': {}, 'dropout2': {}, 'linear1_0': {'weight': Traced<float32[28,3072,8192]>with<DynamicJaxprTrace>}, 'linear1_1': {'weight': Traced<float32[28,3072,8192]>with<DynamicJaxprTrace>}, 'linear2': {'weight': Traced<float32[28,8192,3072]>with<DynamicJaxprTrace>}, 'norm': {'scale': Traced<float32[28,3072]>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}, 'self_attention': {'attention': {'dropout': {}, 'i_proj': {'i_proj': {'qkv_proj': {'weight': Traced<float32[28,3072,40,128]>with<DynamicJaxprTrace>}}, 'rope_pos_emb_layer': {}}, 'kv_cache': {}, 'o_proj': {'weight': Traced<float32[28,3072,24,128]>with<DynamicJaxprTrace>}, 'scale_key': {}, 'scale_query': {}}, 'dropout': {}, 'norm': {'scale': Traced<float32[28,3072]>with<DynamicJaxprTrace>}, 'stochastic_depth': {}}}})}}, 'metrics': {'aux': {}, 'lm': {}}}), ScaleByScheduleState(count=Traced<int32[]>with<DynamicJaxprTrace>), AddDecayedWeightsState(count=None), ScaleByScheduleState(count=Traced<int32[]>with<DynamicJaxprTrace>), EmptyState()))})]

I0502 09:17:33.186563 140737350427392 utils.py:1876] [(GetAttrKey(name='summaries'), {}), (GetAttrKey(name='state_updates'), {}), (GetAttrKey(name='module_outputs'), {})]

I tested by running fuji-3B-v3-flash on the c4 dataset with ICI FSDP=8, global batch size 16, and sequence length 4096. The only difference in the output is that Traced<ShapedArray(int32[])> is now printed by JAX as Traced<int32[]>, so there is nothing to worry about.

I've updated the test suite so that we have:

  • a custom object with a _fields attribute, like OutputCollection(summaries={}, state_updates={}, module_outputs={}), used as the input node
  • a test with a plain object():

obj = object()
self.assertSequenceEqual(pytree_children(obj), [])

The test on pytree_children works fine.

Finally, I checked the performance impact of removing jax_spmd_mode, and the overall results are essentially the same:

Metrics                 | This PR implementation | Previous AXLearn implementation
Tokens per sec per GPU  | 9288                   | 8904
Seqs per sec per GPU    | 2.26                   | 2.17
Average time step       | 0.88                   | 0.91
TFLOPS per sec per GPU  | 218.80                 | 209.74


tree = CustomWithFields(**original_tree)
self.assertSequenceEqual(
pytree_children(tree),
apghml commented:

Hm, this doesn't seem to match the behavior of jax.tree.leaves()?

import jax

class CustomWithFields:
    _fields = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

print(jax.tree.leaves(CustomWithFields(1,2,3)))  # Prints the entire object as a single leaf.

key_children, _ = key_handler.flatten_with_keys(node)
return key_children
# If node is a NamedTuple
if isinstance(node, tuple) and hasattr(node, "_fields"):
apghml commented:

Do we need this? It looks like the output is already correct?

from typing import NamedTuple

import jax

class C(NamedTuple):
    a: int

print(jax.tree_util.default_registry.flatten_one_level_with_keys(C(1)))

apghml commented:

Also, why did we move this earlier in the function body? IIUC, this will change the behavior when someone creates a NamedTuple subclass and registers a custom tree-flattening handler for it?
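For illustration, a hedged sketch of that scenario (Point is a hypothetical class, not axlearn code): a NamedTuple subclass with its own registered flattening handler, which a _fields heuristic that runs before the registry lookup would flatten differently:

from typing import NamedTuple

import jax

class Point(NamedTuple):
    x: int
    y: int

# Register a custom handler that exposes only `x` as a child and keeps `y`
# as static aux data.
jax.tree_util.register_pytree_with_keys(
    Point,
    lambda p: (((jax.tree_util.GetAttrKey("x"), p.x),), p.y),
    lambda aux, children: Point(children[0], aux),
)

# With the custom registration, JAX flattens Point to a single leaf, whereas
# a heuristic that checks _fields first would report both x and y as children.
print(jax.tree_util.tree_leaves(Point(1, 2)))  # [1]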

# Handle namedtuple as a special case, based on heuristic.
return [(jax.tree_util.GetAttrKey(s), getattr(node, s)) for s in node._fields]
return [(jax.tree_util.FlattenedIndexKey(i), c) for i, c in enumerate(flat[0])]
return [(jax.tree_util.FlattenedIndexKey(i), child) for i, child in enumerate(flat[0])]
@apghml apghml commented May 2, 2025

Overall I'm a bit nervous that the various changes in this function may cause us to deviate from what jax does, even if it has the same behavior in existing axlearn code. Was there a reason we needed to change this function?

Alternatively, would it work to do something like:

jax.tree_util.tree_map_with_path(lambda *args: args, node, is_leaf=lambda x: x is not node)

and then pull the k,v pairs out of the result of that?
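For reference, a minimal sketch of that idea via tree_flatten_with_path (a close relative of the tree_map_with_path suggestion above); the function name and the leaf guard are illustrative, not the final axlearn implementation:

from typing import Any

import jax

def pytree_children_sketch(node: Any) -> list[tuple[Any, Any]]:
    # Treat everything other than `node` itself as a leaf, so flattening
    # stops after one level and yields (path, child) pairs for the
    # immediate children only.
    pairs, _ = jax.tree_util.tree_flatten_with_path(
        node, is_leaf=lambda x: x is not node
    )
    # A bare leaf comes back as a single pair with an empty path; report it
    # as having no children, matching the object() test above.
    if len(pairs) == 1 and len(pairs[0][0]) == 0:
        return []
    # Each remaining path holds exactly one key entry (GetAttrKey, DictKey,
    # SequenceKey, ...), which is the child's key at this level.
    return [(path[0], child) for path, child in pairs]

print(pytree_children_sketch({"a": 1, "b": {"c": 2}}))
# [(DictKey(key='a'), 1), (DictKey(key='b'), {'c': 2})]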

key_child_pairs, _ = default_registry.flatten_one_level_with_keys(node)
if key_child_pairs:
return list(key_child_pairs)
except (ValueError, TypeError):
apghml commented:

Could you explain why we still need to catch these errors?


Steboss commented May 2, 2025

@apghml thanks for your comments.
Let's do it this way:

  • I'll create a PR only for the jax_spmd_mode removal
  • and I'll create a new PR for utils, where I'll check whether we really need to completely restructure the pytree_children function in utils.py, or whether JAX can offer some alternatives.

What do you think?

@@ -1853,35 +1853,38 @@ def thread_stack_traces() -> Sequence[Sequence[str]]:
return grouped_lines


def pytree_children(node: Any) -> Sequence[tuple[KeyEntry, Any]]:
def pytree_children(node: Any) -> list[tuple[KeyEntry, Any]]:
@apghml apghml commented May 2, 2025

Does this example from the jax docs fail with the new implementation?

import jax.numpy as jnp
import jax.tree
from jax.tree_util import GetAttrKey

# pytree_children is the axlearn.common.utils function under review.
from axlearn.common.utils import pytree_children


class MyContainer:

    def __init__(self):
        self.x = jnp.zeros(2)
        self.y = jnp.ones(2)


def flatten_with_keys(obj):
    children = [(GetAttrKey('x'), obj.x),
                (GetAttrKey('y'), obj.y)]
    aux_data = ()  # aux_data must contain static, hashable data.
    return children, aux_data


def unflatten(aux_data, children):
    obj = object.__new__(MyContainer)
    obj.x, obj.y = children
    return obj


jax.tree_util.register_pytree_with_keys(MyContainer, flatten_with_keys, unflatten)

pytree_children(MyContainer())

@apghml apghml left a comment

and I'll create a new PR for utils, where I'll check whether we really need to completely restructure the pytree_children function in utils.py, or whether JAX can offer some alternatives.
What do you think?

Sounds good.
