`pmap` state · #2121 · Answered by marcvanzee
yuanqing-wang asked this question in Q&A
How should I broadcast a training state to multiple devices and use it with `pmap`? But got an error.
Answered by marcvanzee on May 13, 2022
Replies: 1 comment · 3 replies
You should use `flax.jax_utils.replicate`. Also, it is safer to use `jax.device_count()` rather than hardcoding `8` in your array. Finally, the `Dense` layer expects at least two dimensions. This code should work:

```python
import jax
import jax.numpy as jnp
from flax import linen as nn
from flax import jax_utils
import optax
from flax.training.train_state import TrainState

model = nn.Dense(1)
# Leading axis matches the number of devices instead of a hardcoded 8.
x = jnp.ones((jax.device_count(), 3))
params = model.init(jax.random.PRNGKey(0), x)

tx = optax.adam(learning_rate=1e-3)
state = TrainState.create(
    apply_fn=model.apply, params=params, tx=tx,
)

# Broadcast the train state to every device.
state = jax_utils.replicate(state)

def loss_fn(state, x):
    return (model.apply(state.params, x) ** 2.0).mean()

# pmap maps over the leading (device) axis of both arguments.
jax.pmap(loss_fn)(state, x)
```

Answer selected by marcvanzee