
Commit 398e170

hawkinsp authored and Flax Authors committed
Replace uses of deprecated JAX sharding APIs with their new names in jax.sharding.
This change updates:

* {jax.experimental.maps.Mesh, jax.interpreters.pxla.Mesh} to jax.sharding.Mesh
* {jax.experimental.PartitionSpec, jax.experimental.pjit.PartitionSpec, jax.interpreters.pxla.PartitionSpec, jax.pxla.PartitionSpec} to jax.sharding.PartitionSpec
* jax.experimental.maps.NamedSharding to jax.sharding.NamedSharding

PiperOrigin-RevId: 506995236
1 parent 06529c9 commit 398e170
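
For orientation, here is a minimal sketch of the post-migration spellings; it is a rough illustration rather than code from this commit, and the 2x4 device layout and axis names are assumptions borrowed from the guides below:

```python
# Sketch only: the jax.sharding names this commit migrates the docs to.
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Assumes 8 visible devices arranged as a 2x4 mesh, as in the guides below.
devices = np.asarray(jax.devices()).reshape(2, 4)
mesh = Mesh(devices, ('x', 'y'))        # previously jax.experimental.maps.Mesh / jax.interpreters.pxla.Mesh
spec = PartitionSpec('x', 'y')          # previously jax.experimental.pjit.PartitionSpec and friends
sharding = NamedSharding(mesh, spec)    # previously jax.experimental.maps.NamedSharding
```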

4 files changed, +14 −12 lines changed


docs/guides/flax_on_pjit.ipynb

+2 −2
@@ -286,7 +286,7 @@
 "source": [
 "## Specify sharding (includes initialization and `TrainState` creation)\n",
 "\n",
-"Next, generate the [`jax.experimental.pjit.PartitionSpec`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html?#more-information-on-partitionspec) that `pjit` should receive as annotations of _input_ and _output_ data. `PartitionSpec` is a tuple of 2 axes (in a 2x4 mesh). To learn more, refer to [JAX-101: Introduction to `pjit`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html).\n",
+"Next, generate the [`jax.sharding.PartitionSpec`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html?#more-information-on-partitionspec) that `pjit` should receive as annotations of _input_ and _output_ data. `PartitionSpec` is a tuple of 2 axes (in a 2x4 mesh). To learn more, refer to [JAX-101: Introduction to `pjit`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html).\n",
 "\n",
 "### Specify the input\n",
 "\n",
@@ -416,7 +416,7 @@
 "\n",
 "Now you can apply JAX [`pjit`](https://jax.readthedocs.io/en/latest/jax.experimental.pjit.html#module-jax.experimental.pjit) to your `init_fn` in a similar fashion as [`jax.jit`](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html) but with two extra arguments: `in_axis_resources` and `out_axis_resources`.\n",
 "\n",
-"You need to add a `with mesh:` context when running a `pjit`ted function, so that it can refer to `mesh` (an instance of `jax.experimental.maps.Mesh`) to allocate data on devices correctly."
+"You need to add a `with mesh:` context when running a `pjit`ted function, so that it can refer to `mesh` (an instance of `jax.sharding.Mesh`) to allocate data on devices correctly."
 ]
 },
 {
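
For reference, a rough sketch of the pattern the changed cells describe, written with the new `jax.sharding` names: `pjit` wraps an init function with `in_axis_resources`/`out_axis_resources` and runs under a `with mesh:` context. The axis names, shapes, and the trivial `init_fn` body here are placeholders, not code from the guide.

```python
# Hedged sketch of the pjit pattern described in the changed cells above.
import jax
import numpy as np
from jax.experimental import pjit
from jax.sharding import Mesh, PartitionSpec

devices = np.asarray(jax.devices()).reshape(2, 4)   # assumes a 2x4 device mesh
mesh = Mesh(devices, ('x', 'y'))                    # axis names are illustrative

x_spec = PartitionSpec('x', None)       # placeholder annotation for the input
out_spec = PartitionSpec(None, 'y')     # placeholder annotation for the output

def init_fn(x):
    # Stand-in for the parameter/TrainState initialization in the guide.
    return x * 2.0

pjit_init_fn = pjit.pjit(init_fn,
                         in_axis_resources=x_spec,
                         out_axis_resources=out_spec)

with mesh:  # mesh is a jax.sharding.Mesh, so the context manager works as before
    out = pjit_init_fn(np.zeros((8, 4), dtype=np.float32))
```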

docs/guides/flax_on_pjit.md

+2 −2
@@ -207,7 +207,7 @@ class MLP(nn.Module):
 
 ## Specify sharding (includes initialization and `TrainState` creation)
 
-Next, generate the [`jax.experimental.pjit.PartitionSpec`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html?#more-information-on-partitionspec) that `pjit` should receive as annotations of _input_ and _output_ data. `PartitionSpec` is a tuple of 2 axes (in a 2x4 mesh). To learn more, refer to [JAX-101: Introduction to `pjit`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html).
+Next, generate the [`jax.sharding.PartitionSpec`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html?#more-information-on-partitionspec) that `pjit` should receive as annotations of _input_ and _output_ data. `PartitionSpec` is a tuple of 2 axes (in a 2x4 mesh). To learn more, refer to [JAX-101: Introduction to `pjit`](https://jax.readthedocs.io/en/latest/jax-101/08-pjit.html).
 
 ### Specify the input
 
@@ -275,7 +275,7 @@ state_spec
 
 Now you can apply JAX [`pjit`](https://jax.readthedocs.io/en/latest/jax.experimental.pjit.html#module-jax.experimental.pjit) to your `init_fn` in a similar fashion as [`jax.jit`](https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html) but with two extra arguments: `in_axis_resources` and `out_axis_resources`.
 
-You need to add a `with mesh:` context when running a `pjit`ted function, so that it can refer to `mesh` (an instance of `jax.experimental.maps.Mesh`) to allocate data on devices correctly.
+You need to add a `with mesh:` context when running a `pjit`ted function, so that it can refer to `mesh` (an instance of `jax.sharding.Mesh`) to allocate data on devices correctly.
 
 ```{code-cell} ipython3
 :id: a298c5d03c0d

docs/guides/use_checkpointing.ipynb

+5 −4
@@ -518,7 +518,8 @@
 "outputs": [],
 "source": [
 "# Multi-host related imports.\n",
-"from jax.experimental import maps, PartitionSpec, pjit"
+"from jax.sharding import PartitionSpec\n",
+"from jax.experimental import pjit"
 ]
 },
 {
@@ -531,14 +532,14 @@
 "# Create a multi-process array.\n",
 "mesh_shape = (4, 2)\n",
 "devices = np.asarray(jax.devices()).reshape(*mesh_shape)\n",
-"mesh = maps.Mesh(devices, ('x', 'y'))\n",
+"mesh = jax.sharding.Mesh(devices, ('x', 'y'))\n",
 "\n",
 "f = pjit.pjit(\n",
 " lambda x: x,\n",
 " in_axis_resources=None,\n",
 " out_axis_resources=PartitionSpec('x', 'y'))\n",
 "\n",
-"with maps.Mesh(mesh.devices, mesh.axis_names):\n",
+"with jax.sharding.Mesh(mesh.devices, mesh.axis_names):\n",
 " mp_array = f(np.arange(8 * 2).reshape(8, 2))\n",
 "\n",
 "# Make it a pytree as usual.\n",
@@ -619,7 +620,7 @@
 }
 ],
 "source": [
-"with maps.Mesh(mesh.devices, mesh.axis_names):\n",
+"with jax.sharding.Mesh(mesh.devices, mesh.axis_names):\n",
 " mp_smaller_array = f(np.zeros(8).reshape(4, 2))\n",
 "\n",
 "mp_target = {'model': mp_smaller_array}\n",

docs/guides/use_checkpointing.md

+5 −4
@@ -255,21 +255,22 @@ Unfortunately, Python Jupyter notebooks are single-host only and cannot activate
 
 ```python
 # Multi-host related imports.
-from jax.experimental import maps, PartitionSpec, pjit
+from jax.sharding import PartitionSpec
+from jax.experimental import pjit
 ```
 
 ```python
 # Create a multi-process array.
 mesh_shape = (4, 2)
 devices = np.asarray(jax.devices()).reshape(*mesh_shape)
-mesh = maps.Mesh(devices, ('x', 'y'))
+mesh = jax.sharding.Mesh(devices, ('x', 'y'))
 
 f = pjit.pjit(
  lambda x: x,
  in_axis_resources=None,
  out_axis_resources=PartitionSpec('x', 'y'))
 
-with maps.Mesh(mesh.devices, mesh.axis_names):
+with jax.sharding.Mesh(mesh.devices, mesh.axis_names):
  mp_array = f(np.arange(8 * 2).reshape(8, 2))
 
 # Make it a pytree as usual.
@@ -297,7 +298,7 @@ checkpoints.save_checkpoint_multiprocess(ckpt_dir,
 Note that, when using [`flax.training.checkpoints.restore_checkpoint`](https://flax.readthedocs.io/en/latest/api_reference/flax.training.html#flax.training.checkpoints.restore_checkpoint), you need to pass a `target` with valid multi-process arrays at the correct structural location. Flax only uses the `target` arrays' meshes and mesh axes to restore the checkpoint. This means that the multi-process array in the `target` arg doesn't have to be as large as your checkpoint's size (the shape of the multi-process array doesn't need to have the same shape as the actual array in your checkpoint).
 
 ```python
-with maps.Mesh(mesh.devices, mesh.axis_names):
+with jax.sharding.Mesh(mesh.devices, mesh.axis_names):
  mp_smaller_array = f(np.zeros(8).reshape(4, 2))
 
 mp_target = {'model': mp_smaller_array}
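
And a hedged sketch of the restore step the note in that hunk describes, with the target pytree carrying the mesh and axes Flax should use; the `ckpt_dir` value is a placeholder:

```python
# Hedged sketch: restore into the multi-process target built above.
from flax.training import checkpoints

restored = checkpoints.restore_checkpoint(
    ckpt_dir='/tmp/mp_ckpt',  # placeholder; point this at your real checkpoint directory
    target=mp_target)
# Flax uses mp_target's mesh and mesh axes to lay out the restored arrays,
# even though mp_smaller_array's shape differs from the checkpointed array.
```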

0 commit comments