@@ -25,8 +25,15 @@ started:
To install PyTorch/XLA stable build in a new TPU VM:

- ```
- pip install torch~=2.5.0 torch_xla[tpu]~=2.5.0 -f https://storage.googleapis.com/libtpu-releases/index.html -f https://storage.googleapis.com/libtpu-wheels/index.html
+ ```sh
+ pip install torch~=2.6.0 'torch_xla[tpu]~=2.6.0' \
+   -f https://storage.googleapis.com/libtpu-releases/index.html \
+   -f https://storage.googleapis.com/libtpu-wheels/index.html
+
+ # Optional: if you're using custom kernels, install pallas dependencies
+ pip install 'torch_xla[pallas]' \
+   -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html \
+   -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html
```
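The `~=` pins above are PEP 440 compatible-release specifiers: `torch~=2.6.0` accepts any `2.6.x` patch release but not `2.7`. A minimal pure-Python sketch of that matching rule for plain `X.Y.Z` versions (illustrative only; pip's resolver implements the full PEP 440 semantics):

```python
def compatible_release(spec: str, version: str) -> bool:
    """Rough check of a PEP 440 '~=' specifier for plain X.Y.Z versions.

    '~=2.6.0' is equivalent to '>=2.6.0, ==2.6.*'.
    """
    base = spec.removeprefix("~=")
    prefix = base.split(".")[:-1]  # e.g. '~=2.6.0' -> ["2", "6"]
    parts = version.split(".")
    # Must share the release prefix and be at least the pinned version.
    return (parts[:len(prefix)] == prefix
            and tuple(map(int, parts)) >= tuple(map(int, base.split("."))))
```

For example, `compatible_release("~=2.6.0", "2.6.1")` is `True`, while `"2.7.0"` and `"2.5.9"` are rejected.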
To install PyTorch/XLA nightly build in a new TPU VM:
@@ -36,6 +43,36 @@ pip3 install --pre torch torchvision --index-url https://download.pytorch.org/wh

pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl' -f https://storage.googleapis.com/libtpu-releases/index.html -f https://storage.googleapis.com/libtpu-wheels/index.html
```
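The nightly wheel URL above embeds its build metadata in the standard wheel filename. A short sketch of how those fields decompose (`parse_wheel_name` is a hypothetical helper, not part of PyTorch/XLA):

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel filename into its PEP 427 components:
    {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
    (assumes no optional build tag, as in the nightly URL above).
    """
    stem = filename.removesuffix(".whl")
    distribution, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "distribution": distribution,
        "version": version,    # '2.6.0.dev' marks a nightly pre-release
        "python": python_tag,  # 'cp310' means CPython 3.10
        "abi": abi_tag,
        "platform": platform_tag,
    }

info = parse_wheel_name("torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl")
```

Note both `cp310` tags: targeting a different Python version means substituting a different wheel, not just a different interpreter.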
+ ### C++11 ABI builds
+
+ Starting from PyTorch/XLA 2.6, we'll provide wheels and docker images built with
+ two C++ ABI flavors: C++11 and pre-C++11. Pre-C++11 is the default to align with
+ PyTorch upstream, but C++11 ABI wheels and docker images have better lazy tensor
+ tracing performance.
+
+ To install C++11 ABI flavored 2.6 wheels:
+
+ ```sh
+ pip install torch==2.6.0+cpu.cxx11.abi 'torch_xla[tpu]==2.6.0+cxx11' \
+   -f https://storage.googleapis.com/libtpu-releases/index.html \
+   -f https://storage.googleapis.com/libtpu-wheels/index.html \
+   -f https://download.pytorch.org/whl/torch
+ ```
+
+ To access the C++11 ABI flavored docker image:
+
+ ```
+ us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm_cxx11
+ ```
+
+ If your model is tracing-bound (e.g. you see that the host CPU is busy tracing
+ the model while TPUs are idle), switching to the C++11 ABI wheels/docker images
+ can improve performance. Mixtral 8x7B benchmarking results on v5p-256, global
+ batch size 1024:
+
+ - Pre-C++11 ABI MFU: 33%
+ - C++11 ABI MFU: 39%
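At fixed hardware and batch size, MFU is proportional to training throughput, so the improvement implied by those two numbers is a simple ratio (a back-of-the-envelope reading of the figures above, not an additional benchmark):

```python
def relative_speedup(mfu_before: float, mfu_after: float) -> float:
    """Throughput ratio implied by two MFU measurements on the same hardware."""
    return mfu_after / mfu_before

# 33% -> 39% MFU corresponds to roughly an 18% throughput gain.
print(round(relative_speedup(0.33, 0.39), 2))  # -> 1.18
```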
+

### GPU Plugin

PyTorch/XLA now provides GPU support through a plugin package similar to `libtpu`:
@@ -44,6 +81,13 @@ PyTorch/XLA now provides GPU support through a plugin package similar to `libtpu
pip install torch~=2.5.0 torch_xla~=2.5.0 https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla_cuda_plugin-2.5.0-py3-none-any.whl
```

+ The newest stable version where a PyTorch/XLA:GPU wheel is available is `torch_xla`
+ 2.5. We do not offer a PyTorch/XLA:GPU wheel in the PyTorch/XLA 2.6 release. We
+ understand this is important and plan to [reinstate GPU support](https://github.com/pytorch/xla/issues/8577) by the 2.7 release.
+ PyTorch/XLA remains an open-source project and we welcome contributions from the
+ community to help maintain and improve the project. To contribute, please start
+ with the [contributors guide](https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md).
+

## Getting Started

To update your existing training loop, make the following changes:
@@ -224,6 +268,7 @@ The torch wheel version `2.6.0.dev20241028+cpu.cxx11.abi` can be found at https:

| Version | Cloud TPU VMs Wheel |
| --------- | ------------------- |
+ | 2.5 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.4 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.3 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.2 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
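The TPU VM wheel URLs in this table follow a single naming pattern; a tiny helper (hypothetical, and only valid for the releases actually listed here, since nightlies and other builds use different platform tags) can reconstruct them:

```python
def tpuvm_wheel_url(version: str, python_tag: str = "cp310") -> str:
    """Build a Cloud TPU VM wheel URL matching the 2.2-2.5 rows in the table."""
    return (
        "https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/"
        f"torch_xla-{version}-{python_tag}-{python_tag}-manylinux_2_28_x86_64.whl"
    )
```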
@@ -257,6 +302,7 @@ The torch wheel version `2.6.0.dev20241028+cpu.cxx11.abi` can be found at https:

| Version | Cloud TPU VMs Docker |
| --- | ----------- |
+ | 2.6 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm` |
| 2.5 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_tpuvm` |
| 2.4 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_tpuvm` |
| 2.3 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_tpuvm` |
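The docker tags likewise compose from a release number, Python version, accelerator, and optional ABI suffix. A sketch of that scheme (a hypothetical helper, valid only for the tag shapes shown in this README):

```python
def xla_docker_image(release: str, python: str = "3.10",
                     accel: str = "tpuvm", cxx11: bool = False) -> str:
    """Compose a PyTorch/XLA docker image reference following the table above."""
    tag = f"r{release}_{python}_{accel}"
    if cxx11:
        tag += "_cxx11"  # C++11 ABI flavor, offered starting with 2.6
    return f"us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:{tag}"
```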