
Conversation

@akasharidas

  • Add a Transformer Engine wrapper to enable calling Flash Attention on ROCm.
  • Add a config and launch script to demonstrate training Fuji-v2-70B on a single MI300 node.

@akasharidas akasharidas marked this pull request as ready for review April 16, 2025 19:22
@akasharidas akasharidas requested review from a team, markblee and ruomingp as code owners April 16, 2025 19:22
@ruomingp ruomingp requested a review from kelvin-zou April 16, 2025 19:45
Contributor

@ruomingp ruomingp left a comment

Thanks!

"importlab==0.8.1", # breaks pytype on 0.8
"jax==0.4.38",
"jaxlib==0.4.38",
"jax==0.4.35",
Contributor

Is this intentional to downgrade the jax version?

Contributor

I am guessing they are relying on a fixed version of transformer_engine, which has to be compatible with the JAX version.

Author

This is because 0.4.35 is the closest available ROCm JAX release (https://github.com/ROCm/jax/releases). The next available version is 0.5, which we are open to upgrading to whenever axlearn decides to use it.

_allow_explicit_bias = True


class ROCmTransformerEngineFlashAttention(BaseFlashAttention):
Contributor

Add a unittest for this layer?
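
A minimal sketch of the kind of unit test being asked for, modeled on how the other GPU flash-attention backends are exercised in axlearn/common/flash_attention/gpu_attention_test.py. The layer construction is a placeholder here (`run_rocm_te_flash_attention` is hypothetical), and the shapes and tolerances are assumptions for illustration:

```
import jax
import jax.numpy as jnp
import numpy as np
import pytest


def _reference_attention(q, k, v):
    # Plain softmax attention over [batch, seq, heads, head_dim] inputs,
    # used as the numerical reference.
    logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(q.shape[-1])
    probs = jax.nn.softmax(logits, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", probs, v)


def run_rocm_te_flash_attention(q, k, v):
    # Placeholder: the real test would build ROCmTransformerEngineFlashAttention the
    # same way gpu_attention_test.py builds the existing backends and call it here.
    raise NotImplementedError("Wire this to the ROCmTransformerEngineFlashAttention layer under test.")


@pytest.mark.skipif(
    not any(d.platform in ("gpu", "rocm") for d in jax.devices()),
    reason="Requires a GPU (ROCm) backend.",
)
def test_rocm_te_flash_attention_matches_reference():
    batch, seq, heads, dim = 2, 128, 4, 64
    keys = jax.random.split(jax.random.PRNGKey(0), 3)
    q, k, v = [jax.random.normal(key, (batch, seq, heads, dim), dtype=jnp.bfloat16) for key in keys]
    out = run_rocm_te_flash_attention(q, k, v)
    expected = _reference_attention(q, k, v)
    np.testing.assert_allclose(
        np.asarray(out, dtype=np.float32),
        np.asarray(expected, dtype=np.float32),
        atol=2e-2,
        rtol=2e-2,
    )
```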

--module=text.gpt.c4_trainer --config=fuji-70B-v2-flash-single-host \
--trainer_dir=/tmp/gpt_c4_test --data_dir=gs://axlearn-public/tensorflow_datasets \
--jax_backend=gpu \
--mesh_selector="amd-mi300-single-node"
Contributor

How can we reproduce your experiments?

Contributor

@kelvin-zou kelvin-zou left a comment

Thanks

from jax.ad_checkpoint import checkpoint_name
from jax.experimental import pallas as pl
try:
    from transformer_engine.jax.flax.transformer import DotProductAttention
Contributor

Is it possible to extract the lower-level API from Transformer Engine? We waited to integrate cuDNN attention from JAX natively precisely to avoid a dependency on transformer_engine, which makes JAX upgrades and version control much more difficult.
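
For reference, a sketch of the JAX-native path referred to here, which avoids the transformer_engine dependency entirely. On CUDA builds of jaxlib, `implementation="cudnn"` dispatches to the fused cuDNN attention; whether a ROCm build routes this API to an equivalent fused kernel is the open question in this thread, not something this sketch asserts:

```
import jax
import jax.numpy as jnp


def native_dot_product_attention(q, k, v, *, causal: bool = False):
    # q, k, v: [batch, seq_len, num_heads, head_dim].
    return jax.nn.dot_product_attention(q, k, v, is_causal=causal, implementation="cudnn")


# Example usage:
# q = k = v = jnp.zeros((2, 128, 4, 64), dtype=jnp.bfloat16)
# out = native_dot_product_attention(q, k, v, causal=True)
```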

Contributor

FWIW, we are also open to plumbing this through an XLA custom call.

@@ -0,0 +1,208 @@
# Copyright © 2023 Apple Inc.
Contributor

Suggested change:
- # Copyright © 2023 Apple Inc.
+ # Copyright © 2025 Apple Inc.

Contributor

And feel free to make it AMD if you intend to maintain it.

gpu=[
    GPUDecoding,
    # For GPU, prefer cuDNN (without bias) whenever possible, as it's the fastest.
    ROCmTransformerEngineFlashAttention,
Contributor

How is the ROCmTransformerEngineFlashAttention code path selected here?

Author

Added a check in is_supported that reports unsupported when not on a ROCm backend. Let me know if this approach is fine.
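
A sketch of what that check could look like. The import path for BaseFlashAttention, the is_supported signature, and the ROCm-detection heuristic (reading device_kind) are all assumptions for illustration:

```
import jax

# Assumed import path, following axlearn's flash_attention package layout.
from axlearn.common.flash_attention.common import BaseFlashAttention


def _on_rocm_backend() -> bool:
    # Assumption: ROCm builds of jaxlib report GPU devices whose device_kind
    # contains an AMD name (e.g. "AMD Instinct MI300X").
    devices = jax.devices()
    return (
        bool(devices)
        and devices[0].platform in ("gpu", "rocm")
        and "amd" in devices[0].device_kind.lower()
    )


class ROCmTransformerEngineFlashAttention(BaseFlashAttention):
    def is_supported(self, *args, **kwargs) -> bool:
        # Report unsupported (rather than raising) when not on ROCm, so the dispatcher
        # falls through to the next GPU flash-attention backend in the list.
        if not _on_rocm_backend():
            return False
        return super().is_supported(*args, **kwargs)
```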

Currently tested on MI300. To run tests in parallel on a multi-GPU machine, use this:
```
PARALLEL_GPU_TEST=1 pytest -n 8 axlearn/common/flash_attention/gpu_attention_test.py
```

This file path needs to be revised to the current file.

@changlan
Member

Closing this PR due to inactivity. Feel free to reopen if you would like to continue the work.

@changlan changlan closed this Jul 26, 2025