Enable packed sequences and ring attention for CUDNN Flash Attention #2600
Conversation
```yaml
generate_padding_batch_eval: False
# Maximum number of segments that can be packed into a single sequence.
# This needs to be passed to TransformerEngine's DotProductAttention layer for packing.
max_segments_per_seq: 32
```
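For context, a minimal sketch of how this option would be threaded through to TransformerEngine on the GPU path. The exact kwargs on `DotProductAttention` (including `qkv_layout` and `max_segments_per_seq`) reflect this PR's description but should be verified against your TE version:

```python
# Sketch only: wiring max_segments_per_seq into TE's attention layer for
# packed (THD) inputs. Kwarg names are assumptions based on the PR summary.
from transformer_engine.jax.flax import DotProductAttention

def make_packed_attention(config, head_dim, num_query_heads, num_kv_heads):
  return DotProductAttention(
      head_dim=head_dim,
      num_attention_heads=num_query_heads,
      num_gqa_groups=num_kv_heads,
      attn_mask_type="padding_causal",
      qkv_layout="THD_THD_THD",  # packed layout instead of the default BSHD
      max_segments_per_seq=config.max_segments_per_seq,  # from the config above
  )
```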
**@gobbleturk:** Can you add a ValueError in pyconfig so that it errors out if this is set on TPU? This is only supported on GPU.
**Author:** I can make it a warning for TPUs, but it does not get used in the TPU path, so I'm not sure erroring out would make sense.
**Author:** Added a warning @gobbleturk, please check.
**@gobbleturk:** I realize it doesn't get used in the TPU path, but I'm worried some of our TPU users might read the variable name `max_segments_per_seq` and try it out on TPU. They will not be happy to find that it silently does nothing; I would prefer an error.
**Author:** Hi @gobbleturk, I removed the warning and squashed everything into a single commit. Could you please check?
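A minimal sketch of the requested validation, assuming MaxText-style raw config keys. The key names and the placement in pyconfig are illustrative, not the merged code:

```python
# Hypothetical pyconfig check: max_segments_per_seq only affects the GPU
# (TransformerEngine) path, so fail fast rather than silently ignore it on TPU.
def validate_gpu_only_packing(raw_keys: dict) -> None:
  # Treating max_segments_per_seq > 1 as "set" is an assumption here.
  if raw_keys.get("hardware") == "tpu" and raw_keys.get("max_segments_per_seq", 1) > 1:
    raise ValueError(
        "max_segments_per_seq is only supported with cuDNN Flash Attention on GPU; "
        "remove it from the config (or set it to 1) when running on TPU."
    )
```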
Added THD packed format support and configurable context parallel strategies (all_gather/ring) for TransformerEngine's DotProductAttention. Included comprehensive GPU tests for packed attention (sm90+) and ring attention modes.
- Added max_segments_per_seq config for packed sequence control
- Supported context_parallel_strategy selection (all_gather vs ring; see the sketch after this list)
- Handled the dummy attention mask separately for packed sequences
- Added config validations and softened the synthetic data + packing check
- Added GPU tests for both packed and ring attention scenarios

Note: Context parallelism with packing is temporarily disabled pending full support.
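A minimal sketch of how the strategy string could map onto TE's context-parallel modes. The `CPStrategy` enum and its member names are assumptions inferred from the all_gather/ring options above, so verify them against your TransformerEngine version:

```python
# Sketch only: map the context_parallel_strategy config value to a TE strategy.
from transformer_engine.jax.attention import CPStrategy  # assumed import path

_CP_STRATEGIES = {
    "all_gather": CPStrategy.ALL_GATHER,  # gather full KV on every rank
    "ring": CPStrategy.RING,              # ring attention: pass KV chunks peer to peer
}

def cp_strategy_from_config(name: str) -> CPStrategy:
  if name not in _CP_STRATEGIES:
    raise ValueError(
        f"Unknown context_parallel_strategy: {name!r}; expected one of {sorted(_CP_STRATEGIES)}"
    )
  return _CP_STRATEGIES[name]
```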
Force-pushed 73af807 to 2454858.
Merged commit 9d0c860 into AI-Hypercomputer:main.
Description
Added THD packed format support and configurable context parallel strategies (all_gather/ring) for TransformerEngine's DotProductAttention. Included comprehensive GPU tests for packed attention (sm90+) and ring attention modes.
Note: Context parallelism with packing is temporarily disabled pending full support.
Tests
Appropriate tests for packed sequences with fused attention and for ring attention have been added.
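As a rough illustration of how such tests can gate on GPU capability (the helper below is illustrative, not the PR's actual test code):

```python
# Sketch: skip packed-attention tests unless running on an sm90+ GPU.
import jax
import pytest

def _has_sm90() -> bool:
  dev = jax.devices()[0]
  if dev.platform != "gpu":
    return False
  # On CUDA devices, compute_capability is a string such as "9.0".
  return float(getattr(dev, "compute_capability", "0")) >= 9.0

@pytest.mark.skipif(not _has_sm90(), reason="packed cuDNN attention requires sm90+")
def test_packed_thd_attention_matches_padded_bshd():
  # Compare packed (THD) fused attention against a padded (BSHD) reference.
  ...
```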