[Flex Attention] Add a Triton version so we can test autoWS#983

Draft
manman-ren wants to merge 7 commits into main from mren/flex-attn-triton

@manman-ren
Contributor

No description provided.

Summary:
    if is_full:
        forward_block_mn(..., IS_FULL_BLOCKS=True)   # compile-time specialized
    else:
        forward_block_mn(..., IS_FULL_BLOCKS=False)  # compile-time specialized

    This if/else can cause an issue with autoWS; we can use a variable instead.
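
The contrast described above can be sketched in plain Python. This is only an illustration under stated assumptions: `forward_block_mn` here is a hypothetical stand-in with an invented signature, not the kernel function from this PR, and `two_call_sites` / `single_call_site` are names made up for the example. In plain Python the two forms behave identically; the difference only matters in a compiled kernel, where a compile-time flag produces one specialized copy per branch.

```python
NEG_INF = float("-inf")


def forward_block_mn(scores, mask, is_full_blocks):
    """Hypothetical stand-in for the per-block step (not the PR's kernel).

    A full block keeps every score; a partial block masks out positions
    where `mask` is False.
    """
    if is_full_blocks:
        return list(scores)
    return [s if m else NEG_INF for s, m in zip(scores, mask)]


def two_call_sites(scores, mask, is_full):
    # Mirrors the pattern in the commit message: one call per branch,
    # each with a literal flag value.
    if is_full:
        return forward_block_mn(scores, mask, is_full_blocks=True)
    else:
        return forward_block_mn(scores, mask, is_full_blocks=False)


def single_call_site(scores, mask, is_full):
    # Assumed reading of "use a variable": a single call, with the flag
    # passed through as an ordinary value.
    return forward_block_mn(scores, mask, is_full_blocks=is_full)
```

Both helpers return the same result for either flag value, which is the property the rewrite has to preserve.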

@manman-ren manman-ren temporarily deployed to docker-s3-upload March 30, 2026 21:54 — with GitHub Actions Inactive
@meta-cla meta-cla Bot added the cla signed label Mar 30, 2026
@manman-ren manman-ren marked this pull request as draft March 30, 2026 21:54
Use branchless masking to avoid scf.if with else blocks (unsupported by
WSSpecialize), add a merge_epilogue annotation, use tl.where instead of the
Python ternary for runtime conditionals, and add a BLOCK_M=128 WS config.
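
As a rough illustration of the branchless-masking idea, the plain-Python sketch below replaces an if/else with a single elementwise select. The helper `where` is a list-based stand-in for the semantics of `tl.where(condition, x, y)`. The names `mask_branching` and `mask_branchless`, and the way the "full block" flag is folded into the mask, are assumptions made for this example and are not taken from the PR's diff.

```python
NEG_INF = float("-inf")


def where(cond, x, y):
    # Elementwise select: take x[i] where cond[i] is true, else y[i].
    # List-based stand-in for tl.where in this sketch.
    return [a if c else b for c, a, b in zip(cond, x, y)]


def mask_branching(scores, mask, is_full):
    # Branching form: the kind of if/else that would lower to an
    # scf.if with an else block in a compiled kernel.
    if is_full:
        return list(scores)
    else:
        return where(mask, scores, [NEG_INF] * len(scores))


def mask_branchless(scores, mask, is_full):
    # Branchless form: fold the flag into the mask, then always select.
    # When is_full is true every position is kept, so the select is a no-op.
    keep = [is_full or m for m in mask]
    return where(keep, scores, [NEG_INF] * len(scores))
```

The branchless version does the select unconditionally, trading a little redundant work on full blocks for straight-line control flow.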

Authored with Claude.

A dp_factor of 1 works:
CUDA_VISIBLE_DEVICES=5 TORCHINDUCTOR_COMPILE_THREADS=1 TRITON_USE_META_PARTITION=1 TRITON_USE_META_WS=1 python run.py --op flex_attention --mode fwd --only triton_ws,compiled --seq-len 1024 --mod-type causal --force --metrics accuracy --baseline compiled
             (B, Hq, M, Hkv, N, D) | Mask Type    triton_ws-accuracy
----------------------------------------------  --------------------
(8, 16, 1024, 16, 1024, 128) |          causal                     1