Add FA4 fp8 backend to low precision attention api by howardzhang-cv · Pull Request #3947 · pytorch/ao

howardzhang-cv · 2026-02-25T04:29:34Z

Stack from ghstack (oldest at bottom):

Summary

Added a RoPE fusion compile path for FA4 FP8 low-precision attention (fuse_rope=True), mirroring the FA3 RoPE fusion design
New elementary block: fp8_fa4_rope_sdpa, which fuses RoPE, FP8 quantization, and low-precision SDPA using the FA4 backend
Added FA4-specific custom op registration and a compile_with_fp8_fusion entry point, reusing the shared FX graph fusion infrastructure (fusion_utils.py, custom_ops.py)
Reuses the shared Triton quantization kernels and RoPE fusion pass; no new kernels are needed, only FA4-specific wiring
FA4 supports both Hopper (SM 9.x) and Blackwell (SM 10.x) hardware
Added RoPE SDPA numerical accuracy tests and fuse_rope parametrization for the FA4 backend
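
For readers unfamiliar with what a fused RoPE + FP8 quantization + SDPA block computes, the following is a minimal unfused reference in plain Python. It is only a sketch of the math, not the code in this PR: the interleaved-pair RoPE convention, the e4m3 format with per-tensor scaling, the crude rounding helper, and every function name here are assumptions made for illustration.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite e4m3fn value; the fp8 format is an assumption


def apply_rope(x, pos, base=10000.0):
    """Rotate each pair (x[2k], x[2k+1]) by the angle pos * base**(-2k/d)."""
    d = len(x)
    out = []
    for i in range(0, d, 2):  # i = 2k
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def round_to_fp8(v):
    """Crude e4m3 rounding: keep 4 significant bits and clamp (subnormals ignored)."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)  # v == m * 2**e with 0.5 <= |m| < 1
    r = math.ldexp(round(m * 16) / 16, e)
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, r))


def quantize(rows):
    """Per-tensor scaling: map the largest magnitude to FP8_E4M3_MAX, then round."""
    amax = max(abs(v) for row in rows for v in row) or 1.0
    scale = FP8_E4M3_MAX / amax
    return [[round_to_fp8(v * scale) for v in row] for row in rows], 1.0 / scale


def rope_fp8_sdpa(q, k, v):
    """q, k, v: lists of per-position vectors for one head (non-causal attention)."""
    q = [apply_rope(x, p) for p, x in enumerate(q)]
    k = [apply_rope(x, p) for p, x in enumerate(k)]
    (q8, dq), (k8, dk), (v8, dv) = quantize(q), quantize(k), quantize(v)
    d = len(q[0])
    out = []
    for qi in q8:
        # Descale the fp8 dot products before the softmax.
        scores = [sum(a * b for a, b in zip(qi, kj)) * dq * dk / math.sqrt(d)
                  for kj in k8]
        mx = max(scores)
        w = [math.exp(s - mx) for s in scores]
        total = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v8)) * dv / total
                    for c in range(len(v8[0]))])
    return out
```

A fused kernel would perform these steps without materializing the intermediate rotated and quantized tensors separately; the reference above only pins down the intended arithmetic.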

New Files

fp8_fa4/fusion_pass.py: FA4-specific custom op registration, rope_sdpa_fusion_pass, and compile_with_fp8_fusion entry point
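
To illustrate the general idea behind a RoPE + SDPA fusion pass, here is a toy graph rewrite in plain Python. It does not use torch.fx and is not the pass in fusion_pass.py; the Node class, the op names, and the input ordering are all hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def fuse_rope_sdpa(nodes):
    """Rewrite sdpa(rope(xq), rope(xk), v) into a single rope_sdpa node."""
    by_name = {n.name: n for n in nodes}
    rewritten = []
    for n in nodes:
        if n.op == "sdpa":
            q, k = by_name.get(n.inputs[0]), by_name.get(n.inputs[1])
            # Fuse only when both q and k come from RoPE with the same cos/sin.
            if (q and k and q.op == "rope" and k.op == "rope"
                    and q.inputs[1:] == k.inputs[1:]):
                n = Node(n.name, "rope_sdpa",
                         [q.inputs[0], k.inputs[0], n.inputs[2], *q.inputs[1:]])
        rewritten.append(n)
    # Drop rope nodes that no longer have any consumer.
    used = {i for n in rewritten for i in n.inputs}
    return [n for n in rewritten if n.op != "rope" or n.name in used]


graph = [
    Node("xq", "input"), Node("xk", "input"), Node("v", "input"),
    Node("cos", "input"), Node("sin", "input"),
    Node("q", "rope", ["xq", "cos", "sin"]),
    Node("k", "rope", ["xk", "cos", "sin"]),
    Node("out", "sdpa", ["q", "k", "v"]),
]
fused = fuse_rope_sdpa(graph)
```

After the rewrite, the two rope nodes are gone and the single rope_sdpa node consumes the pre-RoPE inputs directly, which is the shape of transformation a fused custom op would be substituted into.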

Modified Files

fp8_fa4/attention.py: Added fp8_fa4_rope_sdpa elementary block
fp8_fa4/__init__.py: Added fp8_fa4_rope_sdpa export
fp8_fa4/setup.py: Replaced compile placeholder with real compile_with_fp8_fusion
test_fp8_attention.py: Wired up rope_sdpa_fn=fp8_fa4_rope_sdpa in FA4 backend config

Test Plan

python -m pytest test/prototype/attention/test_fp8_attention.py -v

Example Usage

  from torchao.prototype.attention import (
      AttentionBackend,
      LowPrecisionAttentionConfig,
      apply_low_precision_attention,
  )

  # MyModel and inputs are placeholders for your own module and input batch
  model = MyModel()

  # Compile path with RoPE fusion using FA4
  config = LowPrecisionAttentionConfig(
      backend=AttentionBackend.FP8_FA4,
      fuse_rope=True,
  )
  model = apply_low_precision_attention(model, config)

  # Flash activation is handled internally by the wrapper
  output = model(inputs)
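
Since the summary states that FA4 targets Hopper (SM 9.x) and Blackwell (SM 10.x), a caller might want to guard the config above with a capability check. The helper below is a hypothetical sketch, not part of the torchao API; whether the library performs an equivalent check internally is not stated in this PR.

```python
# Architectures named in the PR summary, keyed by compute-capability major version.
SUPPORTED_MAJORS = {9: "Hopper", 10: "Blackwell"}


def fa4_fp8_supported(capability):
    """capability: a (major, minor) tuple, e.g. from torch.cuda.get_device_capability()."""
    major, _minor = capability
    return major in SUPPORTED_MAJORS
```

For example, `fa4_fp8_supported((9, 0))` is true, while an Ampere device reporting `(8, 0)` would fall back to a different backend.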

Results

Single-Layer Results

Results directly comparing FA4 SDPA versus FA4 fp8 SDPA (including quantization time):

Llama3 Model Results

Results comparing a Llama3 model using FA4 SDPA against the same model using the FA4 fp8 wrapper, with RoPE fusion enabled.
Perplexity: 6.19 -> 6.24
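
To put the perplexity change in relative terms, the arithmetic can be worked as below. This assumes the arrow reads baseline FA4 SDPA on the left and the FA4 fp8 wrapper on the right.

```python
baseline_ppl = 6.19  # assumed: FA4 SDPA baseline
fp8_ppl = 6.24       # assumed: FA4 fp8 wrapper with RoPE fusion

abs_delta = fp8_ppl - baseline_ppl
rel_delta_pct = 100.0 * abs_delta / baseline_ppl
summary = f"absolute: +{abs_delta:.2f}, relative: +{rel_delta_pct:.2f}%"
print(summary)  # absolute: +0.05, relative: +0.81%
```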

pytorch-bot · 2026-02-25T04:29:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3947

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 97eb634 with merge base 5ebd10d:

NEW FAILURES - The following jobs have failed:

PR Label Check / Check PR Labels (gh)
Process completed with exit code 1.
Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
RuntimeError: Command docker exec -t bc268fed67f4d901941336721edfa97449f8e5010cd2253bad12dec5fe7b879b /exec failed with exit code 1

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh) (trunk failure)
test/quantization/pt2e/test_x86inductor_fusion.py::TestDynamicPatternMatcher::test_q_attention_block

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Adds the compile path (fuse_rope=True) for the FA4 backend, mirroring the FA3 fusion pass structure via the shared custom op and fusion pass factories.

Key additions:
- fp8_fa4/fusion_pass.py: FA4-specific custom ops and compile helper
- fp8_fa4_rope_sdpa entry point in attention.py
- Replace placeholder compile_fn with real fusion pass in setup.py
- Wire up FA4 rope_sdpa_fn in test backend config

ghstack-source-id: e19aa27 Pull-Request: pytorch#3947

Update

0e9e521

[ghstack-poisoned]

meta-cla bot added the CLA Signed label Feb 25, 2026

howardzhang-cv marked this pull request as draft February 25, 2026 04:30

howardzhang-cv added the topic: new feature label Feb 25, 2026

howardzhang-cv added a commit to howardzhang-cv/ao that referenced this pull request Feb 25, 2026

Add FA4 fp8 backend to low precision attention api

57de82f

ghstack-source-id: 77f98b3 Pull-Request: pytorch#3947

howardzhang-cv added a commit to howardzhang-cv/ao that referenced this pull request Feb 25, 2026

Add FA4 fp8 backend to low precision attention api

ad9f6f5

ghstack-source-id: 77f98b3 Pull-Request: pytorch#3947

Update

6deb8c7

[ghstack-poisoned]

howardzhang-cv added a commit to howardzhang-cv/ao that referenced this pull request Feb 25, 2026

Add FA4 fp8 backend to low precision attention api

b0ac345

ghstack-source-id: 2f1a14b Pull-Request: pytorch#3947

howardzhang-cv added a commit to howardzhang-cv/ao that referenced this pull request Feb 26, 2026

Add FA4 fp8 backend to low precision attention api

2952fe8

ghstack-source-id: 2f1a14b Pull-Request: pytorch#3947

Update

ba36ef3

[ghstack-poisoned]

This was referenced Feb 27, 2026

Add FP8 FA3 low-precision attention with monkey-patch SDPA path #3959

Merged

Add FA4 monkey-patch path for low-precision attention #3960

Draft

Update

31b374a

[ghstack-poisoned]

Update

31e6982

[ghstack-poisoned]

Update

4a72589

[ghstack-poisoned]

Update

4ba1ed3

[ghstack-poisoned]

Update

5935921

[ghstack-poisoned]

howardzhang-cv mentioned this pull request Mar 3, 2026

Add FA4 fp8 implementation to SDPA pytorch/pytorch#175472

Draft

Update

4c39013

[ghstack-poisoned]

Update

7cc5099

[ghstack-poisoned]

Update

66f92bc

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 6, 2026

Add FA4 RoPE fusion path for low-precision attention

6d9940d

ghstack-source-id: 7473efd Pull-Request: #3947

Update

f516f20

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 6, 2026

Add FA4 RoPE fusion path for low-precision attention

81e1477

ghstack-source-id: 211fa2c Pull-Request: #3947

Update

6c17b24

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 6, 2026

Add FA4 RoPE fusion path for low-precision attention

fde75bc

ghstack-source-id: 70a4549 Pull-Request: #3947

Update

00b717e

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 7, 2026

Add FA4 RoPE fusion path for low-precision attention

e05dc78

ghstack-source-id: ce52a7e Pull-Request: #3947

Update

0fda4a6

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 7, 2026

Add FA4 RoPE fusion path for low-precision attention

56931e4

ghstack-source-id: 289d483 Pull-Request: #3947

howardzhang-cv mentioned this pull request Mar 10, 2026

[Cute,Fwd,Sm100] fp8 e4m3 and e5m2 support Dao-AILab/flash-attention#2109

Open

Update

b8e3108

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 11, 2026

Add FA4 RoPE fusion path for low-precision attention

484148c

ghstack-source-id: 56eda45 Pull-Request: #3947

This was referenced Mar 11, 2026

remove rope fusion option, do automatically on torch compile #4055

Open

Added prototype low precision attention API to the docs #4056

Open

Update

88744f3

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 11, 2026

Add FA4 RoPE fusion path for low-precision attention

92de888

ghstack-source-id: 1de3b68 Pull-Request: #3947

howardzhang-cv mentioned this pull request Mar 11, 2026

soften version guard check for low precision attention API #4058

Open

Update

97eb634

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Mar 11, 2026

Add FA4 RoPE fusion path for low-precision attention

4917b57

ghstack-source-id: fffb33c Pull-Request: #3947

howardzhang-cv wants to merge 18 commits into gh/howardzhang-cv/22/base from gh/howardzhang-cv/22/head
