Added benchmark for single attention layer across different sequence lengths by howardzhang-cv · Pull Request #3929 · pytorch/ao

howardzhang-cv · 2026-02-21T02:47:56Z

Stack from ghstack (oldest at bottom):

Summary

Adds a benchmark for the new low-precision attention API. It times a single attention layer; for the fp8 backends, the measured time includes the quantization kernel.
The baseline and the test configuration can each be set to any of the following backends: fa2, fa3, fa3_fp8, fa4, fa4_fp8.

Example Run

python benchmarks/prototype/attention/benchmark_sdpa.py --baseline fa3 --test fa3_fp8
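
To illustrate the baseline-versus-test comparison described in the summary, here is a minimal, standard-library-only sketch of a timing harness that sweeps sequence lengths and reports a speedup. It is not the code in benchmark_sdpa.py: the names `time_ms`, `compare`, and `fake_attention` are hypothetical, and the workloads are stand-ins for real attention kernels. A real GPU benchmark would also need to synchronize the device before reading the clock, since CUDA kernels launch asynchronously.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_ms(fn: Callable[[], object], warmup: int = 2, iters: int = 5) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs are discarded
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return median(samples)


def compare(
    make_baseline: Callable[[int], Callable[[], object]],
    make_test: Callable[[int], Callable[[], object]],
    seq_lens: List[int],
) -> List[Dict[str, float]]:
    """Time baseline and test workloads at each sequence length."""
    rows = []
    for seq_len in seq_lens:
        base = time_ms(make_baseline(seq_len))
        test = time_ms(make_test(seq_len))
        # Speedup > 1 means the test workload ran faster than the baseline.
        speedup = base / test if test > 0 else float("inf")
        rows.append(
            {"seq_len": seq_len, "baseline_ms": base, "test_ms": test, "speedup": speedup}
        )
    return rows


def fake_attention(scale: int) -> Callable[[int], Callable[[], object]]:
    """Hypothetical stand-in: work grows with seq_len * scale, not a real kernel."""
    def make(seq_len: int) -> Callable[[], object]:
        def run() -> int:
            return sum(range(seq_len * scale))
        return run
    return make


if __name__ == "__main__":
    for row in compare(fake_attention(4), fake_attention(2), [1024, 4096]):
        print(
            f"seq_len={row['seq_len']:>5}  "
            f"baseline={row['baseline_ms']:.3f} ms  "
            f"test={row['test_ms']:.3f} ms  "
            f"speedup={row['speedup']:.2f}x"
        )
```

Taking the median over several timed iterations after a warm-up is one common way to reduce noise from one-off costs; the actual script may aggregate its measurements differently.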

[ghstack-poisoned]

pytorch-bot · 2026-02-21T02:48:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3929

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit abf45cf with merge base 42bcdc4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…lengths Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: f75bc1d Pull-Request: pytorch#3929

[ghstack-poisoned]

…lengths Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: dd85756 Pull-Request: pytorch#3929

[ghstack-poisoned]

…lengths Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9f9d973 Pull-Request: pytorch#3929

[ghstack-poisoned]

…lengths Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 3d0b5ef Pull-Request: pytorch#3929

[ghstack-poisoned]

Benchmark script for measuring FP8 SDPA performance on a single attention layer across different sequence lengths, head dimensions, and backends. Useful for isolating kernel-level performance. ghstack-source-id: 43da6b1 Pull-Request: pytorch#3929

[ghstack-poisoned]

Benchmark script for measuring FP8 SDPA performance on a single attention layer across different sequence lengths, head dimensions, and backends. Useful for isolating kernel-level performance. ghstack-source-id: 8591090 Pull-Request: pytorch#3929

[ghstack-poisoned]

Benchmark script for measuring FP8 SDPA performance on a single attention layer across different sequence lengths, head dimensions, and backends. Useful for isolating kernel-level performance. ghstack-source-id: ae37727 Pull-Request: pytorch#3929

[ghstack-poisoned]

howardzhang-cv added 2 commits February 20, 2026 18:47

Update (base update)

611b663

[ghstack-poisoned]

Update

e4fbdeb

[ghstack-poisoned]

meta-cla bot added the CLA Signed label Feb 21, 2026

This was referenced Feb 21, 2026

Added new API for low precision fp8 attention using FA3 #3857

Merged

Added benchmarking for new torchao low precision attention api #3865

Merged

Added benchmark for LLaMA 3 model for attention tests #3930

Merged

howardzhang-cv marked this pull request as draft February 21, 2026 02:49

howardzhang-cv added 2 commits February 24, 2026 15:25

Update (base update)

93be362

[ghstack-poisoned]

Update

8b1a829

[ghstack-poisoned]

howardzhang-cv mentioned this pull request Feb 24, 2026

Add FA4 fp8 backend to low precision attention api #3944

Closed

howardzhang-cv added 2 commits February 24, 2026 20:29

Update (base update)

07f076b

[ghstack-poisoned]

Update

1f9f03a

[ghstack-poisoned]

howardzhang-cv mentioned this pull request Feb 25, 2026

Add FA4 fp8 backend to low precision attention api #3947

Draft

howardzhang-cv added the module: not user facing and benchmark labels Feb 25, 2026

howardzhang-cv added 2 commits February 25, 2026 13:16

Update (base update)

686dfc9

[ghstack-poisoned]

Update

9e6bd23

[ghstack-poisoned]

howardzhang-cv added 2 commits February 27, 2026 00:06

Update (base update)

79d3da0

[ghstack-poisoned]

Update

2c51942

[ghstack-poisoned]

This was referenced Feb 27, 2026

Add FP8 FA3 low-precision attention with monkey-patch SDPA path #3959

Merged

Add FA4 monkey-patch path for low-precision attention #3960

Draft

howardzhang-cv requested review from drisspg and vkuzo March 2, 2026 19:28

howardzhang-cv added 2 commits March 2, 2026 14:45

Update (base update)

c64f35d

[ghstack-poisoned]

Update

ff974ca

[ghstack-poisoned]

howardzhang-cv added 2 commits March 2, 2026 16:28

Update (base update)

0452e1d

[ghstack-poisoned]

Update

8746457

[ghstack-poisoned]

howardzhang-cv added 2 commits March 2, 2026 17:11

Update (base update)

8fb2ab1

[ghstack-poisoned]

Update

11933c4

[ghstack-poisoned]

howardzhang-cv added 8 commits March 5, 2026 12:58

Update (base update)

6390820

[ghstack-poisoned]

Update

83107e2

[ghstack-poisoned]

Update (base update)

25ec00b

[ghstack-poisoned]

Update

453c734

[ghstack-poisoned]

Update (base update)

207a6fe

[ghstack-poisoned]

Update

639c58f

[ghstack-poisoned]

Update (base update)

79fd841

[ghstack-poisoned]

Update

4c00624

[ghstack-poisoned]

drisspg approved these changes Mar 7, 2026

howardzhang-cv added 4 commits March 6, 2026 18:03

Update (base update)

b0ee9d2

[ghstack-poisoned]

Update

0bb48c1

[ghstack-poisoned]

Update (base update)

9089185

[ghstack-poisoned]

Update

abf45cf

[ghstack-poisoned]

howardzhang-cv changed the base branch from gh/howardzhang-cv/19/base to main March 9, 2026 22:18

howardzhang-cv merged commit 1b920d0 into main Mar 9, 2026
36 checks passed

howardzhang-cv deleted the gh/howardzhang-cv/19/head branch March 9, 2026 22:19

howardzhang-cv merged 34 commits into main from gh/howardzhang-cv/19/head
