Added benchmarking for new torchao low precision attention api by howardzhang-cv · Pull Request #3865 · pytorch/ao

howardzhang-cv · 2026-02-12T02:39:29Z

Stack from ghstack (oldest at bottom):

Summary

Adds a benchmark for the new low-precision attention API.
The baseline and test models can each be set to one of the backends fa2, fa3, fa3_fp8, fa4, or fa4_fp8.
Uses the FLUX.1-schnell model with 4 inference steps and DrawBench prompts.
Has options to control the number of prompts, torch.compile usage, warmup_iters, use of debug prompts, the number of inference steps, and RoPE fusion.
Follows the guidelines of "add performance and accuracy eval of flux-1.schnell" (#3502).

Example Run

python benchmarks/prototype/attention/eval_flux_model.py --baseline fa3 --test fa3_fp8 --compile
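
As a rough illustration of how the options listed in the summary could be exposed on the command line, here is a minimal argparse sketch. Only `--baseline`, `--test`, and `--compile` appear in the example run above; every other flag name and all default values below are hypothetical placeholders, not the script's actual interface.

```python
import argparse

# Backend names are taken from the PR summary.
BACKENDS = ("fa2", "fa3", "fa3_fp8", "fa4", "fa4_fp8")


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; this is not the real script's parser."""
    parser = argparse.ArgumentParser(
        description="Sketch of a Flux attention benchmark CLI"
    )
    # These three flags appear in the PR's example run (defaults are assumed).
    parser.add_argument("--baseline", choices=BACKENDS, default="fa3")
    parser.add_argument("--test", choices=BACKENDS, default="fa3_fp8")
    parser.add_argument("--compile", action="store_true")
    # Hypothetical flag names for the remaining options named in the summary.
    parser.add_argument("--num_prompts", type=int, default=None)
    parser.add_argument("--warmup_iters", type=int, default=2)
    parser.add_argument("--num_inference_steps", type=int, default=4)
    parser.add_argument("--debug_prompts", action="store_true")
    parser.add_argument("--fuse_rope", action="store_true")
    return parser


if __name__ == "__main__":
    # Parse the same arguments as the example run above.
    args = build_parser().parse_args(
        ["--baseline", "fa3", "--test", "fa3_fp8", "--compile"]
    )
    print(args)
```

Restricting `--baseline` and `--test` to a fixed `choices` tuple means a mistyped backend name fails at parse time rather than partway through a long benchmark run.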

[ghstack-poisoned]

pytorch-bot · 2026-02-12T02:39:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3865

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6065db8 with merge base 42bcdc4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 66939a9 Pull-Request: #3865

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: dd15b60 Pull-Request: #3865

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 4f64a25 Pull-Request: #3865

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2c5edc7 Pull-Request: pytorch#3865

[ghstack-poisoned]

Benchmark script for evaluating FP8 attention on Flux (text-to-image) models. Measures image generation quality and runtime performance with and without low-precision attention applied. ghstack-source-id: 0078782 Pull-Request: pytorch#3865
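
To make "quality and runtime performance" concrete, the sketch below shows one generic way to time a callable after warmup iterations and to score a test output against a baseline output. It is an assumption-laden illustration: the description does not say which quality metric the script uses, so PSNR stands in here purely because it is simple, and `time_call` and `psnr` are hypothetical helper names.

```python
import math
import time
from statistics import mean
from typing import Callable, Sequence


def time_call(fn: Callable[[], object], warmup_iters: int = 2, timed_iters: int = 5) -> float:
    """Return mean wall-clock seconds per call, discarding the warmup runs."""
    for _ in range(warmup_iters):
        fn()  # warmup: absorbs one-off costs such as compilation or caching
    samples = []
    for _ in range(timed_iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def psnr(reference: Sequence[float], candidate: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    PSNR is a stand-in metric chosen for this sketch, not the PR's metric.
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same length")
    mse = mean((r - c) ** 2 for r, c in zip(reference, candidate))
    if mse == 0:
        return math.inf  # identical outputs
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    baseline_pixels = [0.0, 0.5, 1.0]
    test_pixels = [0.0, 0.5, 0.9]
    print("seconds per call:", time_call(lambda: sum(range(1000))))
    print("PSNR (dB):", psnr(baseline_pixels, test_pixels))
```

When timing GPU work, the timed callable would also need to synchronize the device before the clock is read (for example with `torch.cuda.synchronize()`), because CUDA kernels launch asynchronously.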

[ghstack-poisoned]

Benchmark script for evaluating FP8 attention on Flux (text-to-image) models. Measures image generation quality and runtime performance with and without low-precision attention applied. ghstack-source-id: f287aed Pull-Request: pytorch#3865

[ghstack-poisoned]

vkuzo · 2026-03-03T16:09:28Z

Do you want to eventually compare this with results from quantizing linear, MoE, etc.? If yes, you could build this into the existing scripts from the start; if no, a separate script seems fine.

howardzhang-cv added 2 commits February 11, 2026 18:39

Update (base update)

d26cabd

[ghstack-poisoned]

Update

eb083c0

[ghstack-poisoned]

meta-cla bot added the CLA Signed label Feb 12, 2026

howardzhang-cv added a commit that referenced this pull request Feb 12, 2026

Added benchmarking for new torchao low precision attention api

3664ef6

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 66939a9 Pull-Request: #3865

howardzhang-cv mentioned this pull request Feb 12, 2026

Added new API for low precision fp8 attention using FA3 #3857

Merged

howardzhang-cv added the benchmark and module: not user facing labels Feb 12, 2026

howardzhang-cv added 2 commits February 11, 2026 19:29

Update (base update)

08eaf40

[ghstack-poisoned]

Update

2380610

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Feb 12, 2026

Added benchmarking for new torchao low precision attention api

29f2406

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: dd15b60 Pull-Request: #3865

howardzhang-cv added 2 commits February 12, 2026 16:49

Update (base update)

d372af3

[ghstack-poisoned]

Update

994a587

[ghstack-poisoned]

howardzhang-cv mentioned this pull request Feb 13, 2026

use helion instead of triton for low precision attention quantization kernels #3880

Closed

howardzhang-cv added 6 commits February 12, 2026 17:06

Update (base update)

8de0af2

[ghstack-poisoned]

Update

b17b5ec

[ghstack-poisoned]

Update (base update)

ad96369

[ghstack-poisoned]

Update

9589d68

[ghstack-poisoned]

Update (base update)

33493e6

[ghstack-poisoned]

Update

fe36423

[ghstack-poisoned]

howardzhang-cv added a commit that referenced this pull request Feb 13, 2026

Added benchmarking for new torchao low precision attention api

66a2ddd

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 4f64a25 Pull-Request: #3865

howardzhang-cv added 2 commits February 20, 2026 18:47

Update (base update)

3287be2

[ghstack-poisoned]

Update

d65530f

[ghstack-poisoned]

This was referenced Feb 21, 2026

Added benchmark for single attention layer across different sequence lengths #3929

Merged

Added benchmark for LLaMA 3 model for attention tests #3930

Merged

howardzhang-cv marked this pull request as draft February 21, 2026 02:49

howardzhang-cv added a commit to howardzhang-cv/ao that referenced this pull request Feb 23, 2026

Added benchmarking for new torchao low precision attention api

a9a39b2

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2c5edc7 Pull-Request: pytorch#3865

howardzhang-cv added a commit to howardzhang-cv/ao that referenced this pull request Feb 24, 2026

Added benchmarking for new torchao low precision attention api

832d7ad

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2c5edc7 Pull-Request: pytorch#3865

howardzhang-cv added 2 commits February 24, 2026 15:25

Update (base update)

b3e130c

[ghstack-poisoned]

Update

365224a

[ghstack-poisoned]

howardzhang-cv requested review from drisspg and vkuzo March 2, 2026 19:28

howardzhang-cv added 2 commits March 2, 2026 14:45

Update (base update)

c26c9ec

[ghstack-poisoned]

Update

2a0aaed

[ghstack-poisoned]

howardzhang-cv added 2 commits March 2, 2026 16:28

Update (base update)

044cc6e

[ghstack-poisoned]

Update

b696b53

[ghstack-poisoned]

howardzhang-cv added 2 commits March 2, 2026 17:11

Update (base update)

665e1fc

[ghstack-poisoned]

Update

1341000

[ghstack-poisoned]

howardzhang-cv added 6 commits March 5, 2026 12:58

Update (base update)

57ca528

[ghstack-poisoned]

Update

e1fd08c

[ghstack-poisoned]

Update (base update)

f7599cb

[ghstack-poisoned]

Update

ed18070

[ghstack-poisoned]

Update (base update)

2866109

[ghstack-poisoned]

Update

f72cce0

[ghstack-poisoned]

drisspg approved these changes Mar 6, 2026

howardzhang-cv added 6 commits March 6, 2026 14:44

Update (base update)

ac50eaf

[ghstack-poisoned]

Update

bb511ef

[ghstack-poisoned]

Update (base update)

a6fb839

[ghstack-poisoned]

Update

4fc8c25

[ghstack-poisoned]

Update (base update)

d740be0

[ghstack-poisoned]

Update

6065db8

[ghstack-poisoned]

howardzhang-cv changed the base branch from gh/howardzhang-cv/17/base to main March 9, 2026 22:03

howardzhang-cv merged commit 1b390fb into main Mar 9, 2026
36 checks passed

howardzhang-cv deleted the gh/howardzhang-cv/17/head branch March 9, 2026 22:19

howardzhang-cv merged 46 commits into main from gh/howardzhang-cv/17/head


3 participants
