
Conversation

@leejet (Owner) commented Jan 11, 2026

No description provided.

@JohnLoveJoy

This will continue to degrade the performance of other non-Nvidia platforms, so why enable it?

@Green-Sky (Contributor)

> This will continue to degrade the performance of other non-Nvidia platforms, so why enable it?

Do you have some numbers for your platform(s)?

@JohnLoveJoy

You save a tiny bit of memory, but it's about 50% slower (22 s → 33 s with Z Image Turbo) on any AMD hardware I've tested, and that's a well-known fact.

@wbruna (Contributor) commented Jan 11, 2026

> You save a tiny bit of memory, but it's about 50% slower (22 s → 33 s with Z Image Turbo) on any AMD hardware I've tested, and that's a well-known fact.

It depends on the card. My own 7600 XT gets slower on Vulkan (9.08 s/it with FA, 7.88 s/it without, for a 1024×1024, 8-step generation), but much faster on ROCm (5.65 s/it versus 10.32 s/it). The memory savings may justify leaving it on by default, though (-1.7 GB on ROCm, -3.9 GB on Vulkan).

@leejet, perhaps we could standardize the boolean flags, to make general usage easier and to avoid having to change them whenever a default changes? We could, e.g., consistently adopt a 'no-' prefix to turn a flag off, and always accept both forms (so in this case --diffusion-fa would become a no-op and --no-diffusion-fa would turn it off, and current scripts and UIs would keep working as before).
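A minimal sketch of what that scheme could look like, assuming a simple table of boolean options (the option names, defaults, and parser structure here are placeholders, not the actual stable-diffusion.cpp argument handling):

```cpp
#include <cstdio>
#include <map>
#include <string>

// Parse only the long boolean options out of argv; anything the table does not
// know about is ignored here and left to the rest of the argument parser.
static void parse_bool_flags(int argc, char** argv, std::map<std::string, bool>& flags) {
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg.rfind("--", 0) != 0) {
            continue;  // not a long option
        }
        std::string name   = arg.substr(2);
        bool        enable = true;
        if (name.rfind("no-", 0) == 0) {  // "--no-diffusion-fa" disables "diffusion-fa"
            name   = name.substr(3);
            enable = false;
        }
        auto it = flags.find(name);
        if (it != flags.end()) {
            it->second = enable;
        }
    }
}

int main(int argc, char** argv) {
    // Illustrative default only: with flash attention on by default,
    // "--diffusion-fa" becomes a no-op and "--no-diffusion-fa" turns it off.
    std::map<std::string, bool> flags = {
        {"diffusion-fa", true},
    };
    parse_bool_flags(argc, argv, flags);
    printf("diffusion-fa = %s\n", flags["diffusion-fa"] ? "on" : "off");
    return 0;
}
```

With something like this, flipping a default only changes the table entry; both spellings of every flag keep parsing, so existing scripts don't break.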

@daniandtheweb (Contributor)

Regarding the performance differences, it might be a good idea to open a discussion to keep track of performance across all the different hardware configurations (just like in llama.cpp).
That way it would be possible to check the data before making changes to the program.

@Green-Sky (Contributor)

> Regarding the performance differences, it might be a good idea to open a discussion to keep track of performance across all the different hardware configurations (just like in llama.cpp). That way it would be possible to check the data before making changes to the program.

This would be great. It would be nice to get a set of people with varying hardware and software configurations to keep coming back and adding their numbers for the same model and settings to a shared table.

It would also be nice to have a dedicated benchmark tool or mode, maybe with dummy models or something similar.
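For reference, the core of such a benchmark mode could boil down to a timing loop like the one below; the dummy floating-point workload stands in for a real sampling step, and none of this reflects existing stable-diffusion.cpp code:

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

// Measure wall-clock seconds per iteration for an arbitrary step function.
static double bench_s_per_it(const std::function<void()>& step, int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) {
        step();  // warm-up iterations are not timed (caches, allocation, etc.)
    }
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        step();
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / iters;
}

int main() {
    // Dummy workload: some floating-point churn instead of a diffusion step.
    std::vector<float> buf(1 << 20, 1.0f);
    auto step = [&]() {
        for (float& x : buf) {
            x = std::sin(x) * 0.5f + 0.5f;
        }
    };
    double s_it = bench_s_per_it(step, /*warmup=*/2, /*iters=*/8);
    printf("%.3f s/it\n", s_it);
    return 0;
}
```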

