
Conversation

@leejet (Owner) commented Jan 11, 2026

No description provided.

@JohnLoveJoy

This will continue to degrade the performance of other non-Nvidia platforms, so why enable it?

@Green-Sky (Contributor)

> This will continue to degrade the performance of other non-Nvidia platforms, so why enable it?

Do you have some numbers for your platform(s)?

@JohnLoveJoy

You save a tiny bit of memory, but it's about 50% slower (22 s → 33 s with Z Image Turbo) on any AMD hardware I've tested, and that's a well-known fact.

@wbruna (Contributor) commented Jan 11, 2026

> You save a tiny bit of memory, but it's about 50% slower (22 s → 33 s with Z Image Turbo) on any AMD hardware I've tested, and that's a well-known fact.

It depends on the card. My own 7600 XT gets slower on Vulkan (9.08 s/it with FA, 7.88 s/it without, for a 1024×1024, 8-step generation), but much faster on ROCm (5.65 s/it versus 10.32 s/it). The memory savings may justify leaving it on by default, though (-1.7 GB on ROCm, -3.9 GB on Vulkan).

@leejet, perhaps we could standardize the boolean flags, to make general usage easier and to avoid having to change them whenever a default changes? We could, e.g., consistently adopt a 'no-' prefix to turn a flag off, and always accept both forms (so in this case --diffusion-fa would become a no-op and --no-diffusion-fa would turn it off, and current scripts and UIs would keep working as before).
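A minimal sketch of what that scheme could look like, assuming a simple table of boolean options (the option names, defaults, and parser structure here are placeholders, not the actual stable-diffusion.cpp argument handling):

```cpp
#include <cstdio>
#include <map>
#include <string>

// Parse only the long boolean options out of argv; anything the table does not
// know about is ignored here and left to the rest of the argument parser.
static void parse_bool_flags(int argc, char** argv, std::map<std::string, bool>& flags) {
    for (int i = 1; i < argc; ++i) {
        std::string arg = argv[i];
        if (arg.rfind("--", 0) != 0) {
            continue;  // not a long option
        }
        std::string name   = arg.substr(2);
        bool        enable = true;
        if (name.rfind("no-", 0) == 0) {  // "--no-diffusion-fa" disables "diffusion-fa"
            name   = name.substr(3);
            enable = false;
        }
        auto it = flags.find(name);
        if (it != flags.end()) {
            it->second = enable;
        }
    }
}

int main(int argc, char** argv) {
    // Illustrative default only: with flash attention on by default,
    // "--diffusion-fa" becomes a no-op and "--no-diffusion-fa" turns it off.
    std::map<std::string, bool> flags = {
        {"diffusion-fa", true},
    };
    parse_bool_flags(argc, argv, flags);
    printf("diffusion-fa = %s\n", flags["diffusion-fa"] ? "on" : "off");
    return 0;
}
```

With something like this, flipping a default only changes the table entry; both spellings of every flag keep parsing, so existing scripts don't break.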

@daniandtheweb (Contributor)

Regarding the performance differences, it might be a good idea to open a discussion to keep track of performance across all the different hardware configurations (just like in llama.cpp).
That way it would be possible to check the data before making changes to the program.

@Green-Sky (Contributor)

> Regarding the performance differences, it might be a good idea to open a discussion to keep track of performance across all the different hardware configurations (just like in llama.cpp). That way it would be possible to check the data before making changes to the program.

This would be great. It would be nice to get a set of people with varying hardware and software configurations to keep coming back and adding their numbers for the same model and settings to a shared table.

It would also be nice to have a dedicated benchmark tool or mode, maybe with dummy models or something similar.
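For reference, the core of such a benchmark mode could boil down to a timing loop like the one below; the dummy floating-point workload stands in for a real sampling step, and none of this reflects existing stable-diffusion.cpp code:

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <functional>
#include <vector>

// Measure wall-clock seconds per iteration for an arbitrary step function.
static double bench_s_per_it(const std::function<void()>& step, int warmup, int iters) {
    for (int i = 0; i < warmup; ++i) {
        step();  // warm-up iterations are not timed (caches, allocation, etc.)
    }
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        step();
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count() / iters;
}

int main() {
    // Dummy workload: some floating-point churn instead of a diffusion step.
    std::vector<float> buf(1 << 20, 1.0f);
    auto step = [&]() {
        for (float& x : buf) {
            x = std::sin(x) * 0.5f + 0.5f;
        }
    };
    double s_it = bench_s_per_it(step, /*warmup=*/2, /*iters=*/8);
    printf("%.3f s/it\n", s_it);
    return 0;
}
```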

