Flash Attention v2 is 44% faster than xformers/pytorch_sdp_attention on large images #11884
-
Flash Attention 2 has landed in xformers, but my initial tests aren't showing any significant speed difference between xformers 0.0.20 and 0.0.21-dev571.
-
I made a draft PR for testing: #11902
-
I think xformers is also making updates; the latest dev version is not working on the free Google Colab GPU.
-
I compiled and installed xformers (mainline) and ran a benchmark with the same settings. The performance is close to
-
Nice. What we install via pip (the dev version) still doesn't have this version, right? Because I didn't see a dramatic increase.
-
Why did I get the opposite result 🤣 Speed went down about 10% between xformers 0.0.20 and 0.0.21-dev on a 2080 Ti on Win11.
-
FlashAttention-2 (repo: https://github.com/Dao-AILab/flash-attention, package: flash-attn, paper) was released this week. It provides "Better Parallelism and Work Partitioning" compared to existing works, leading to much higher performance (up to 2x, according to their paper) than the old FlashAttention (which performs similarly to xformers/pytorch_sdp_attention). I wrote a small patch for sd webui and ran a quick benchmark. I used "Hires. fix" in the tests, which is popular and lets you see the performance on both small and large images.

It gives a massive improvement on large images: total time is 44% faster than pytorch_sdp_attention (38s vs 55s), while xformers performs about the same as pytorch_sdp_attention on my device. The output image may have very small differences from the baseline, just like with xformers or pytorch_sdp_attention.
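The patch itself is not posted in this thread. As a rough illustration, here is a minimal sketch of what a FlashAttention-2 attention override for ldm's CrossAttention could look like, modeled on the shape of the existing overrides in webui's modules/sd_hijack_optimizations.py; the name flash_attn_attention_forward is hypothetical, and flash_attn_func is assumed to come from the flash-attn>=2.0 package.

```python
# Hypothetical FlashAttention-2 forward for ldm's CrossAttention modules.
# flash_attn_func expects (batch, seqlen, heads, head_dim) tensors in fp16/bf16.
from einops import rearrange
from flash_attn import flash_attn_func


def flash_attn_attention_forward(self, x, context=None, mask=None):
    h = self.heads
    context = context if context is not None else x

    q = self.to_q(x)
    k = self.to_k(context)
    v = self.to_v(context)

    # (batch, seqlen, heads * head_dim) -> (batch, seqlen, heads, head_dim)
    q, k, v = (rearrange(t, "b n (h d) -> b n h d", h=h) for t in (q, k, v))

    # FlashAttention-2 kernel (non-causal, no dropout at inference time).
    out = flash_attn_func(q, k, v, dropout_p=0.0, causal=False)

    out = rearrange(out, "b n h d -> b n (h d)")
    return self.to_out(out)
```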
I think it's worth adding to stable diffusion webui, and I will soon make a draft PR about this. However, there are still some problems to discuss:

- The flash-attn 2.0 package breaks transformers, which is a dependency of sd webui, and prevents it from running. This is because flash-attn 2.0 renamed some functions. We may need to wait for transformers to fix it. (A workaround is to set aliases in the flash-attn package files; see the first sketch after this list.)
- Even if xformers integrates FlashAttention-2 so that webui doesn't need the flash-attn package directly, I think making a PR to support flash-attn may still be useful.
- flash-attn only supports fp16 and bfloat16, so it needs to fall back to other attention implementations (like sdp attention) when attention is run in another precision; see the second sketch after this list. This won't affect performance much.
- flash-attn doesn't support Windows yet, and installing it takes a lot of memory (it has to compile its kernels), so we may leave it to advanced users to install it themselves.