Replies: 3 comments
My first tests for inference show that, at least on Ada, FA3 is about 25% slower than Sage while being slightly more accurate and using a tad less memory. I haven't tested it for training yet.
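For reference, roughly how I time the two kernels against each other. This is a minimal sketch, not the exact benchmark behind the numbers above: it assumes the `sageattention` package and an FA3 source build exposing `flash_attn_interface` are both installed, and the function names and `tensor_layout` kwarg follow their READMEs, so they may differ by version.

```python
# Minimal latency sketch; adjust imports/kwargs to match your installed versions.
import torch
from sageattention import sageattn                 # SageAttention kernel
from flash_attn_interface import flash_attn_func   # FlashAttention 3 kernel

def time_attn(fn, q, k, v, iters=50):
    """Average per-call latency in milliseconds, measured with CUDA events."""
    for _ in range(5):                              # warm-up
        fn(q, k, v)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(q, k, v)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# (batch, seq_len, heads, head_dim) tensors in fp16 on the GPU
q, k, v = (torch.randn(1, 4096, 24, 128, device="cuda", dtype=torch.float16)
           for _ in range(3))

print("sage:", time_attn(lambda q, k, v: sageattn(q, k, v, tensor_layout="NHD"), q, k, v), "ms")
print("fa3 :", time_attn(lambda q, k, v: flash_attn_func(q, k, v), q, k, v), "ms")
```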
That being said, almost all the places I see FA3 used are for inference, and I haven't seen it used for training yet.
I finally got around to testing flash3 for training Wan. It seems to be a tad faster than flash2 (10.5 hours to train vs 11.5), so not a huge boon, but a small boost at least for me on Ada. For inference Sage is definitely preferable (I haven't tested flash3 fp8 because Ada, and I imagine most people will stick with Sage anyway), but I will likely continue to use flash3 for training, assuming results aren't noticeably different. The compile is BRUTAL: even limited to 6 threads on my system, it exceeds 80GB of RAM at points and takes ~2 hours. I highly recommend installing ninja in your venv and setting MAX_JOBS to something reasonable; by default it uses the number of threads on your system (16 on my 6900k), which needs hundreds of GBs of RAM to complete.
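For anyone else fighting the build, here's roughly what the capped-parallelism install looks like as a small Python driver. This is only a sketch: the `./flash-attention/hopper` path and the pip flags are assumptions, and the FA3 README's install instructions take precedence over anything here.

```python
# Sketch of a source install with MAX_JOBS capped to keep RAM use bounded.
import os
import subprocess
import sys

env = dict(os.environ)
env["MAX_JOBS"] = "6"   # cap parallel compile jobs instead of one per CPU thread

# ninja inside the venv lets the extension build use it instead of plain make
subprocess.run([sys.executable, "-m", "pip", "install", "ninja"],
               check=True, env=env)

# build/install FA3 from the source checkout with MAX_JOBS in effect
# ("./flash-attention/hopper" is an assumed checkout path)
subprocess.run([sys.executable, "-m", "pip", "install", "--no-build-isolation",
                "./flash-attention/hopper"],
               check=True, env=env)
```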
Anybody have any experience with it? They recently added support for SM8+ apparently, but it's still in beta and requires installation from source, and compiling it is pretty brutal: it was trying to use like 200GB of RAM until I limited it to 4 threads (it spawned 16 by default because 6900k), and it's going to take a few hours to fully compile. But I'm very interested because, unlike SageAttention, FlashAttention3 supports the backward pass, meaning it's useful during training too. I do see args in the training scripts for using it, so I wondered what people's experiences were so far with either Hunyuan or Wan, for either training or inference. Any tips, pitfalls, or thoughts in general?
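Once the compile finishes, a quick way to confirm the build is picked up and that the backward pass actually runs. This is a rough sketch: the `flash_attn_interface` module name and the tuple return handling are assumptions about the FA3 (hopper) build, so adjust to whatever your install exposes.

```python
# Rough sanity check: prefer FA3 if importable, fall back to FlashAttention 2,
# then run a backward pass to exercise the backward kernel.
import torch

try:
    from flash_attn_interface import flash_attn_func   # FlashAttention 3 (assumed module name)
    backend = "fa3"
except ImportError:
    from flash_attn import flash_attn_func             # FlashAttention 2 fallback
    backend = "fa2"

# (batch, seq_len, heads, head_dim) tensors with grads enabled
q, k, v = (torch.randn(1, 1024, 16, 128, device="cuda", dtype=torch.bfloat16,
                       requires_grad=True) for _ in range(3))

out = flash_attn_func(q, k, v)
if isinstance(out, tuple):        # some FA3 versions also return the softmax LSE
    out = out[0]

out.float().mean().backward()     # exercises the backward pass
print(backend, "backward OK, grad norm:", q.grad.norm().item())
```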