Sage Attention supports minimal modifications when compiling with MS Visual Studio on Windows. #323
mengqin wants to merge 8 commits into thu-ml:main
Conversation
This PR supports multi-architecture builds, fixes Windows builds, and adds a build option for GPU-free platforms.
Hi, did you successfully run sageattn3 on Windows with a Blackwell GPU? Currently a lot of people are seeing errors like this. Once this is fixed, I think it's straightforward to add Python ABI3 and the libtorch stable ABI to sageattn3.
Yes, I've realized that although my branch's source code can compile SageAttention3 with CUDA 12.8 + PyTorch 2.8, it still crashes. The problem is quite complex and triggers a series of issues:
After making these modifications locally, SageAttention3's performance seems to have decreased, while SageAttention 2.2.0 seems to be faster than before; I haven't figured out why yet. Their speeds are similar on my 5090, and I'm not sure whether that's expected. If I can't do better, I might upload my code first. In short, I have it working locally, but the changes are somewhat extensive, which conflicts with my goal of keeping modifications minimal, and I'm not sure whether they will cost too much efficiency. I can submit first so everyone can take a look.
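The speed comparison above can be sanity-checked with a simple timing harness. This is a minimal sketch, not the author's actual benchmark; the attention callables named in the usage comment are placeholders:

```python
import time

def benchmark(fn, *args, warmup=5, iters=50):
    """Return the average wall-clock time per call of fn(*args), in seconds.

    Note: for CUDA kernels you must also synchronize the device
    (e.g. torch.cuda.synchronize()) before reading the clock, otherwise
    you only measure kernel-launch overhead. This sketch measures
    host-side wall time only.
    """
    for _ in range(warmup):  # warm up caches, JIT, and autotuning
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

# Hypothetical usage comparing two attention backends (placeholder names):
# t2 = benchmark(sageattn_v2, q, k, v)
# t3 = benchmark(sageattn_v3, q, k, v)
# print(f"v2: {t2 * 1e3:.2f} ms/call, v3: {t3 * 1e3:.2f} ms/call")
```

Comparing the two versions on identical inputs with the same warmup would at least rule out measurement noise as the cause of the surprising ranking.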
compilation parameter, and corrected the MSVC parameter passing method to be compatible with CUDA 13.0.
That's great news! I guess there should be a way to do 128-byte alignment in MSVC (and when MSVC is called by nvcc). What if you add something like
The problem isn't data-structure alignment, but rather kernel function parameter alignment. Some macros in CUTLASS use
Thank you for your PR! I was able to build the following:
If anyone else wants it, here's the uploaded wheel in case it suits your versions:
I made minimal modifications to the Windows compilation of SageAttention, touching only the top-level setup.py, the setup.py of SageAttention3, and one header file in attn3; that was enough for it to compile successfully in a Windows + VS2022 environment.
I tested the compilation successfully on PyTorch 2.6.0 through 2.9.1 and CUDA 12.4 through 13.0.
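Platform-conditional flag handling of the kind this PR describes might look roughly like the sketch below inside setup.py. This is an illustration under assumptions, not the PR's actual diff; the specific flags and the `nvcc_args` helper name are mine. The key point is that on Windows, nvcc invokes MSVC (cl.exe) as the host compiler, so host-side options must be forwarded via `-Xcompiler` in MSVC syntax (`/O2`) rather than GCC syntax (`-O3`):

```python
import sys

def nvcc_args(cuda_major: int) -> list:
    """Build an nvcc flag list, forwarding host-compiler flags per platform.

    Illustrative sketch only: the flag set here is hypothetical, not the
    PR's exact changes.
    """
    args = ["-O3", "--use_fast_math"]  # flags consumed by nvcc itself
    if sys.platform == "win32":
        # Host flags for MSVC. Passing each flag through its own
        # -Xcompiler option avoids quoting/comma-splitting pitfalls
        # when nvcc hands them to cl.exe.
        for host_flag in ("/O2", "/Zc:__cplusplus", "/permissive-"):
            args += ["-Xcompiler", host_flag]
    else:
        # Host flags for GCC/Clang on Linux.
        args += ["-Xcompiler", "-fPIC"]
    if cuda_major >= 13:
        # Hypothetical place to gate toolkit-version-specific workarounds.
        args.append("--expt-relaxed-constexpr")
    return args

# These would typically feed extra_compile_args={"nvcc": nvcc_args(...)}
# in a torch.utils.cpp_extension.CUDAExtension definition.
```

Keeping the platform branch in one helper keeps setup.py's extension definitions readable and makes the Windows-specific behavior easy to audit.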