
Minimal modifications so SageAttention compiles with MS Visual Studio on Windows #323

Open
mengqin wants to merge 8 commits into thu-ml:main from mengqin:main

Conversation

@mengqin mengqin commented Dec 8, 2025

I made minimal modifications for the Windows build of SageAttention: only the top-level setup.py, the setup.py of SageAttention 3, and one header file in the SageAttention 3 sources. With these changes it compiles successfully in a Windows + VS2022 environment.

I successfully tested the build with torch 2.6.0-2.9.1 and CUDA 12.4-13.0.

@woct0rdho

Hi, did you successfully run sageattn3 on Windows with Blackwell GPU? Currently a lot of people are seeing errors like CUDA error: misaligned address, see the discussion at woct0rdho#42 .

Once this is fixed, I think it's straightforward to add Python ABI3 and libtorch stable ABI to sageattn3.

@mengqin mengqin commented Dec 14, 2025

> Hi, did you successfully run sageattn3 on Windows with Blackwell GPU? Currently a lot of people are seeing errors like CUDA error: misaligned address, see the discussion at woct0rdho#42 .
>
> Once this is fixed, I think it's straightforward to add Python ABI3 and libtorch stable ABI to sageattn3.

Yes. I've realized that although my branch's source compiles SageAttention3 under CUDA 12.8 + PyTorch 2.8, the result still crashes at runtime.
I fixed this error locally last week and ran some preliminary tests.

The problem is quite complex and triggers a series of issues:

  1. First, the missing /Zc:__cplusplus flag caused some macros in the CUTLASS library not to be enabled correctly. This is a classic problem: although MSVC was given the C++17 standard (/std:c++17), some C++17 features were incorrectly disabled by __cplusplus version checks, leading to data-alignment issues and ultimately the crash. This is the root cause.
  2. However, after enabling /Zc:__cplusplus, the problem became more complex. While this solved the issue on CUDA 12.8, it caused a build break on CUDA 13.0 and SageAttention 2.2.0. The main culprit is CUTE_GRID_CONSTANT: when this macro is correctly enabled, some kernel parameters require 128-byte alignment, which conflicts with MSVC, which only supports 16-byte alignment for function parameters.
  3. To fix this, I need to change how some kernels receive their parameters. This is not a simple modification: the kernels must take their parameters by pointer instead of as a packed by-value struct. That means using cudaMalloc, which might severely impact performance, but I may have no choice.

After making these modifications locally, the performance of SageAttention3 seems to have decreased, while SageAttention 2.2.0 seems faster than before; I haven't figured out why yet. On my 5090 their speeds are now similar, and I'm not sure whether that is expected.

If I can't do better, I might upload my code first.

In short, I have fixed it locally, but the changes are larger than the minimal modification I was aiming for, and I'm not sure whether they will cost too much performance. I can submit the code first so everyone can take a look.

@woct0rdho

That's great news! I guess there should be a way to do 128-byte alignment in MSVC (and when MSVC is called in nvcc). What if you add something like alignas(128) or __declspec(align(128)) to the data structure that needs the alignment?

@mengqin mengqin commented Dec 14, 2025

> That's great news! I guess there should be a way to do 128-byte alignment in MSVC (and when MSVC is called in nvcc). What if you add something like alignas(128) or __declspec(align(128)) to the data structure that needs the alignment?

The problem isn't data-structure alignment, but kernel function parameter alignment. Under CUDA 13.0, some CUTLASS macros apply alignas(128) specifiers, which MSVC does not allow on function parameters; MSVC permits at most 16-byte alignment there. Honestly, I don't think there's a direct solution. I can only modify the relevant kernel interfaces, changing them from passing by value to passing by pointer.
I have already submitted the code; you can refer to it.

@tlennon-ie

Thank you for your PR, I was able to build the following:

  • SageAttention: 3 (Blackwell)
  • Python: 3.13.x (cp313)
  • PyTorch: 2.9.0+cu130
  • CUDA Toolkit: 13.0
  • Platform: Windows x86_64

If anyone else wants it, here's the uploaded wheel if it suits your versions:
https://huggingface.co/tlennon-ie/sageattn3-1.0.0-py3.13-torch2.9.0cu130-cuda130-win_amd64.whl
