You can see the exact modifications made for AMD and triton-windows compatibility here.
- Block sizes modified for better ROCm compatibility:
  BLKQ: 128, BLKK: 64, BLOCK_M: 128, BLOCK_N: 64
- Numerical Stability: Kept accumulation in float32.
- Optimization: Triton autotuning added.
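
To illustrate why the accumulation is kept in float32, here is a minimal standard-library sketch. It is not code from the kernels; `round_to` and `accumulate` are hypothetical helpers, and the 10,000 identical terms are an assumed stand-in for the partial sums an attention kernel adds up. It compares a running total rounded to half precision after every add with one rounded to single precision.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float to the precision of a struct format code."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def accumulate(values, fmt: str) -> float:
    """Sum values, rounding the running total to `fmt` after every add."""
    acc = 0.0
    for v in values:
        acc = round_to(fmt, acc + v)
    return acc


# Many small half-precision terms (illustrative data, not kernel output).
term = round_to("e", 0.001)           # "e" = IEEE 754 half precision
values = [term] * 10_000
exact = term * len(values)            # reference computed in double precision

fp16_sum = accumulate(values, "e")    # accumulator held in float16
fp32_sum = accumulate(values, "f")    # accumulator held in float32

print(f"exact reference : {exact:.4f}")
print(f"float16 accum   : {fp16_sum:.4f} (error {abs(fp16_sum - exact):.4f})")
print(f"float32 accum   : {fp32_sum:.4f} (error {abs(fp32_sum - exact):.4f})")
```

In this sketch the float16 total stops growing once each term is smaller than half the gap between neighbouring float16 values, so it ends well short of the reference, while the float32 total stays close to it.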
Tested on RDNA 4 with TheRock ROCm 7, PyTorch and triton-windows release v3.6.0-windows.post25.
cd ComfyUI/custom_nodes/ComfyUI-WanVideoWrapper/ultravico/
Remove-Item -Recurse -Force sageattn
git clone https://github.com/0xDELUXA/ComfyUI-WanVideoWrapper-ultravico_AMD sageattn