Hello,
Running a SageAttention operation raises the following exception:
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: unspecified launch failure
Search for `cudaErrorLaunchFailure' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7fb99a97cb80 in /usr/local/lib/python3.10/dist-packages/torch/lib/libc10.so)
Any idea why?
Setup:
CUDA toolkit version (reported by both nvcc -V and nvidia-smi): 12.6
PyTorch version: 2.9.1+cu126
sageattention version: 2.2.0
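
In case it helps with reproducing, here is a minimal sketch of how the setup details above could be collected programmatically, along with the GPU model and compute capability, which are not listed above. It is only a sketch under a few assumptions: the pip distribution names are taken to be `torch` and `sageattention`, the helper names `pkg_version` and `collect_env` are made up for illustration, and setting `CUDA_LAUNCH_BLOCKING=1` is a general CUDA debugging step to make the failure surface at the offending call, not a fix.

```python
import os
import importlib.metadata as md

# Assumption: forcing synchronous kernel launches makes the error surface at the
# call that actually failed. This must be set before CUDA is initialized.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def pkg_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return None


def collect_env():
    """Gather version and GPU details relevant to this report (hypothetical helper)."""
    info = {
        "torch": pkg_version("torch"),
        "sageattention": pkg_version("sageattention"),  # assumed distribution name
        "CUDA_LAUNCH_BLOCKING": os.environ.get("CUDA_LAUNCH_BLOCKING"),
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch, only the package versions can be reported.
        return info
    info["torch_cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running the script before the failing SageAttention call prints one `key: value` line per field; the compute capability may be worth including, since the available kernels can depend on the GPU architecture.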