Triton kernels we import will fail at import time with a compile error, because the inline PTX they use isn't supported on non-sm100 hardware.
CUDA kernels all fall back to "raise NotImplementedError" if the compiled kernel is missing, which will be the case on non-sm100.
All tests, benchmarks, and everything else are gated on sm100 and CUDA 12.8+.
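A sketch of such a gate as a pure function, assuming "sm100" means compute capability 10.0 and that the CUDA version arrives as a `"major.minor"` string. The helper name `supports_sm100_kernels` is hypothetical.

```python
from typing import Optional, Tuple


def _parse_cuda_version(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a string like "12.8" into (12, 8); None means a CPU-only build."""
    if version is None:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports_sm100_kernels(
    capability: Tuple[int, int], cuda_version: Optional[str]
) -> bool:
    """True only on compute capability 10.0 with CUDA 12.8 or newer."""
    cuda = _parse_cuda_version(cuda_version)
    if cuda is None:
        return False
    return tuple(capability) == (10, 0) and cuda >= (12, 8)
```

On a machine with a CUDA device, the inputs would come from `torch.cuda.get_device_capability()` and `torch.version.cuda`, and the result could drive a `pytest.mark.skipif` on each test and benchmark.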
Alternate code paths for every quantization kernel and blocked layout kernel would need to be wired into the autograd func. I think the CUDA blocked layout kernel could run on non-sm100, but we would need to update the build process.