Commit bb873d2
feat: Integrate CuTe DSL FMHA prefill kernels by loading cubin (#3039)
## 📌 Description
feat: Integrate CuTe DSL FMHA cubin kernels into FlashInfer prefill
backend
**Summary**
- Integrate pre-compiled CuTe DSL FMHA kernels (Blackwell
SM100/SM103/SM110) into FlashInfer's prefill attention backend
- Load AOT-compiled .so cubins from NVIDIA artifactory at runtime, no
JIT compilation needed
- Route through trtllm_ragged_attention_deepseek() API with
backend="cute-dsl"
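A ragged (variable-length) batch packs all sequences back to back and locates each one through cumulative sequence-length offsets. The sketch below shows that indexing in plain Python; the helper names `build_cu_seqlens` and `slice_sequence` are illustrative and are not part of FlashInfer, and real inputs would be device tensors rather than lists.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative offsets [0, l0, l0+l1, ...] for packed sequences."""
    return [0, *accumulate(seq_lens)]


def slice_sequence(packed, cu_seqlens, i):
    """Return the rows of sequence i from the packed token buffer."""
    return packed[cu_seqlens[i]:cu_seqlens[i + 1]]


# Three sequences of different lengths packed into one buffer of 8 tokens.
seq_lens = [3, 1, 4]
cu_seqlens = build_cu_seqlens(seq_lens)
packed = list(range(sum(seq_lens)))  # stand-in for a (total_tokens, ...) tensor
last_sequence = slice_sequence(packed, cu_seqlens, 2)
```

The offsets list always has one more entry than the batch size, so sequence `i` occupies the half-open range `[cu_seqlens[i], cu_seqlens[i + 1])`.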
**Key features**
- Dtype support: FP16, BF16, FP8 (E4M3) input with mixed-precision
output (E4M3→BF16)
- Head dimensions: 32, 64, 128, 192 (192 for FP8 only)
- Varlen ragged prefill: variable-length sequences via cumulative seqlen
tensors
- TVM-FFI ABI: all variants use TVM-FFI for kernel invocation
- Skip-softmax sparsity: optional skip-softmax optimization for sparse
attention
- LSE output: optional log-sum-exp output for numerically stable
multi-pass attention
- Causal & non-causal masking: both modes supported (all varlen variants
use non-persistent scheduling)
- Multi-arch cubin loading: per-CPU-arch (x86_64/aarch64) and
per-SM-arch artifact paths
- Checksum verification: SHA256 integrity check on downloaded .so files
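To illustrate why an LSE output helps multi-pass attention, the sketch below merges two partial attention results computed over disjoint KV chunks, using only their outputs and log-sum-exp values. It is a pure-Python reference for a single query and is not the kernel's implementation; it assumes natural-log LSE and no softmax scale, and the names `attend` and `merge` are hypothetical.

```python
import math


def attend(q, keys, values):
    """Single-query softmax attention over one KV chunk; returns (output, lse)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial results; each is weighted by exp(lse) of its chunk."""
    m = max(lse_a, lse_b)
    w_a = math.exp(lse_a - m)
    w_b = math.exp(lse_b - m)
    total = w_a + w_b
    merged = [(w_a * a + w_b * b) / total for a, b in zip(out_a, out_b)]
    return merged, m + math.log(total)


q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -0.3], [0.4, 0.9], [-0.5, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]

full_out, full_lse = attend(q, keys, values)
out_a, lse_a = attend(q, keys[:2], values[:2])
out_b, lse_b = attend(q, keys[2:], values[2:])
merged_out, merged_lse = merge(out_a, lse_a, out_b, lse_b)
```

Because each chunk's LSE records its softmax normalizer in log space, the merged result matches attention over the full KV range without revisiting the raw scores.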
**Files changed**
- flashinfer/attention_dsl/cute_dsl/fmha.py — kernel loading, variant
selection, ragged prefill entry point
- flashinfer/artifacts.py — artifact paths and checksums for DSL FMHA
(x86_64 + aarch64 layout)
- flashinfer/prefill.py — trtllm_ragged_attention_deepseek() cute-dsl
backend integration
**Test plan**
- test_trtllm_gen_attention.py::test_trtllm_gen_prefill -k "cute-dsl"
passes
- Benchmark via bench_cute_dsl_ragged.sh on target hardware
- Verify cubin download + checksum verification on clean install
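The checksum step in the test plan can be sketched as follows. This is a minimal illustration built on the standard library, not the code in `flashinfer/artifacts.py`; the function names and the example file name are assumptions made for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return the path if its SHA256 matches, otherwise raise ValueError."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return Path(path)


# Demonstration with a throwaway file standing in for a downloaded .so.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example_kernel.so"  # hypothetical file name
    artifact.write_bytes(b"not a real kernel")
    expected = hashlib.sha256(b"not a real kernel").hexdigest()
    verified = verify_artifact(artifact, expected)
    try:
        verify_artifact(artifact, "0" * 64)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
```

Raising on mismatch, rather than returning a boolean, makes it harder for a caller to load an unverified file by accident.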
**Performance**
**Setup:** B200 (sm_100a), causal, H_q=H_k=128, tested using FI
benchmark (CUDA Graph, cupti)
FP8 e4m3 (D=192):
| Shape (B×S_q×S_kv) | cute-dsl (ms) | trtllm-native (ms) | TFLOPS (dsl/native) | Speedup |
|---------------------|--------------|--------------------|--------------------|---------|
| 1×8K×8K | 1.521 | 1.619 | 1808 / 1698 | **+6.4%** |
| 1×8K×32K | 8.466 | 9.451 | 2273 / 2036 | **+11.6%** |
| 1×8K×64K | 17.796 | 19.869 | 2317 / 2075 | **+11.7%** |
| 4×512×82K | 6.397 | 7.286 | 2142 / 1880 | **+13.9%** |
| 4×1K×82K | 12.285 | 13.834 | 2224 / 1975 | **+12.6%** |
FP8 e4m3 (D=128):
| Shape (B×S_q×S_kv) | cute-dsl (ms) | trtllm-native (ms) | TFLOPS (dsl/native) | Speedup |
|---------------------|--------------|--------------------|--------------------|---------|
| 1×8K×8K | 1.484 | 1.560 | 1481 / 1410 | **+5.1%** |
| 1×8K×32K | 7.666 | 8.998 | 2008 / 1711 | **+17.4%** |
| 1×8K×64K | 16.074 | 18.606 | 2052 / 1773 | **+15.8%** |
| 4×512×82K | 5.735 | 6.460 | 1911 / 1697 | **+12.6%** |
| 4×1K×82K | 11.066 | 12.451 | 1975 / 1755 | **+12.5%** |
BF16 (D=128):
| Shape (B×S_q×S_kv) | cute-dsl (ms) | trtllm-native (ms) | TFLOPS (dsl/native) | Speedup |
|---------------------|--------------|--------------------|--------------------|---------|
| 1×8K×8K | 1.737 | 1.764 | 1266 / 1247 | **+1.6%** |
| 1×8K×32K | 10.094 | 10.992 | 1525 / 1400 | **+8.9%** |
| 1×8K×64K | 21.745 | 23.000 | 1517 / 1434 | **+5.8%** |
| 4×512×82K | 8.457 | 8.513 | 1296 / 1288 | **+0.7%** |
| 4×1K×82K | 15.773 | 16.052 | 1385 / 1361 | **+1.8%** |
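The Speedup column is consistent with `native_ms / dsl_ms - 1`. The TFLOPS formula is not stated in the PR, so the sketch below relies on assumptions: a bottom-right-aligned causal mask, a count of `2 * (D_qk + D_v)` FLOPs per visible query-key pair, `D_v = 128` for the D=192 case, and 8K meaning 8192. Under those assumptions it reproduces the 1×8K rows to within rounding.

```python
def speedup_percent(dsl_ms, native_ms):
    """Relative speedup of cute-dsl over trtllm-native, in percent."""
    return (native_ms / dsl_ms - 1.0) * 100.0


def causal_attention_tflops(batch, s_q, s_kv, heads, d_qk, d_v, ms):
    """Assumed FLOP convention: 2 * (d_qk + d_v) per visible (query, key) pair.

    Requires s_kv >= s_q. With a bottom-right-aligned causal mask, query i
    sees the first (s_kv - s_q + i + 1) keys.
    """
    visible_pairs = s_q * (s_kv - s_q) + s_q * (s_q + 1) // 2
    flops = 2 * batch * heads * visible_pairs * (d_qk + d_v)
    return flops / (ms * 1e-3) / 1e12


# FP8 e4m3, D=192, shape 1x8Kx8K from the first table.
speedup = speedup_percent(dsl_ms=1.521, native_ms=1.619)
tflops = causal_attention_tflops(
    batch=1, s_q=8192, s_kv=8192, heads=128, d_qk=192, d_v=128, ms=1.521
)
```

Small differences from the table are expected because the reported latencies are rounded to three decimals.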
**TODO**
(1) Support scalar as tensor dtype.
(2) Support PDL (programmatic dependent launch).
(3) Remove front-padding for q/k/v/o tensors.
## 🔍 Related Issues
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zihao Ye <expye@outlook.com>
1 parent a265b4e commit bb873d2
10 files changed
Lines changed: 1096 additions & 66 deletions
File tree
- benchmarks/routines
- flashinfer
- attention
- cute_dsl
- tests
- attention