@@ -4,8 +4,9 @@ Supported Hardware
 ==================
 
 FlashInfer leverages NVIDIA Tensor Core instructions that require specific GPU
-architectures. This page summarizes the minimum compute capability required for
-each quantization data type supported by FlashInfer.
+architectures. This page provides a high-level, manually maintained summary of
+the minimum compute capability required for each quantization data type
+supported by FlashInfer.
 
 .. tip::
 
@@ -15,8 +16,9 @@ Quantization Data Types
 -----------------------
 
 The table below shows the minimum compute capability at which *any* backend
-supports the given data type. Some backends may require a higher capability;
-consult the per-function API documentation for exact requirements.
+supports the given data type. Backend coverage can still vary by operation
+(``mm`` vs ``bmm``), scale layout, and CUDA/cuDNN version; consult the
+per-function API documentation for exact requirements.
 
 .. list-table::
    :header-rows: 1
@@ -31,9 +33,11 @@ consult the per-function API documentation for exact requirements.
      - Ada / Hopper+
      - cuDNN and cuBLAS backends. CUTLASS requires sm_100+.
    * - MXFP8
-     - sm_100 (10.0)
-     - Blackwell+
-     -
+     - sm_100 (10.0)
+     - Blackwell+
+     - Support is backend-specific. MXFP8 support starts at sm_100 for GEMM on
+       Blackwell, but BMM uses cuDNN on sm_100 / sm_103 and CUTLASS on sm_120 /
+       sm_121. See :doc:`/api/gemm` for per-operation backend details.
    * - NVFP4 KV dequantize
      - sm_80 (8.0)
      - Ampere+
@@ -46,8 +50,12 @@ consult the per-function API documentation for exact requirements.
 .. note::
 
    This table reflects FlashInfer's current support. The per-function API
-   documentation (e.g., :doc:`/api/fp4_quantization`) is the authoritative
-   source for each operation's hardware requirements.
+   documentation (e.g., :doc:`/api/fp4_quantization` and :doc:`/api/gemm`) is
+   the authoritative source for each operation's hardware requirements.
+
+   Compute capability is not the only requirement. FlashInfer currently
+   supports CUDA 12.6, 12.8, 13.0, and 13.1 (see :doc:`/installation`), and
+   Blackwell FP4 / MXFP4 features require CUDA 12.8+.
 
    For the official NVIDIA hardware specifications, see the
    `PTX ISA Reference <https://docs.nvidia.com/cuda/parallel-thread-execution/>`_
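
The minimum-capability table touched by this change can be read as a simple lookup. The sketch below is illustrative only and is not part of FlashInfer: the dictionary keys and the ``meets_minimum`` helper are hypothetical names, and only the two rows fully visible in this diff (MXFP8 and NVFP4 KV dequantize) are included. Passing the check is necessary but not sufficient, since backend coverage, scale layout, and CUDA/cuDNN version can impose further requirements.

```python
from typing import Dict, Tuple

# Minimum compute capability (major, minor) per data type.
# Assumption: hypothetical keys covering only the two rows visible in the diff.
MIN_CAPABILITY: Dict[str, Tuple[int, int]] = {
    "mxfp8": (10, 0),               # sm_100, Blackwell+
    "nvfp4_kv_dequantize": (8, 0),  # sm_80, Ampere+
}


def meets_minimum(capability: Tuple[int, int], data_type: str) -> bool:
    """Return True if `capability` is at least the table minimum for `data_type`.

    This checks only the table minimum; a specific backend or operation
    may still require a higher capability or a newer CUDA version.
    """
    # Tuples compare element by element, so (12, 0) >= (10, 0) holds.
    return capability >= MIN_CAPABILITY[data_type]


# On a machine with PyTorch and a GPU, the capability tuple could come from
# torch.cuda.get_device_capability(); fixed tuples keep the sketch standalone.
print(meets_minimum((9, 0), "mxfp8"))                # sm_90 is below sm_100
print(meets_minimum((9, 0), "nvfp4_kv_dequantize"))  # sm_90 is above sm_80
```

Comparing ``(major, minor)`` tuples avoids parsing strings such as ``sm_100`` and orders capabilities correctly across major versions.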