
Commit e7a8268

ianliuy and Copilot committed
docs: clarify hardware matrix details
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent c90d57c commit e7a8268

File tree

1 file changed (+17 lines, -9 lines)


docs/supported_hardware.rst

Lines changed: 17 additions & 9 deletions
@@ -4,8 +4,9 @@ Supported Hardware
 ==================
 
 FlashInfer leverages NVIDIA Tensor Core instructions that require specific GPU
-architectures. This page summarizes the minimum compute capability required for
-each quantization data type supported by FlashInfer.
+architectures. This page provides a high-level, manually maintained summary of
+the minimum compute capability required for each quantization data type
+supported by FlashInfer.
 
 .. tip::
 
@@ -15,8 +16,9 @@ Quantization Data Types
 -----------------------
 
 The table below shows the minimum compute capability at which *any* backend
-supports the given data type. Some backends may require a higher capability;
-consult the per-function API documentation for exact requirements.
+supports the given data type. Backend coverage can still vary by operation
+(``mm`` vs ``bmm``), scale layout, and CUDA/cuDNN version; consult the
+per-function API documentation for exact requirements.
 
 .. list-table::
    :header-rows: 1
@@ -31,9 +33,11 @@ consult the per-function API documentation for exact requirements.
      - Ada / Hopper+
      - cuDNN and cuBLAS backends. CUTLASS requires sm_100+.
    * - MXFP8
-     - sm_100 (10.0)
-     - Blackwell+
-     -
+     - sm_100 (10.0)
+     - Blackwell+
+     - Support is backend-specific. MXFP8 support starts at sm_100 for GEMM on
+       Blackwell, but BMM uses cuDNN on sm_100 / sm_103 and CUTLASS on sm_120 /
+       sm_121. See :doc:`/api/gemm` for per-operation backend details.
    * - NVFP4 KV dequantize
      - sm_80 (8.0)
      - Ampere+
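The minimum-capability rows visible in this hunk can be sketched as a simple gate. This is a hypothetical helper, not FlashInfer's API; the dictionary encodes only the two data types whose rows appear in the diff (MXFP8 at sm_100, NVFP4 KV dequantize at sm_80), and the function name `meets_minimum` is illustrative.

```python
# Hypothetical sketch, not FlashInfer's API: gate a quantized kernel on the
# minimum compute capability from the support matrix above. Only the rows
# visible in this diff are encoded.
MIN_CAPABILITY = {
    "mxfp8": (10, 0),               # sm_100, Blackwell+
    "nvfp4_kv_dequantize": (8, 0),  # sm_80, Ampere+
}

def meets_minimum(capability, dtype):
    """Return True if a (major, minor) capability satisfies the dtype's minimum."""
    # Tuple comparison orders by major version first, then minor.
    return capability >= MIN_CAPABILITY[dtype]

# On real hardware the capability tuple would come from, e.g.,
# torch.cuda.get_device_capability(); here we pass tuples directly.
print(meets_minimum((12, 0), "mxfp8"))              # Blackwell-class: True
print(meets_minimum((8, 9), "mxfp8"))               # Ada, below sm_100: False
print(meets_minimum((8, 9), "nvfp4_kv_dequantize")) # Ada, above sm_80: True
```

Note that, as the diff itself stresses, meeting the matrix minimum is necessary but not sufficient: per-operation backend coverage still varies.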
@@ -46,8 +50,12 @@ consult the per-function API documentation for exact requirements.
 .. note::
 
    This table reflects FlashInfer's current support. The per-function API
-   documentation (e.g., :doc:`/api/fp4_quantization`) is the authoritative
-   source for each operation's hardware requirements.
+   documentation (e.g., :doc:`/api/fp4_quantization` and :doc:`/api/gemm`) is
+   the authoritative source for each operation's hardware requirements.
+
+   Compute capability is not the only requirement. FlashInfer currently
+   supports CUDA 12.6, 12.8, 13.0, and 13.1 (see :doc:`/installation`), and
+   Blackwell FP4 / MXFP4 features require CUDA 12.8+.
 
 For the official NVIDIA hardware specifications, see the
 `PTX ISA Reference <https://docs.nvidia.com/cuda/parallel-thread-execution/>`_
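The CUDA-version constraint added in this hunk (supported releases 12.6 / 12.8 / 13.0 / 13.1, with Blackwell FP4 / MXFP4 needing 12.8+) can likewise be sketched as a version check. The function name `can_use_blackwell_fp4` is hypothetical, chosen only to illustrate combining the two conditions.

```python
# Hypothetical sketch combining the CUDA-version requirements from the note
# above: a feature is usable only on a supported CUDA release that also
# meets the feature's own floor.
SUPPORTED_CUDA = {(12, 6), (12, 8), (13, 0), (13, 1)}
FP4_MIN_CUDA = (12, 8)  # Blackwell FP4 / MXFP4 require CUDA 12.8+

def can_use_blackwell_fp4(cuda_version):
    """True only for a supported CUDA release that is 12.8 or newer."""
    return cuda_version in SUPPORTED_CUDA and cuda_version >= FP4_MIN_CUDA

print(can_use_blackwell_fp4((12, 6)))  # supported release, but below 12.8: False
print(can_use_blackwell_fp4((12, 8)))  # True
print(can_use_blackwell_fp4((13, 1)))  # True
```

Checking membership in the supported set first avoids treating an untested intermediate release (e.g. a hypothetical 12.9) as usable merely because it exceeds the floor.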
