
Commit e7a8268

ianliuy and Copilot committed
docs: clarify hardware matrix details
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent c90d57c commit e7a8268

File tree

1 file changed (+17 lines, -9 lines)


docs/supported_hardware.rst

Lines changed: 17 additions & 9 deletions
@@ -4,8 +4,9 @@ Supported Hardware
 ==================
 
 FlashInfer leverages NVIDIA Tensor Core instructions that require specific GPU
-architectures. This page summarizes the minimum compute capability required for
-each quantization data type supported by FlashInfer.
+architectures. This page provides a high-level, manually maintained summary of
+the minimum compute capability required for each quantization data type
+supported by FlashInfer.
 
 .. tip::
 
@@ -15,8 +16,9 @@ Quantization Data Types
 -----------------------
 
 The table below shows the minimum compute capability at which *any* backend
-supports the given data type. Some backends may require a higher capability;
-consult the per-function API documentation for exact requirements.
+supports the given data type. Backend coverage can still vary by operation
+(``mm`` vs ``bmm``), scale layout, and CUDA/cuDNN version; consult the
+per-function API documentation for exact requirements.
 
 .. list-table::
    :header-rows: 1
@@ -31,9 +33,11 @@ consult the per-function API documentation for exact requirements.
      - Ada / Hopper+
      - cuDNN and cuBLAS backends. CUTLASS requires sm_100+.
    * - MXFP8
-     - sm_100 (10.0)
-     - Blackwell+
-     -
+     - sm_100 (10.0)
+     - Blackwell+
+     - Support is backend-specific. MXFP8 support starts at sm_100 for GEMM on
+       Blackwell, but BMM uses cuDNN on sm_100 / sm_103 and CUTLASS on sm_120 /
+       sm_121. See :doc:`/api/gemm` for per-operation backend details.
    * - NVFP4 KV dequantize
      - sm_80 (8.0)
      - Ampere+
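The minimum-capability rows visible in this hunk can be sketched as a simple gate. This is a hypothetical helper, not FlashInfer's API; the dictionary encodes only the two data types whose rows appear in the diff (MXFP8 at sm_100, NVFP4 KV dequantize at sm_80), and the function name `meets_minimum` is illustrative.

```python
# Hypothetical sketch, not FlashInfer's API: gate a quantized kernel on the
# minimum compute capability from the support matrix above. Only the rows
# visible in this diff are encoded.
MIN_CAPABILITY = {
    "mxfp8": (10, 0),               # sm_100, Blackwell+
    "nvfp4_kv_dequantize": (8, 0),  # sm_80, Ampere+
}

def meets_minimum(capability, dtype):
    """Return True if a (major, minor) capability satisfies the dtype's minimum."""
    # Tuple comparison orders by major version first, then minor.
    return capability >= MIN_CAPABILITY[dtype]

# On real hardware the capability tuple would come from, e.g.,
# torch.cuda.get_device_capability(); here we pass tuples directly.
print(meets_minimum((12, 0), "mxfp8"))              # Blackwell-class: True
print(meets_minimum((8, 9), "mxfp8"))               # Ada, below sm_100: False
print(meets_minimum((8, 9), "nvfp4_kv_dequantize")) # Ada, above sm_80: True
```

Note that, as the diff itself stresses, meeting the matrix minimum is necessary but not sufficient: per-operation backend coverage still varies.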
@@ -46,8 +50,12 @@ consult the per-function API documentation for exact requirements.
 .. note::
 
    This table reflects FlashInfer's current support. The per-function API
-   documentation (e.g., :doc:`/api/fp4_quantization`) is the authoritative
-   source for each operation's hardware requirements.
+   documentation (e.g., :doc:`/api/fp4_quantization` and :doc:`/api/gemm`) is
+   the authoritative source for each operation's hardware requirements.
+
+   Compute capability is not the only requirement. FlashInfer currently
+   supports CUDA 12.6, 12.8, 13.0, and 13.1 (see :doc:`/installation`), and
+   Blackwell FP4 / MXFP4 features require CUDA 12.8+.
 
 For the official NVIDIA hardware specifications, see the
 `PTX ISA Reference <https://docs.nvidia.com/cuda/parallel-thread-execution/>`_
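The CUDA-version constraint added in this hunk (supported releases 12.6 / 12.8 / 13.0 / 13.1, with Blackwell FP4 / MXFP4 needing 12.8+) can likewise be sketched as a version check. The function name `can_use_blackwell_fp4` is hypothetical, chosen only to illustrate combining the two conditions.

```python
# Hypothetical sketch combining the CUDA-version requirements from the note
# above: a feature is usable only on a supported CUDA release that also
# meets the feature's own floor.
SUPPORTED_CUDA = {(12, 6), (12, 8), (13, 0), (13, 1)}
FP4_MIN_CUDA = (12, 8)  # Blackwell FP4 / MXFP4 require CUDA 12.8+

def can_use_blackwell_fp4(cuda_version):
    """True only for a supported CUDA release that is 12.8 or newer."""
    return cuda_version in SUPPORTED_CUDA and cuda_version >= FP4_MIN_CUDA

print(can_use_blackwell_fp4((12, 6)))  # supported release, but below 12.8: False
print(can_use_blackwell_fp4((12, 8)))  # True
print(can_use_blackwell_fp4((13, 1)))  # True
```

Checking membership in the supported set first avoids treating an untested intermediate release (e.g. a hypothetical 12.9) as usable merely because it exceeds the floor.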
