vLLM has reported that NVFP4 inference is generally slow on SM12x, especially on DGX Spark (SM121).

This issue tracks ongoing efforts to improve NVFP4 inference performance on SM120 and SM121.

Benchmarking:
* #3002

Kernel Improvements:
* #3008
* #3014
* #3026
* #3051
* #3066
* #3080
* #3193

Misc. fixes:
* #3152
* #3191