Is there any plan to support mxfp4 for MI300X? #13611
Hello, I saw the AMD Instinct Development Roadmap (2025Q4) (#12890), which mentioned additional quantization support. Does it include MXFP4 support for MI300X?

P.S. Benchmark results (from https://github.com/sgl-project/sglang/blob/main/test/srt/test_gpt_oss_1gpu.py)
Replies: 2 comments
Yes! MI300X (gfx94x) does support MXFP4 at the hardware level. The current check in mxfp_supported() is just too strict. Adding "gfx94" is correct, since MI300X devices report gfx94*, and the MXFP4/FP8 path is already functional there. Your benchmark results make sense, and it's expected that the patch works. We should update the detection logic in sglang to include MI300X officially (see the sketch below).
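To make that concrete, here is a minimal sketch of what a relaxed check could look like. It assumes the ROCm build of PyTorch exposes the GFX architecture string via `torch.cuda.get_device_properties(0).gcnArchName`; the actual `mxfp_supported()` in sglang may be structured differently, and the `"gfx95"` prefix for MI350X/MI355X is an assumption here.

```python
import torch

def mxfp_supported() -> bool:
    """Sketch of a relaxed MXFP4/MXFP8 capability check for ROCm devices.

    ROCm builds of PyTorch expose the architecture string (e.g.
    "gfx942:sramecc+:xnack-") via gcnArchName; MI300X reports gfx94*.
    The real sglang implementation may differ.
    """
    if not torch.cuda.is_available():
        return False
    arch = torch.cuda.get_device_properties(0).gcnArchName
    # Accept gfx94x (MI300X/MI325X) in addition to gfx95x (MI350X/MI355X).
    return any(arch.startswith(prefix) for prefix in ("gfx94", "gfx95"))
```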
**ROCm Support for MXFP4**

According to AMD's ROCm 7 solutions brief, ROCm 7 does introduce support for "low-precision … MXFP4" on MI300X. So at the hardware/software layer, there is some foundation for MXFP4 on MI300X.

**FP4 Inference on MI300X via "Petit"**

A project called Petit (by LMSYS) provides optimized mixed-precision kernels to run FP4 models on the AMD MI300 series. However, Petit doesn't use native MXFP4 matmul on MI300X; instead, it dequantizes FP4 weights into BF16/FP16 for computation (see the sketch after this comment). That means you're not truly running in MXFP4 arithmetic, but converting on the fly.

**vLLM Status**

On the vLLM side, the vLLM ROCm inference docs explicitly say that MXFP4 is supported only on MI355X and MI350X. There is also a GitHub issue in llm-compressor about adding MXFP4 support (both dense and MoE). FP8 support on MI300X is already a feature request / discussion topic in vLLM/ROCm.

**Model / Framework Support**

For Llama 4, AMD and vLLM have optimized kernels for MI300X (but using BF16 in the published blog). On the quantization/model front, gpt-oss (which uses MXFP4 for its MoE weights) is supported by vLLM on MI300X, but their blog post mentions Blackwell and Hopper GPUs, not AMD. For llama.cpp, there is support for native MXFP4 (ggml backends), but that is more on the CPU/CUDA/Vulkan side, not necessarily ROCm.
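To illustrate the "converting on the fly" point: under the OCP MX spec, MXFP4 stores weights as 32-element blocks of 4-bit E2M1 values sharing one power-of-two (E8M0) scale. Below is a toy PyTorch sketch of dequantizing such blocks to BF16 before the matmul, which is conceptually what the Petit approach does. It is not Petit's actual HIP kernels; `dequant_mxfp4_block` and the unbiased `scale_exp` input are illustrative simplifications (real E8M0 scales carry a bias of 127).

```python
import torch

# The 16 code points of FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
FP4_E2M1_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def dequant_mxfp4_block(codes: torch.Tensor, scale_exp: torch.Tensor) -> torch.Tensor:
    """Dequantize MXFP4 blocks (32 4-bit codes + one shared scale) to BF16.

    codes:     uint8 tensor of shape (num_blocks, 32) with E2M1 indices 0..15
    scale_exp: int tensor of shape (num_blocks,) holding the (unbiased)
               power-of-two exponent of each block's E8M0 scale
    """
    vals = FP4_E2M1_VALUES[codes.long()]                 # look up E2M1 values
    scale = torch.exp2(scale_exp.float()).unsqueeze(-1)  # scale = 2**exp
    # The subsequent matmul runs in BF16/FP16, not in MXFP4 arithmetic.
    return (vals * scale).to(torch.bfloat16)
```

A production kernel would fuse this lookup and scaling into the GEMM rather than materializing the full BF16 weight tensor, but the numerics are the same.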