[Perf][DSv4] Add cuteDSL generic LL Blockwise FP8 GEMM by LopezCastroRoberto · Pull Request #43214 · vllm-project/vllm

LopezCastroRoberto · 2026-05-20T13:44:32Z

No description provided.

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

gemini-code-assist

Code Review

This pull request introduces the LLFp8BlockScaledMMKernel, a low-latency FP8 block-scaled matrix multiplication kernel implemented using CuTe DSL. The kernel is designed for small batch sizes (M <= 16) and utilizes warp-specialized instructions with asynchronous copies. Feedback indicates a critical issue where the kernel hardcodes the E4M3 FP8 format but lacks a check in can_implement to prevent its use when E5M2 is enabled, which would lead to incorrect computations.
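For readers unfamiliar with the term, the following is a minimal pure-Python reference for what a block-scaled matrix multiplication computes. It is a sketch under stated assumptions, not the kernel from this PR: the scale layout (one scale per activation row and K-block, one scale per weight tile), the function name `block_scaled_matmul`, and the tiny block size are all illustrative choices, and plain floats stand in for FP8 values.

```python
def block_scaled_matmul(a_q, a_scale, b_q, b_scale, block):
    """Reference C = dequant(A) @ dequant(B)^T with per-block scales.

    a_q:     M x K quantized activations (floats stand in for FP8 values)
    a_scale: M x (K // block) scales, one per row and K-block (assumed layout)
    b_q:     N x K quantized weights
    b_scale: (N // block) x (K // block) scales, one per tile (assumed layout)
    """
    m, k, n = len(a_q), len(a_q[0]), len(b_q)
    assert k % block == 0 and n % block == 0
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kb in range(k // block):
                # Accumulate the raw quantized products inside one K-block...
                partial = 0.0
                for kk in range(kb * block, (kb + 1) * block):
                    partial += a_q[i][kk] * b_q[j][kk]
                # ...then apply both scales once per block.
                acc += partial * a_scale[i][kb] * b_scale[j // block][kb]
            out[i][j] = acc
    return out


# Tiny example: M=1, N=2, K=4, block=2.
result = block_scaled_matmul(
    a_q=[[1.0, 2.0, 3.0, 4.0]],
    a_scale=[[0.5, 2.0]],
    b_q=[[1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 0.0, 1.0]],
    b_scale=[[2.0, 1.0]],
    block=2,
)
print(result)  # [[17.0, 10.0]]
```

Applying the scales once per K-block, rather than per element, is what lets a real kernel keep the inner products in low precision and rescale only the partial sums.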

gemini-code-assist · 2026-05-20T13:47:36Z

+    @classmethod
+    def can_implement(cls, config):
+        return super().can_implement(config)


The underlying cuteDSL kernel (_ll_fp8_block_kernels.py) uses a hardcoded mma.sync instruction for the E4M3 FP8 format. However, this kernel can be selected even when the E5M2 format is enabled (via VLLM_USE_DEEP_GEMM_E5M2), which would lead to incorrect computation.

To prevent this, can_implement should check if E5M2 is being used and reject the configuration if so.

    @classmethod
    def can_implement(cls, config):
        from vllm.utils.deep_gemm import is_deep_gemm_e8m0_used

        can_implement_base, reason = super().can_implement(config)
        if not can_implement_base:
            return can_implement_base, reason
        if is_deep_gemm_e8m0_used():
            return False, "LLFp8BlockScaledMMKernel only supports E4M3, not E5M2."
        return True, None
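The guard pattern in the suggestion can be exercised in isolation. The sketch below is self-contained and makes several assumptions: `BaseKernel`, `E4M3OnlyKernel`, and the `fp8_format` config key are hypothetical stand-ins, not vLLM classes or settings. It only illustrates the `(ok, reason)` contract: defer to the base check first, then reject unsupported formats with an explanatory message.

```python
from typing import Optional, Tuple


class BaseKernel:
    """Hypothetical stand-in for the real kernel base class."""

    @classmethod
    def can_implement(cls, config: dict) -> Tuple[bool, Optional[str]]:
        if config.get("m", 0) <= 0:
            return False, "M must be positive."
        return True, None


class E4M3OnlyKernel(BaseKernel):
    """Hypothetical kernel that hardcodes the E4M3 format."""

    @classmethod
    def can_implement(cls, config: dict) -> Tuple[bool, Optional[str]]:
        # Respect any rejection from the base class before adding new checks.
        ok, reason = super().can_implement(config)
        if not ok:
            return ok, reason
        # "fp8_format" is an illustrative key, not a real vLLM setting.
        if config.get("fp8_format", "e4m3") != "e4m3":
            return False, "E4M3OnlyKernel only supports E4M3."
        return True, None


accepted = E4M3OnlyKernel.can_implement({"m": 8, "fp8_format": "e4m3"})
rejected = E4M3OnlyKernel.can_implement({"m": 8, "fp8_format": "e5m2"})
base_rejected = E4M3OnlyKernel.can_implement({"m": 0})
```

Returning the reason alongside the boolean lets a kernel selector log why a candidate was skipped instead of silently falling through to the next one.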

init

7083db8

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>

LopezCastroRoberto requested review from mgoin, pavanimajety and zyongye as code owners May 20, 2026 13:44

LopezCastroRoberto marked this pull request as draft May 20, 2026 13:44

gemini-code-assist Bot reviewed May 20, 2026


LopezCastroRoberto wants to merge 1 commit into vllm-project:main from LopezCastroRoberto:perf/ll_bw_fp8
