
bench: Enable microbenchmarking on SM121#3002

Merged
bkryu merged 3 commits into flashinfer-ai:main from bkryu:bench_sm121
Apr 7, 2026

Conversation

Collaborator

@bkryu bkryu commented Apr 7, 2026

📌 Description

The existing microbenchmark harness's hard-coded support checks did not enable SM121 (DGX Spark) at all. As a result, only select APIs with their own API-level support checks, such as mm_fp4 or bmm_fp8, were enabled on Spark.

Currently, SM120 and SM121 share the same support surface, so this PR brings microbenchmarking on SM121 to parity with SM120.

No library code or unit test changes.
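To illustrate the failure mode, here is a minimal sketch of a compute-capability-keyed support table in the spirit of the harness's routine_cc_to_supported_backends. The routine and backend names below are illustrative assumptions, not the actual file contents:

```python
# Minimal sketch of a compute-capability-keyed support table, in the
# spirit of routine_cc_to_supported_backends. Routine and backend names
# here are illustrative assumptions, not the real entries.
routine_cc_to_supported_backends = {
    "mm_fp4": {
        "12.0": ["cudnn", "cutlass"],
        # No "12.1" key: SM121 (DGX Spark) devices fall through to an
        # empty backend list and the routine is silently skipped.
    },
}

def supported_backends(routine: str, cc: str) -> list[str]:
    """Return the backends enabled for a routine on a given capability."""
    return routine_cc_to_supported_backends.get(routine, {}).get(cc, [])

print(supported_backends("mm_fp4", "12.0"))  # backends available on SM120
print(supported_backends("mm_fp4", "12.1"))  # empty: "12.1" not listed
```

With only a "12.0" entry, a routine that works identically on SM121 is never benchmarked there, which is the gap this PR closes.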

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features
    • Added support for CUDA compute capability 12.1 across benchmark routines, enabling performance testing on newer hardware.

Contributor

coderabbitai Bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77713bc9-64cd-4902-8d8c-fbeccffda5e5

📥 Commits

Reviewing files that changed from the base of the PR and between e7f630c and 0d2168a.

📒 Files selected for processing (1)
  • benchmarks/routines/flashinfer_benchmark_utils.py

📝 Walkthrough

Walkthrough

The PR extends CUDA compute capability 12.1 support across benchmark routines in FlashInfer by adding "12.1" entries to the routine_cc_to_supported_backends mapping, enabling backend support for attention wrappers, GEMM operations, MoE kernels, and quantization/norm/sampling routines with the new compute capability.

Changes

  • Benchmark Backend Mappings — benchmarks/routines/flashinfer_benchmark_utils.py: Added "12.1" compute capability support to routine_cc_to_supported_backends across attention wrapper routines (BatchDecodeWithPagedKVCacheWrapper, BatchPrefillWithPagedKVCacheWrapper, etc.), GEMM/bmm/mm/MoE operations, and norm, quantization, sampling, rope, and mamba routines. For attention wrappers, "12.1" mirrors the existing "12.0" backend lists. Notable: cutlass_fused_moe now explicitly lists the "cutlass" backend for both "12.0" and "12.1".
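The mirroring described above can be sketched as follows. This is a hypothetical illustration of giving "12.1" parity with "12.0" in a mapping shaped like routine_cc_to_supported_backends; the routine and backend names are assumptions, not the actual diff:

```python
# Hypothetical illustration of mirroring "12.0" backend lists onto
# "12.1" in a routine_cc_to_supported_backends-style mapping. Routine
# and backend names are assumptions for the sketch.
routine_cc_to_supported_backends = {
    "BatchDecodeWithPagedKVCacheWrapper": {"12.0": ["fa2", "cudnn"]},
    "cutlass_fused_moe": {"12.0": ["cutlass"]},
}

# Give every routine with a "12.0" entry an identical "12.1" entry,
# since SM120 and SM121 currently share the same support surface.
for backends_by_cc in routine_cc_to_supported_backends.values():
    if "12.0" in backends_by_cc:
        backends_by_cc["12.1"] = list(backends_by_cc["12.0"])
```

In the actual PR the "12.1" entries are written out explicitly in the file rather than derived at runtime, so the two capabilities can diverge later if their support surfaces do.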

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

op: moe

Suggested reviewers

  • yzh119
  • cyx-6
  • jimmyzho
  • kahyunnam
  • nv-yunzheq
  • sricketts

Poem

🐰 The compute gets stronger, now 12.1 shines so bright,
Backend mappings multiply, benchmarks burning through the night,
MoE and attention dance, from GPU cores they gleam,
FlashInfer's support expands—a quantum-leaping dream! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The PR title 'bench: Enable microbenchmarking on SM121' clearly and concisely summarizes the main objective: enabling SM121 support for microbenchmarking.
  • Description check — ✅ Passed: The PR description includes a detailed explanation of the problem (SM121 not enabled), the solution (adding parity with SM120), and confirms no library/test changes. The pre-commit checklist is completed.
  • Docstring coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.





@bkryu bkryu self-assigned this Apr 7, 2026
@bkryu bkryu added benchmark Pertains to performance benchmarking run-ci labels Apr 7, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for CUDA compute capability 12.1 across various benchmark routines in flashinfer_benchmark_utils.py, including prefill wrappers, GEMM, MOE, normalization, and sampling. Additionally, it updates the supported backends for CUDA 12.0 in the cute_dsl_fp4_block_scale_moe routine to include "cutlass". I have no feedback to provide.

@bkryu bkryu merged commit d87f4de into flashinfer-ai:main Apr 7, 2026
21 checks passed
@bkryu bkryu deleted the bench_sm121 branch April 7, 2026 18:54

Labels

benchmark Pertains to performance benchmarking run-ci

2 participants