
bench: Enable microbenchmarking on SM121#3002

Merged
bkryu merged 3 commits into flashinfer-ai:main from bkryu:bench_sm121
Apr 7, 2026

Conversation

Collaborator

@bkryu bkryu commented Apr 7, 2026

📌 Description

The existing microbenchmark harness's hard-coded support checks did not enable SM121 (DGX Spark) at all. As a result, only select APIs with their own API-level support checks, such as mm_fp4 or bmm_fp8, were enabled on Spark.

Currently, SM120 and SM121 share the same support surface, so this PR brings microbenchmarking on SM121 to parity with SM120.

No library code or unit test changes.
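To illustrate the failure mode, here is a minimal sketch of a compute-capability-keyed support table in the spirit of the harness's routine_cc_to_supported_backends. The routine and backend names below are illustrative assumptions, not the actual file contents:

```python
# Minimal sketch of a compute-capability-keyed support table, in the
# spirit of routine_cc_to_supported_backends. Routine and backend names
# here are illustrative assumptions, not the real entries.
routine_cc_to_supported_backends = {
    "mm_fp4": {
        "12.0": ["cudnn", "cutlass"],
        # No "12.1" key: SM121 (DGX Spark) devices fall through to an
        # empty backend list and the routine is silently skipped.
    },
}

def supported_backends(routine: str, cc: str) -> list[str]:
    """Return the backends enabled for a routine on a given capability."""
    return routine_cc_to_supported_backends.get(routine, {}).get(cc, [])

print(supported_backends("mm_fp4", "12.0"))  # backends available on SM120
print(supported_backends("mm_fp4", "12.1"))  # empty: "12.1" not listed
```

With only a "12.0" entry, a routine that works identically on SM121 is never benchmarked there, which is the gap this PR closes.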

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features
    • Added support for CUDA compute capability 12.1 across benchmark routines, enabling performance testing on newer hardware.

Contributor

coderabbitai Bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77713bc9-64cd-4902-8d8c-fbeccffda5e5

📥 Commits

Reviewing files that changed from the base of the PR and between e7f630c and 0d2168a.

📒 Files selected for processing (1)
  • benchmarks/routines/flashinfer_benchmark_utils.py

📝 Walkthrough

Walkthrough

The PR extends CUDA compute capability 12.1 support across benchmark routines in FlashInfer by adding "12.1" entries to the routine_cc_to_supported_backends mapping, enabling backend support for attention wrappers, GEMM operations, MoE kernels, and quantization/norm/sampling routines with the new compute capability.

Changes

  • Benchmark Backend Mappings — benchmarks/routines/flashinfer_benchmark_utils.py: Added "12.1" compute capability support to routine_cc_to_supported_backends across attention wrapper routines (BatchDecodeWithPagedKVCacheWrapper, BatchPrefillWithPagedKVCacheWrapper, etc.), GEMM/bmm/mm/MoE operations, and norm, quantization, sampling, rope, and mamba routines. For attention wrappers, "12.1" mirrors the existing "12.0" backend lists. Notable: cutlass_fused_moe now explicitly lists the "cutlass" backend for both "12.0" and "12.1".
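The mirroring described above can be sketched as follows. This is a hypothetical illustration of giving "12.1" parity with "12.0" in a mapping shaped like routine_cc_to_supported_backends; the routine and backend names are assumptions, not the actual diff:

```python
# Hypothetical illustration of mirroring "12.0" backend lists onto
# "12.1" in a routine_cc_to_supported_backends-style mapping. Routine
# and backend names are assumptions for the sketch.
routine_cc_to_supported_backends = {
    "BatchDecodeWithPagedKVCacheWrapper": {"12.0": ["fa2", "cudnn"]},
    "cutlass_fused_moe": {"12.0": ["cutlass"]},
}

# Give every routine with a "12.0" entry an identical "12.1" entry,
# since SM120 and SM121 currently share the same support surface.
for backends_by_cc in routine_cc_to_supported_backends.values():
    if "12.0" in backends_by_cc:
        backends_by_cc["12.1"] = list(backends_by_cc["12.0"])
```

In the actual PR the "12.1" entries are written out explicitly in the file rather than derived at runtime, so the two capabilities can diverge later if their support surfaces do.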

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

op: moe

Suggested reviewers

  • yzh119
  • cyx-6
  • jimmyzho
  • kahyunnam
  • nv-yunzheq
  • sricketts

Poem

🐰 The compute gets stronger, now 12.1 shines so bright,
Backend mappings multiply, benchmarks burning through the night,
MoE and attention dance, from GPU cores they gleam,
FlashInfer's support expands—a quantum-leaping dream! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The PR title 'bench: Enable microbenchmarking on SM121' clearly and concisely summarizes the main objective: enabling SM121 support for microbenchmarking.
  • Description check — ✅ Passed: The PR description includes a detailed explanation of the problem (SM121 not enabled), the solution (adding parity with SM120), and confirms no library/test changes. The pre-commit checklist is completed.
  • Docstring coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.





@bkryu bkryu self-assigned this Apr 7, 2026
@bkryu bkryu added benchmark Pertains to performance benchmarking run-ci labels Apr 7, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for CUDA compute capability 12.1 across various benchmark routines in flashinfer_benchmark_utils.py, including prefill wrappers, GEMM, MOE, normalization, and sampling. Additionally, it updates the supported backends for CUDA 12.0 in the cute_dsl_fp4_block_scale_moe routine to include "cutlass". I have no feedback to provide.

@bkryu bkryu merged commit d87f4de into flashinfer-ai:main Apr 7, 2026
21 checks passed
@bkryu bkryu deleted the bench_sm121 branch April 7, 2026 18:54

Labels

benchmark Pertains to performance benchmarking run-ci

2 participants