bench: Enable microbenchmarking on SM121 #3002
Code Review
This pull request adds support for CUDA compute capability 12.1 across various benchmark routines in flashinfer_benchmark_utils.py, including prefill wrappers, GEMM, MOE, normalization, and sampling. Additionally, it updates the supported backends for CUDA 12.0 in the cute_dsl_fp4_block_scale_moe routine to include "cutlass".
📌 Description
The existing microbenchmark harness's hard-coded support checks did not enable SM121 (DGX Spark) at all. As a result, only the select APIs with their own API-level support checks, such as mm_fp4 or bmm_fp8, were enabled on Spark.
Currently, SM120 and SM121 share the same support surface. This PR gives SM121 parity with SM120 for microbenchmarking.
No library code or unit test changes.
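To illustrate the kind of hard-coded support check the description refers to, here is a minimal sketch. The function name, routine names, and table structure are hypothetical, not FlashInfer's actual API; the point is that SM121 ("12.1") gets the same routine set as SM120 ("12.0"), since the two currently share a support surface.

```python
# Hypothetical sketch of a hard-coded per-capability support check.
# Routine names and the helper itself are illustrative assumptions,
# not FlashInfer's real implementation.

SM120_ROUTINES = {"mm_fp4", "bmm_fp8", "gemm", "moe", "norm", "sampling"}

SUPPORTED_ROUTINES = {
    "12.0": SM120_ROUTINES,
    # SM121 (DGX Spark) shares SM120's support surface, so it reuses
    # the same routine set rather than maintaining a separate list.
    "12.1": SM120_ROUTINES,
}


def is_routine_supported(routine: str, compute_capability: str) -> bool:
    """Return True if `routine` is benchmarkable on `compute_capability`."""
    return routine in SUPPORTED_ROUTINES.get(compute_capability, set())


print(is_routine_supported("gemm", "12.1"))  # True: SM121 has parity with SM120
```

Keeping both keys pointing at one shared set means a routine enabled for SM120 is automatically enabled for SM121, which is the parity this PR establishes.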
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- Installed pre-commit by running `pip install pre-commit` (or used your preferred method).
- Installed the hooks with `pre-commit install`.
- Ran the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- Tests have been added or updated as needed.
- All tests are passing (`unittest`, etc.).

Reviewer Notes