Which component requires the feature?
CuTe DSL
Feature Request
Is your feature request related to a problem? Please describe.
I would like to use CuTe DSL (Python) to implement block-scaled GEMM operations on the SM120 (Blackwell) architecture. Currently, all CuTe DSL Python examples for block-scaled formats (MXFP4, MXFP6, MXFP8) support only SM100.
Describe the solution you'd like
Add CuTe DSL Python support for SM120 block-scaled operations:
- Python helper functions for SM120 (similar to existing SM100 helpers in blackwell_helpers.py)
- SM120 MMA operation wrappers for block-scaled formats (MmaMXF4Op, MmaMXF8Op, etc.)
- Example scripts for dense and grouped block-scaled GEMM on SM120
- Support for MXFP4 (Float4E2M1FN), MXFP6, and MXFP8 formats with corresponding scale factor types
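A host-side reference for the requested semantics could help validate a future SM120 kernel. The sketch below is plain Python, not CuTe DSL, and every function name in it is hypothetical. It assumes the OCP Microscaling (MX) v1.0 conventions: 32-element blocks along K, one E8M0 scale per block for each row of A and each column of B, and FP4 E2M1 elements. It accumulates in Python floats rather than the FP32 accumulator a real kernel would use, so results may differ slightly from hardware in rounding.

```python
# Hypothetical reference for MXFP4 block-scaled GEMM semantics (not a CuTe DSL API).
# Assumes OCP MX v1.0: block size 32 along K, E8M0 scales, E2M1 elements.

BLOCK = 32  # assumed MX block size along the K (reduction) dimension

# Magnitudes representable by FP4 E2M1, indexed by the low 3 bits of the code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (bit 3 is the sign) to a float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(code: int) -> float:
    """Decode an 8-bit E8M0 scale as 2**(code - 127); 0xFF encodes NaN."""
    if code == 0xFF:
        return float("nan")
    return 2.0 ** (code - 127)


def mx_block_scaled_gemm(a_codes, a_scales, b_codes, b_scales):
    """Compute C[m][n] = sum_kb sa[m][kb] * sb[n][kb] * dot(A block, B block).

    a_codes:  M x K E2M1 codes (K-major rows of A)
    a_scales: M x (K // BLOCK) E8M0 codes
    b_codes:  N x K E2M1 codes (B stored K-major, i.e. one row per output column)
    b_scales: N x (K // BLOCK) E8M0 codes
    """
    m_dim, k_dim = len(a_codes), len(a_codes[0])
    n_dim = len(b_codes)
    assert k_dim % BLOCK == 0, "K must be a multiple of the block size"
    c = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0.0
            for kb in range(k_dim // BLOCK):
                # Unscaled dot product over one 32-element block.
                partial = 0.0
                for k in range(kb * BLOCK, (kb + 1) * BLOCK):
                    partial += decode_e2m1(a_codes[m][k]) * decode_e2m1(b_codes[n][k])
                # Apply both per-block scale factors, then accumulate.
                acc += decode_e8m0(a_scales[m][kb]) * decode_e8m0(b_scales[n][kb]) * partial
            c[m][n] = acc
    return c


if __name__ == "__main__":
    # K = 32: A is all 1.0 with scale 2**0, B is all 1.0 with scale 2**1.
    print(mx_block_scaled_gemm([[0x2] * 32], [[127]], [[0x2] * 32], [[128]]))
```

Running the script prints `[[64.0]]` (32 products of 1.0, scaled by 1.0 and 2.0). For MXFP6 or MXFP8, only the element decoder would change; the per-block scaling structure stays the same.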