[FEA] Block-Scaled Data Format Support for SM120 in CuTe DSL #2867

@DD-DuDa

Which component requires the feature?

CuTe DSL

Feature Request

Is your feature request related to a problem? Please describe.
I would like to use CuTe DSL (Python) to implement block-scaled GEMM operations on the SM120 (Blackwell) architecture. Currently, all CuTe DSL Python examples for block-scaled formats (MXFP4, MXFP6, MXFP8) target SM100 only.

Describe the solution you'd like
Add CuTe DSL Python support for SM120 block-scaled operations:

  1. Python helper functions for SM120 (similar to existing SM100 helpers in blackwell_helpers.py)
  2. SM120 MMA operation wrappers for block-scaled formats (MmaMXF4Op, MmaMXF8Op, etc.)
  3. Example scripts for dense and grouped block-scaled GEMM on SM120
  4. Support for MXFP4 (Float4E2M1FN), MXFP6, and MXFP8 formats with corresponding scale factor types
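
For reference, the numerics behind item 4 can be sketched in plain Python for the MXFP4 case. This is not CuTe DSL code and uses no CUTLASS API; every function name here is hypothetical. It assumes the OCP Microscaling layout: 4-bit E2M1 elements, one E8M0 power-of-two scale per block of 32 elements along K, and both operands stored K-major. The E8M0 NaN encoding (0xFF) is not handled.

```python
# Reference semantics only: plain Python, not CuTe DSL. All names are hypothetical.

# Magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by the low three bits of the 4-bit code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
MX_BLOCK = 32     # assumed: elements sharing one scale factor along K
E8M0_BIAS = 127   # assumed: exponent bias of the 8-bit scale type


def decode_e2m1(code):
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(byte):
    """Decode an E8M0 scale factor, a pure power of two: 2**(byte - 127)."""
    return 2.0 ** (byte - E8M0_BIAS)


def dequantize_row(codes, scales):
    """Expand one K-major row: element i is scaled by the factor of block i // 32."""
    assert len(codes) == len(scales) * MX_BLOCK
    return [
        decode_e2m1(code) * decode_e8m0(scales[i // MX_BLOCK])
        for i, code in enumerate(codes)
    ]


def block_scaled_gemm(a_codes, a_scales, b_codes, b_scales):
    """Compute D[m][n] = sum_k A[m][k] * B[n][k] on the dequantized operands."""
    a = [dequantize_row(row, sf) for row, sf in zip(a_codes, a_scales)]
    b = [dequantize_row(row, sf) for row, sf in zip(b_codes, b_scales)]
    return [[sum(x * y for x, y in zip(ar, br)) for br in b] for ar in a]


# One 1x32 row per operand: A holds 1.0 scaled by 2, B holds 1.5 scaled by 0.5.
d = block_scaled_gemm([[0x2] * 32], [[128]], [[0x3] * 32], [[126]])
print(d)
```

A kernel built on the requested SM120 helpers would be expected to reproduce this result up to accumulation order, with the scale factors applied by the block-scaled MMA rather than by a separate dequantization pass.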
