Support emulated mode for mxfp8 moe training to support non-sm100 CI or dev env #3598

Some gaps: 
- The Triton kernels we import fail at import time with a compile error, because their inline PTX is not supported on non-sm100.
- The CUDA kernels all fall back to "raise NotImplemented" if missing, which will be the case on non-sm100.
- All tests, benchmarks, and everything else are gated on sm100 and CUDA 12.8+.
- Alternate code paths for every quantization kernel and blocked layout kernel would need to be wired into the autograd func. I think the CUDA blocked layout kernel could run on non-sm100, but we would need to update the build process.
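
One way the first three gaps could be handled is sketched below: guard the kernel import so a compile error does not break module import, and pick an emulated fallback when the device is not sm100 or CUDA is older than 12.8. This is only a rough sketch, not how the library does it; `optional_import`, `has_native_mxfp8`, `select_kernel`, and the module name `hypothetical_mxfp8_kernels` are all made-up names for illustration.

```python
import importlib

SM100 = (10, 0)     # compute capability corresponding to sm100
MIN_CUDA = (12, 8)  # minimum CUDA version named in the gaps above


def optional_import(module_name):
    """Import a module, returning None instead of raising if it fails.

    Catches Exception rather than only ImportError, since a kernel module
    may fail with a compile error while it is being imported.
    """
    try:
        return importlib.import_module(module_name)
    except Exception:
        return None


def has_native_mxfp8(capability, cuda_version):
    """True only on sm100 with CUDA 12.8+ (assumed gate, per the list above)."""
    if capability is None or cuda_version is None:
        return False
    return tuple(capability) == SM100 and tuple(cuda_version) >= MIN_CUDA


def select_kernel(native_fn, emulated_fn, capability, cuda_version):
    """Return the native kernel when usable, otherwise the emulated fallback."""
    if native_fn is not None and has_native_mxfp8(capability, cuda_version):
        return native_fn
    return emulated_fn


# Hypothetical module name: it does not exist, so the import yields None.
kernels = optional_import("hypothetical_mxfp8_kernels")
native_quantize = getattr(kernels, "quantize", None)


def emulated_quantize(x):
    # Placeholder for a pure-PyTorch (or other portable) emulated path.
    return x


# On an sm90 dev box the emulated path is selected.
quantize = select_kernel(native_quantize, emulated_quantize, (9, 0), (12, 8))
```

In a real setup the capability tuple would presumably come from `torch.cuda.get_device_capability()` and the CUDA version from parsing `torch.version.cuda`. The selected function would still need to be wired into the autograd func, per the last gap above.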
