Support emulated mode for mxfp8 moe training to support non-sm100 CI or dev env #3598

@danielvegamyhre

Description

Some gaps:

  • The Triton kernels we import fail at import time with a compile error, because the inline PTX they use is not supported.
  • The CUDA kernels all fall back to "raise NotImplemented" if missing, which will be the case on non-sm100.
  • All tests, benchmarks, and everything else are gated on sm100 and CUDA 12.8+.
  • Alternate code paths for every quantization kernel and blocked layout kernel would need to be wired into the autograd func. I think the CUDA blocked layout kernel could run on non-sm100, but we would need to update the build process.
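
One possible shape for the wiring, as a rough sketch only: a small registry that defers loading the native kernel until it is actually selected, so a non-sm100 machine in emulated mode never triggers the import-time compile error. Every name here (`KernelRegistry`, `native_supported`, `emulated_mode`, the loader functions) is hypothetical and not an existing torchao API, and the assumption that "native" means compute capability 10.x is mine. In a real setup the capability tuple would presumably come from `torch.cuda.get_device_capability()`; it is passed in by hand here to keep the sketch dependency-free.

```python
from typing import Callable, Dict, Optional, Tuple

Capability = Tuple[int, int]


def native_supported(capability: Optional[Capability]) -> bool:
    # Assumption: the native kernels target compute capability 10.x (sm100).
    return capability is not None and capability[0] == 10


class KernelRegistry:
    """Hypothetical registry mapping an op name to a native and an emulated path."""

    def __init__(self) -> None:
        self._native_loaders: Dict[str, Callable[[], Callable]] = {}
        self._emulated: Dict[str, Callable] = {}

    def register(
        self,
        name: str,
        *,
        emulated: Callable,
        native_loader: Optional[Callable[[], Callable]] = None,
    ) -> None:
        # Store the loader itself, not its result, so nothing is imported yet.
        self._emulated[name] = emulated
        if native_loader is not None:
            self._native_loaders[name] = native_loader

    def get(
        self,
        name: str,
        capability: Optional[Capability],
        emulated_mode: bool = False,
    ) -> Callable:
        if emulated_mode:
            # Emulated mode never touches the native loader.
            return self._emulated[name]
        if not native_supported(capability) or name not in self._native_loaders:
            raise NotImplementedError(
                f"{name}: no native kernel for capability {capability}; "
                "use emulated_mode=True"
            )
        # Only now is the native kernel module actually loaded.
        return self._native_loaders[name]()


def _load_native_quantize() -> Callable:
    # Stand-in for importing a Triton/CUDA kernel module; it simulates
    # the import-time failure described in the first bullet.
    raise ImportError("inline PTX not supported on this device")


def _emulated_quantize(values):
    # Placeholder reference path; this is NOT real mxfp8 math.
    return [float(v) for v in values]


registry = KernelRegistry()
registry.register(
    "quantize",
    emulated=_emulated_quantize,
    native_loader=_load_native_quantize,
)

# On a non-sm100 device (here sm90) with emulated mode on, the broken
# native loader is never called.
quantize = registry.get("quantize", capability=(9, 0), emulated_mode=True)
print(quantize([1, 2, 3]))
```

With `emulated_mode=False` on the same device, `get` raises `NotImplementedError` up front, which mirrors the current CUDA fallback behavior. Tests and benchmarks could then be gated on "native supported or emulated mode enabled" instead of on sm100 alone.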
