Commit b3e3693
[SM121] Enable native block-scaled dot_scaled for DGX Spark (GB10) (#10010)
SM121 (GB10 DGX Spark) supports the same mma.sync block-scaled
instructions as SM120 (RTX 5090) but was excluded from the native
lowering path by exact compute capability checks.
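
The effect of an exact-match check can be illustrated with a minimal Python sketch. This is not the code changed by this commit; the function names are hypothetical, and encoding SM120 and SM121 as the integers 120 and 121 is an assumption made for illustration only.

```python
# Hypothetical illustration: neither function is part of Triton's API.
# Compute capability is encoded as major * 10 + minor (an assumption here).
SM120 = 120  # e.g. RTX 5090
SM121 = 121  # e.g. GB10 (DGX Spark)


def native_block_scaled_exact(compute_capability: int) -> bool:
    """Exact-match check: only SM120 is routed to the native path."""
    return compute_capability == SM120


def native_block_scaled_inclusive(compute_capability: int) -> bool:
    """Check that also admits SM121, which supports the same instructions."""
    return compute_capability in (SM120, SM121)


for cc in (SM120, SM121):
    print(cc, native_block_scaled_exact(cc), native_block_scaled_inclusive(cc))
```

With the exact-match predicate, SM121 is rejected even though the hardware supports the instructions; the inclusive predicate accepts both targets.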
Without this fix, dot_scaled on SM121 falls through to
DecomposeScaledBlocked, which upcasts to bf16 and reaches ~10 TFLOPS,
versus ~270 TFLOPS with native mma.sync block-scaled FP4.
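
As a quick check of what those approximate figures imply, the ratio can be computed directly. The variable names are illustrative, and the inputs are the rounded numbers quoted above rather than new measurements.

```python
# Approximate throughput figures quoted in the commit message, in TFLOPS.
decomposed_bf16_tflops = 10.0  # DecomposeScaledBlocked fallback (upcast to bf16)
native_fp4_tflops = 270.0      # native mma.sync block-scaled FP4

# Ratio of the native path to the fallback path.
speedup = native_fp4_tflops / decomposed_bf16_tflops
print(f"native path is roughly {speedup:.0f}x faster")
```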
Tested on GB10 with both MXFP4 (scale_vec::2X, ue8m0) and NVFP4
(scale_vec::4X, ue4m3).
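
The two scale encodings named above differ in how a scale byte maps to a real value. The sketch below decodes both under stated assumptions: ue8m0 is taken to be an unsigned exponent-only byte with bias 127 (as in the OCP Microscaling E8M0 format), and ue4m3 is taken to be an unsigned value with 4 exponent bits (bias 7) and 3 mantissa bits. The function names are hypothetical, and the NaN encodings are assumptions carried over from those formats.

```python
def decode_ue8m0(bits: int) -> float:
    """Decode an 8-bit exponent-only scale as 2 ** (bits - 127)."""
    if not 0 <= bits <= 0xFF:
        raise ValueError("ue8m0 is an 8-bit value")
    if bits == 0xFF:
        return float("nan")  # assumed NaN encoding (OCP MX E8M0)
    return 2.0 ** (bits - 127)


def decode_ue4m3(bits: int) -> float:
    """Decode 4 exponent bits (bias 7) and 3 mantissa bits, with no sign."""
    if not 0 <= bits <= 0x7F:
        raise ValueError("ue4m3 uses 7 value bits")
    if bits == 0x7F:
        return float("nan")  # assumed NaN encoding (all ones, as in FP8 E4M3)
    exponent = bits >> 3
    mantissa = bits & 0x7
    if exponent == 0:
        # Subnormal: no implicit leading one.
        return (mantissa / 8.0) * 2.0 ** -6
    return (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


print(decode_ue8m0(127), decode_ue4m3(0x38))
```

Under these assumptions a ue8m0 scale can only be a power of two, while a ue4m3 scale carries three fractional bits over a narrower range.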
# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
[rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because the existing test paths
    cover this flow. As far as I know, there are no GB10s in CI to verify
    it, but it works for me.
- Select one of the following.
  - [x] I have not added any `lit` tests.

1 parent f7c1d69 commit b3e3693
1 file changed
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 680 | | - |
| | 680 | + |
| 927 | | - |
| | 927 | + |