Commit e19678b
Enable tcgen05 blockscaled ops on Thor SM110
Edge-LLM NvFP4 MoE CuTeDSL kernels on Thor use tcgen05 blockscaled MMA and SMEM-to-TMEM scale-factor copies. The existing checks only admitted the SM100/SM103 paths, so source-built CuTeDSL rejected SM110.
Admit Thor's blockscaled MMA arch aliases sm_101a and sm_110a, and allow the SM110f family for S2T tcgen05 copy ops.
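The shape of the change can be sketched as widening two allow-lists. This is a minimal illustration only: the tuple names, the helper functions, and the exact pre-existing arch strings are assumptions made for the example, not the actual CuTeDSL code in mma.py or copy.py. Only sm_101a, sm_110a, and the SM110f family come from the commit message.

```python
# Illustrative sketch only: every name below is hypothetical, not the CuTeDSL API.
# The "before" entries are assumed stand-ins for the SM100/SM103 paths.
MMA_ARCHS_BEFORE = ("sm_100a", "sm_103a")
# The commit admits Thor's blockscaled MMA arch aliases.
MMA_ARCHS_AFTER = MMA_ARCHS_BEFORE + ("sm_101a", "sm_110a")

S2T_FAMILIES_BEFORE = ("sm_100f", "sm_103f")
# The commit allows the SM110f family for S2T tcgen05 copy ops.
S2T_FAMILIES_AFTER = S2T_FAMILIES_BEFORE + ("sm_110f",)


def check_arch(arch: str, admitted: tuple) -> None:
    """Raise if `arch` is not in the allow-list, mimicking an arch gate."""
    if arch not in admitted:
        raise ValueError(f"{arch} is not admitted; expected one of {admitted}")


def is_admitted(arch: str, admitted: tuple) -> bool:
    """Boolean wrapper around check_arch for easy before/after comparison."""
    try:
        check_arch(arch, admitted)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for arch in ("sm_101a", "sm_110a"):
        print(arch, is_admitted(arch, MMA_ARCHS_BEFORE), is_admitted(arch, MMA_ARCHS_AFTER))
    print("sm_110f", is_admitted("sm_110f", S2T_FAMILIES_BEFORE), is_admitted("sm_110f", S2T_FAMILIES_AFTER))
```

Under these assumptions, a Thor target fails the "before" gate and passes the "after" gate, which matches the rejection described above for source-built CuTeDSL.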
Validation:
- git diff --check
- python3 -m py_compile python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/mma.py python/CuTeDSL/cutlass/cute/nvgpu/tcgen05/copy.py
- DKG grouped_blockscaled_gemm.py documented 4-group example on Thor SM110: PASS
- Edge-LLM nvfp4_moe AOT for sm_110/aarch64: 12/12 variants PASS
Parent: 9c1d096
2 files changed
Lines changed: 9 additions & 3 deletions
[Diff bodies were not captured. First file: hunks spanning original lines 786-801, with 3 lines removed and 7 added. Second file: one hunk spanning original lines 386-392, with 2 lines added.]