Commit 47d41ab
[GPUHeuristics] Improve large GEMM intrinsic selection on CDNA4 (#24115)
Extend the compute-throughput-first intrinsic preference to LargeGemm
shapes, preferring MFMA_F32_32x32x16_F16 over MFMA_F32_16x16x32_F16 (4x
as many output elements per instruction). Add a VGPR pressure cap to
prevent register spilling when the MNT boost selects high tile counts
with 32x32 intrinsics.
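A minimal Python sketch of the two ideas above, under stated assumptions. The `Mfma` record, `preferred_intrinsic`, `cap_tile_counts`, and the accumulator budget are hypothetical names for illustration, not IREE's heuristic code. It assumes 64-lane wavefronts and f32 accumulators distributed evenly across lanes at one VGPR per element. Because the K dimensions differ, the 32x32x16 variant performs 2x the FLOPs of the 16x16x32 variant per instruction while producing 4x the outputs, and it needs 4x the accumulator registers per tile.

```python
from dataclasses import dataclass

WAVE_SIZE = 64  # lanes per wavefront on CDNA


@dataclass(frozen=True)
class Mfma:
    """Illustrative record of an MFMA shape: C[m x n] += A[m x k] * B[k x n]."""
    name: str
    m: int
    n: int
    k: int

    def outputs_per_instr(self) -> int:
        return self.m * self.n

    def flops_per_instr(self) -> int:
        return 2 * self.m * self.n * self.k

    def acc_vgprs_per_lane(self) -> int:
        # f32 accumulators are spread evenly over the wave, one VGPR each.
        return self.m * self.n // WAVE_SIZE


MFMA_32x32x16 = Mfma("MFMA_F32_32x32x16_F16", 32, 32, 16)
MFMA_16x16x32 = Mfma("MFMA_F32_16x16x32_F16", 16, 16, 32)


def preferred_intrinsic(candidates: list[Mfma]) -> Mfma:
    # Hypothetical LargeGemm rule: favour the most output per instruction.
    return max(candidates, key=Mfma.outputs_per_instr)


def cap_tile_counts(m_tiles: int, n_tiles: int, intr: Mfma,
                    max_acc_vgprs: int) -> tuple[int, int]:
    """Halve the larger tile count until the accumulators fit the budget."""
    while m_tiles * n_tiles * intr.acc_vgprs_per_lane() > max_acc_vgprs:
        if m_tiles >= n_tiles and m_tiles > 1:
            m_tiles //= 2
        elif n_tiles > 1:
            n_tiles //= 2
        else:
            break  # a single tile already exceeds the budget
    return m_tiles, n_tiles
```

With a hypothetical budget of 128 accumulator VGPRs, an 8x8 tile grid shrinks to 2x4 with the 32x32 intrinsic (16 VGPRs per tile) but only to 4x8 with the 16x16 intrinsic (4 VGPRs per tile). This is why a cap matters more once 32x32 intrinsics are preferred.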
Top GEMM improvements on MI355X:
```
4096x1024x150000: 2112us -> 1538us (1.37x)
2268x4096x150000: 11359us -> 8529us (1.33x)
1024x4096x150000: 1982us -> 1573us (1.26x)
4096x2048x150000: 4015us -> 3307us (1.21x)
2048x8192x4096: 183us -> 154us (1.19x)
```
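As a cross-check of the table, the speedup is the before time divided by the after time. The achieved throughput follows from the 2·M·N·K FLOP count of a GEMM, and this product does not depend on the order in which the three dimensions are listed. The following is a small sketch that assumes times are in microseconds as printed; `summarize` is a hypothetical helper.

```python
# (shape, before_us, after_us) rows copied from the GEMM table above.
GEMM_ROWS = [
    ("4096x1024x150000", 2112, 1538),
    ("2268x4096x150000", 11359, 8529),
    ("1024x4096x150000", 1982, 1573),
    ("4096x2048x150000", 4015, 3307),
    ("2048x8192x4096", 183, 154),
]


def summarize(shape: str, before_us: float, after_us: float) -> tuple[float, float]:
    """Return (speedup, achieved TFLOP/s after the change) for one GEMM row."""
    m, n, k = (int(d) for d in shape.split("x"))
    flops = 2 * m * n * k  # one multiply plus one add per multiply-accumulate
    speedup = before_us / after_us
    tflops = flops / (after_us * 1e-6) / 1e12
    return round(speedup, 2), round(tflops, 1)


for shape, before, after in GEMM_ROWS:
    speedup, tflops = summarize(shape, before, after)
    print(f"{shape}: {speedup:.2f}x, {tflops:.1f} TFLOP/s")
```

For example, the first row works out to 1.37x and roughly 818 TFLOP/s after the change.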
Top conv improvements on MI355X (NHWC, fp16):
```
n32 c256 H100xW100 k2376 3x3 wgrad: 7983us -> 6634us (1.20x)
n32 c256 H25xW25 k2376 3x3 wgrad: 777us -> 664us (1.17x)
n32 c256 H100xW100 k2376 3x3 fwd: 7042us -> 6122us (1.15x)
n32 c256 H25xW25 k2376 3x3 fwd: 452us -> 405us (1.12x)
n32 c256 H50xW50 k2376 3x3 fwd: 1711us -> 1541us (1.11x)
```
Overall GEMM benchmark: **+6.3%** geomean speedup.
Overall Proxy conv benchmark: **+2.5%** geomean speedup.
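The headline numbers are geomean speedups over each full suite, and the per-shape data for those suites is not listed here. Below is a generic sketch of the computation. It assumes that each per-shape speedup is the before time divided by the after time, and that "+6.3%" corresponds to a geomean ratio of 1.063. Averaging in log space treats a 2x win and a 2x loss symmetrically, which an arithmetic mean of ratios does not. `geomean` and `as_percent` are illustrative names.

```python
import math


def geomean(ratios: list[float]) -> float:
    """Geometric mean of per-shape speedups (before_time / after_time)."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("speedups must be positive and non-empty")
    # Average in log space: exp(mean(log r)) == (r1 * r2 * ... * rn) ** (1/n).
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def as_percent(ratio: float) -> float:
    """Express a speedup ratio as a percentage gain, e.g. 1.063 -> 6.3."""
    return (ratio - 1.0) * 100.0


# A 2x win and a 2x loss cancel out under the geometric mean...
print(as_percent(geomean([2.0, 0.5])))
# ...whereas the arithmetic mean of the same ratios would report +25%.
print(as_percent(sum([2.0, 0.5]) / 2))
```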
Some K-dominated wgrad shapes regress because of the larger
workgroup tiles, but the improvements outweigh the regressions across
all benchmarks.
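To see why weight-gradient (wgrad) convolutions tend to be K-dominated, one common implicit-GEMM mapping for an NHWC convolution is sketched below. It assumes stride 1 and "same" padding (output H×W equal to input H×W), which the benchmark labels do not state. `implicit_gemm_dims` is a hypothetical helper, and other lowerings may swap the M and N roles.

```python
def implicit_gemm_dims(n, h, w, c, k, r, s, kind):
    """Return (M, N, K) of the GEMM that an NHWC conv lowers to.

    n, h, w, c: batch, input height/width, input channels
    k: output channels; r, s: filter height/width
    Assumes stride 1 and "same" padding, so the output is n x h x w x k.
    """
    out_pixels = n * h * w      # number of output pixels
    filter_size = c * r * s     # elements in one filter window
    if kind == "fwd":
        # out[pixel, k] = sum over crs of in_patch[pixel, crs] * filt[crs, k]
        return out_pixels, k, filter_size
    if kind == "wgrad":
        # dfilt[k, crs] = sum over pixels of dout[pixel, k] * in_patch[pixel, crs]
        return k, filter_size, out_pixels
    raise ValueError(f"unsupported conv kind: {kind}")


for kind in ("fwd", "wgrad"):
    print(kind, implicit_gemm_dims(32, 100, 100, 256, 2376, 3, 3, kind))
```

For the n32 c256 H100xW100 k2376 3x3 shape, this mapping gives M=320000, N=2376, K=2304 for fwd, but M=2376, N=2304, K=320000 for wgrad. In the wgrad case, the reduction dimension is more than 100x either output dimension. One plausible reading is that with such a small M×N output, larger workgroup tiles leave fewer workgroups to occupy the GPU.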
---------
Signed-off-by: yzhang93 <zhyuhang88@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Parent commit: 89b536e
4 files changed (73 additions & 26 deletions) under compiler/src/iree/compiler/Codegen:
- Common/GPU
- Dialect/GPU/TargetUtils
- LLVMGPU/test/ROCDL
Lines changed per file:
- 48 additions & 6 deletions
- 4 additions & 0 deletions
- 3 additions & 2 deletions
- 18 additions & 18 deletions