Commit d132f22
HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (ggml-org#21570)
Add AMD Instinct MI350X/MI355X (gfx950, CDNA4) support:
- vendors/hip.h: Add CDNA4 preprocessor define for __gfx950__
- common.cuh: Add GGML_CUDA_CC_CDNA4 and GGML_CUDA_CC_IS_CDNA4 macros
- mma.cuh: Route CDNA4 to compatible MFMA instructions:
  * f32 matmul: mfma_f32_16x16x4f32 (xf32 variant unavailable on gfx950)
  * bf16 matmul: mfma_f32_16x16x16bf16_1k (same as CDNA3)
  * int8 matmul: mfma_i32_16x16x32_i8 / mfma_i32_32x32x16_i8 (same as CDNA3)
- mmq.cuh: Include CDNA4 in stream-k kernel dispatch
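
The routing listed above can be sketched on the host side as follows. This is only an illustration of the idea, not the code from this commit: every `sketch_`/`SKETCH_` name is hypothetical, and the numeric compute-capability values are assumptions chosen so that gfx950 sorts after gfx942 (the real constants live in common.cuh and may differ). The only facts taken from the commit message are which instruction each data type is routed to.

```cpp
#include <cassert>
#include <cstring>

// ASSUMPTION: illustrative compute-capability encoding, not the real values.
constexpr int SKETCH_CC_OFFSET_AMD = 0x1000000;
constexpr int SKETCH_CC_CDNA3      = SKETCH_CC_OFFSET_AMD + 0x942;  // gfx942
constexpr int SKETCH_CC_CDNA4      = SKETCH_CC_OFFSET_AMD + 0x950;  // gfx950
constexpr int SKETCH_CC_RDNA1      = SKETCH_CC_OFFSET_AMD + 0x1010; // first non-CDNA tier

// Hypothetical range checks in the spirit of GGML_CUDA_CC_IS_CDNA4.
constexpr bool sketch_is_cdna3(int cc) { return cc >= SKETCH_CC_CDNA3 && cc < SKETCH_CC_CDNA4; }
constexpr bool sketch_is_cdna4(int cc) { return cc >= SKETCH_CC_CDNA4 && cc < SKETCH_CC_RDNA1; }

enum class SketchType { F32, BF16, I8 };

// Returns the name of the 16x16 MFMA instruction for a CDNA3/CDNA4 device,
// or nullptr for any other architecture (not covered by this sketch).
inline const char * sketch_mfma_16x16(int cc, SketchType type) {
    if (!sketch_is_cdna3(cc) && !sketch_is_cdna4(cc)) {
        return nullptr;
    }
    switch (type) {
        case SketchType::F32:
            // CDNA4 has no xf32 MFMA, so it falls back to the plain f32 instruction.
            return sketch_is_cdna4(cc) ? "mfma_f32_16x16x4f32" : "mfma_f32_16x16x8_xf32";
        case SketchType::BF16:
            return "mfma_f32_16x16x16bf16_1k"; // same on CDNA3 and CDNA4
        case SketchType::I8:
            return "mfma_i32_16x16x32_i8";     // same on CDNA3 and CDNA4
    }
    return nullptr;
}
```

Calling `sketch_mfma_16x16(SKETCH_CC_CDNA4, SketchType::F32)` yields the f32 instruction, while the same call with `SKETCH_CC_CDNA3` yields the xf32 variant. The bf16 and int8 results are identical for both architectures.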
CDNA4 is largely compatible with CDNA3 except:
- No xf32 MFMA (mfma_f32_16x16x8_xf32) — routes to f32 path
- Different FP8 format (e4m3fn vs e4m3_fnuz) — not changed here
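
To illustrate why the FP8 difference matters even though this commit leaves it untouched, here is a minimal host-side decoder sketch for the two formats. It assumes the commonly documented layouts: both use 1 sign bit, 4 exponent bits and 3 mantissa bits. e4m3fn uses bias 7 and reserves S.1111.111 for NaN, while e4m3_fnuz uses bias 8, has no negative zero, and reserves 0x80 for NaN. The function names are hypothetical and nothing here is taken from the ggml sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode one FP8 E4M3 byte to float.
// fnuz == false: e4m3fn   (bias 7, NaN = S.1111.111)
// fnuz == true:  e4m3fnuz (bias 8, NaN = 0x80, no negative zero)
inline float sketch_decode_e4m3(uint8_t bits, bool fnuz) {
    const int sign = bits >> 7;
    const int exp  = (bits >> 3) & 0xF;
    const int man  = bits & 0x7;
    const int bias = fnuz ? 8 : 7;

    if (fnuz ? (bits == 0x80) : (exp == 0xF && man == 0x7)) {
        return NAN;
    }
    // Subnormals (exp == 0) have no implicit leading 1.
    const float mag = exp == 0
        ? std::ldexp((float) man, 1 - bias - 3)
        : std::ldexp(1.0f + man / 8.0f, exp - bias);
    return sign ? -mag : mag;
}

inline float sketch_decode_e4m3fn(uint8_t bits)   { return sketch_decode_e4m3(bits, false); }
inline float sketch_decode_e4m3fnuz(uint8_t bits) { return sketch_decode_e4m3(bits, true); }
```

Under these assumptions the byte 0x38 decodes to 1.0 as e4m3fn but to 0.5 as e4m3_fnuz, so data quantized for one format cannot be reinterpreted as the other without conversion.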
Tested on AMD Instinct MI355X (gfx950), ROCm 7.0.1:
- Build: compiles cleanly with -DAMDGPU_TARGETS=gfx950
- llama-bench (Qwen2.5-1.5B Q4_K_M, single GPU):
  * f16+FA: 40,013 tok/s prefill, 254 tok/s decode
  * q8_0+FA: functional
- Flash attention: works correctly
- MMQ: works correctly with stream-k dispatch
Co-authored-by: Andy Luo <andyluo7@users.noreply.github.com>
1 parent d6f3030 commit d132f22
4 files changed
Lines changed: 19 additions & 12 deletions