fix(gpu): support wildcards in GPU detection logic #2295
fl0rianr
left a comment
Thanks, this makes sense to me. Good work.
It would be nice to add a tiny regression test for:
- gfx1103 -> gfx110X
- gfx1201 -> gfx120X
- exact matches like gfx1151 or gfx1152
- non-matches like gfx1151 vs gfx110X
Not blocking, but helpful.
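For illustration, the cases listed above could be written as a small table-driven check. This is only a sketch: `family_matches` is a hypothetical Python stand-in for the rule described in this PR (a trailing X in the allowed family is a prefix wildcard), not the actual C++ `device_matches_constraint()`.

```python
def family_matches(detected: str, allowed: str) -> bool:
    """Hypothetical stand-in for the matching rule discussed in this PR.

    A trailing 'X' in `allowed` is treated as a wildcard: everything before
    it must be a prefix of `detected`. Otherwise the strings must be equal.
    """
    if allowed.endswith("X"):
        return detected.startswith(allowed[:-1])
    return detected == allowed


# (detected arch, allowed family, expected result), taken from the review list
CASES = [
    ("gfx1103", "gfx110X", True),   # wildcard match
    ("gfx1201", "gfx120X", True),   # wildcard match
    ("gfx1151", "gfx1151", True),   # exact match
    ("gfx1152", "gfx1152", True),   # exact match
    ("gfx1151", "gfx110X", False),  # different family, must not match
]


def run_cases() -> list:
    """Return a description of every failing case (empty list means all pass)."""
    failures = []
    for detected, allowed, expected in CASES:
        if family_matches(detected, allowed) != expected:
            failures.append(f"{detected} vs {allowed}: expected {expected}")
    return failures
```

Calling `run_cases()` returns an empty list when every case behaves as expected, so it can be wrapped in a single assertion.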
|
I have the same problem on my RX 6600 (gfx1032) |
|
PR fixes my GFX1100 |
|
Not sure if this is relevant but there's a gfx110x build of whisper.cpp at https://github.com/lemonade-sdk/whisper.cpp-rocm/releases but lemond is looking for a gfx1100 build |
Good idea. Added. |
ed518d6 to 2e91ae1
The identify_rocm_arch_from_name() function converts KFD gfx_target_version values (e.g. 110003, 120001) into specific family strings like gfx1103 and gfx1201. However, the RECIPE_DEFS table uses X-suffixed wildcards (gfx110X, gfx120X) to represent entire architecture families.
device_matches_constraint() did an exact string comparison, so gfx1103 != gfx110X and gfx1201 != gfx120X, causing valid AMD GPUs (RDNA3/RDNA4) detected via KFD sysfs to be reported as "Unsupported GPU" for ROCm backends.
Fix device_matches_constraint() to treat a trailing X in allowed family strings as a prefix wildcard match.
Co-authored-by: Big Pickle <big-pickle@opencode.ai>
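To make the mismatch described in this commit message concrete, here is a minimal Python sketch. It assumes gfx_target_version is encoded as major * 10000 + minor * 100 + stepping, which is consistent with the two examples given (110003 and 120001) but is not confirmed by this thread. All function names are hypothetical and do not mirror the C++ code.

```python
def gfx_name_from_target_version(version: int) -> str:
    """Decode a KFD gfx_target_version into a gfx arch string.

    Assumed encoding: major * 10000 + minor * 100 + stepping, with minor
    and stepping rendered as hex digits in the arch name.
    """
    major = version // 10000
    minor = (version // 100) % 100
    stepping = version % 100
    return f"gfx{major}{minor:x}{stepping:x}"


def exact_match(detected: str, allowed: str) -> bool:
    """Behaviour before the fix, as described: plain string equality."""
    return detected == allowed


def wildcard_match(detected: str, allowed: str) -> bool:
    """Behaviour after the fix, as described: trailing 'X' is a prefix wildcard."""
    if allowed.endswith("X"):
        return detected.startswith(allowed[:-1])
    return detected == allowed


detected = gfx_name_from_target_version(110003)   # decodes to "gfx1103"
before_fix = exact_match(detected, "gfx110X")     # False: GPU reported as unsupported
after_fix = wildcard_match(detected, "gfx110X")   # True: family entry now matches
```

Exact entries such as gfx1151 keep working under both rules, since the wildcard branch only applies when the allowed string ends in X.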
This adds a unit test to verify the `device_matches_constraint` logic. It ensures that families with a trailing 'X' (e.g., "gfx110X") are correctly recognized as wildcards that match specific models (e.g., "gfx1103").
Co-authored-by: opencode:Gemma-4-12B-it-GGUF
2e91ae1 to 78b6984
|
Sort of added anyway... I had an LLM cook up this test, but all it does is replicate the C++ logic in Python and run it. If someone breaks the C++ code, this won't catch it. I'm not clear on how we'd add a test case here without adding some sort of LD_PRELOAD shim or something that spoofs fake GPUs for lemond to detect. Thoughts? |
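One lighter-weight alternative to an LD_PRELOAD shim might be a fake KFD topology tree in a temporary directory. The sketch below only shows how such a tree could be written and parsed in Python. It assumes the `topology/nodes/<id>/properties` layout with `key value` lines, and it assumes the detection code could be pointed at an alternate root, which nothing in this thread confirms. All helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_fake_kfd_node(root: Path, node_id: int, gfx_target_version: int) -> Path:
    """Create a fake node directory with a minimal 'properties' file."""
    node_dir = root / "topology" / "nodes" / str(node_id)
    node_dir.mkdir(parents=True, exist_ok=True)
    props = node_dir / "properties"
    # Assumed file format: one "key value" pair per line.
    props.write_text(f"simd_count 64\ngfx_target_version {gfx_target_version}\n")
    return props


def read_gfx_target_versions(root: Path) -> list:
    """Collect gfx_target_version from every node under the given root."""
    versions = []
    for props in sorted((root / "topology" / "nodes").glob("*/properties")):
        for line in props.read_text().splitlines():
            key, _, value = line.partition(" ")
            if key == "gfx_target_version":
                versions.append(int(value))
    return versions


def demo() -> list:
    """Spoof one CPU-like node (version 0, an assumption) and one GPU node."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_fake_kfd_node(root, 0, 0)
        write_fake_kfd_node(root, 1, 110003)
        # Keep only nodes that report a non-zero GPU target version.
        return [v for v in read_gfx_target_versions(root) if v != 0]
```

This would only exercise the real C++ path if lemond exposed some way to override the sysfs root for tests; without that hook it stays a replica-style check like the existing one.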
|
Thanks for adding the regression coverage. I agree that this kind of Python replica test is not a perfect implementation-level test; it is not a real regression test at all. But it can test the idea behind a code change, which helps humans and LLMs alike, as long as the Python code is taken into account and updated accordingly whenever the logic changes. At the moment we don't have all of those cards in CI (not even any Nvidia), and the test is consistent with the existing CUDA arch mapping test style and useful as lightweight "expected-behavior" coverage. I would not go for a more complex test at this point. The actual code change is small and directly fixes the |
fl0rianr
left a comment
It's fine, no need to change anything. Getting CI back on track, with this macOS whisper job failing, is our job.
|
👋 I closed my duplicate (#2324) in favor of this — the wildcard is the cleaner approach. One small forward-compat note from comparing the two, take it or leave it: The wildcard makes the support-set match ( It works for every GPU today (all current RDNA2/3/4 arches are enumerated in |
|
For repo hygiene — this PR looks like it resolves a cluster of three open issues reporting the same root cause (ROCm backends marked unsupported because the detected gfx arch no longer matches the family allowlist after
They're already cross-linked to each other (#2296 ↔ #2302 ↔ #2319); this PR is the fix none of them point to yet. I closed my own duplicate attempt (#2324) in favor of this approach. Happy to help verify — I reproduced #2319 on an RX 7900 XT (gfx1100) and confirmed family matching restores ROCm detection. |
This patch fixes ROCm detection for me for the GPUs covered by wildcard strings.