Add AMD ROCm GPU build and test CI infrastructure #1
- Add self-hosted runner test jobs (test-rocm-linux, test-rocm-windows) for gfx1151/gfx1150
- Add cleanup composite actions for Linux and Windows runners
- Add runner heartbeat monitoring workflow
- Configure ci/run.sh with ROCm environment (HIP_PLATFORM, LD_LIBRARY_PATH, cmake flags)
- Add Windows ROCm build support to build.yml
- Fix conditional expression syntax warnings in build.yml
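As a rough illustration of the ROCm environment setup mentioned for ci/run.sh, here is a minimal Bash sketch. The function name `setup_rocm_env` and the `/opt/rocm` default prefix are assumptions made for this example; the actual script in this PR may set these variables differently, and the cmake flags are not shown because the PR text does not list them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from ci/run.sh.
setup_rocm_env() {
    # Assumption: /opt/rocm is used when ROCM_PATH is not already set.
    export ROCM_PATH="${ROCM_PATH:-/opt/rocm}"
    # Select the AMD backend for HIP.
    export HIP_PLATFORM=amd
    # Prepend the ROCm library directory; keep any existing entries after it.
    export LD_LIBRARY_PATH="${ROCM_PATH}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}
```

Calling `setup_rocm_env` before the build step keeps the environment handling in one place, which makes it easier to reuse across jobs.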
… should_build outputs to be specific. I have removed outputs.rocm_version from both CI steps and extracted resolve_rocm into a shared script that both jobs use. I also fixed the matrix and removed the FGGML_ROCM=1 flag from both ubuntu-rocm and windows-rocm, since it isn't a real flag. The heartbeat runners are commented out as well.
…[^<]*\)<\/Key>.*/\1/gp'. This works on both Linux and Windows Git Bash.
…, sed and grep didn't work.
…p on Windows doesn't work very well.
6f435d0 to cd3b5fc
ramkrishna2910
left a comment
Code review from local testing — 2 bugs and 2 medium issues found. Build succeeds on gfx1151 (Radeon 8060S) with ROCm 7.1, inference works correctly.
@iswaryaalex @Geramy can we close on this? Been open for a while :D
Yeah, I'll have time this weekend.
- Fix alpha/RC version ordering bug in resolve-rocm-version.sh and build.yml
(alpha was incorrectly treated as newer than RC)
- Fix NULL check bug on ndim validation in ruby_whisper_context.c
(ndim check was incorrectly guarded by format != NULL)
- Add ${{ }} wrapper on if: conditionals at lines 615 and 1422 in build.yml
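The alpha/RC ordering fix listed above can be sketched as follows. This is an illustrative Bash example and not the code from resolve-rocm-version.sh: the function names and the `-alpha`/`-rc` tag spelling are assumptions, and it only ranks the pre-release stage, assuming the numeric part of both versions is the same.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the tag spellings are assumed for illustration.

# Map a version string to a stage rank: alpha < beta < rc < final release.
stage_rank() {
    case "$1" in
        *alpha*) echo 0 ;;
        *beta*)  echo 1 ;;
        *rc*)    echo 2 ;;
        *)       echo 3 ;;  # no pre-release tag: treat as a final release
    esac
}

# Succeeds when $1 is at a later stage than $2 (same numeric core assumed).
is_later_stage() {
    [ "$(stage_rank "$1")" -gt "$(stage_rank "$2")" ]
}
```

With an explicit rank table, an RC always sorts after an alpha regardless of how the tags happen to compare as plain strings.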
An Action is being run to test the latest changes.
Replace duplicated ~55 lines of PowerShell version resolution logic in windows-rocm job with a call to ci/resolve-rocm-version.sh via Git Bash. This eliminates code duplication and ensures both Linux and Windows use the same version resolution logic.
Replace PCRE non-greedy .*? with ERE-compatible [^0-9]* in Bash regex patterns. Bash [[ =~ ]] uses POSIX ERE which does not support .*? non-greedy quantifier. On Windows Git Bash this fails strictly, leaving latest_file empty and causing 'Failed to extract ROCm version' error.
Also adds:
- File count validation with S3 response debug output
- Empty latest_file check showing candidate files
- Empty file line skip to prevent false regex matches
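To make the ERE point concrete, here is a small Bash sketch of extracting a version with `[[ =~ ]]` using `[^0-9]*` in place of a non-greedy `.*?`. The regex, the function name `first_version`, and the sample file name are hypothetical and are not the patterns used in the script; the sketch also shows the empty-line skip described above.

```shell
#!/usr/bin/env bash
# Hypothetical ERE: skip leading non-digits, then capture MAJOR.MINOR.PATCH.
# Keeping the regex in a variable avoids quoting surprises inside [[ =~ ]].
version_re='^[^0-9]*([0-9]+\.[0-9]+\.[0-9]+)'

# Read candidate file names on stdin and print the first version found.
first_version() {
    local line
    while IFS= read -r line; do
        # Skip empty lines so they cannot produce a false match.
        [ -z "$line" ] && continue
        if [[ $line =~ $version_re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    echo "no version found" >&2
    return 1
}
```

For example, `printf '\nrocm-dist-7.1.0.tar.gz\n' | first_version` prints `7.1.0`. Because `[^0-9]*` cannot consume digits, it stops at the first digit without needing a non-greedy quantifier.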
- Rewrite resolve-rocm-version.sh to use AMD's official tarball repo (repo.amd.com/rocm/tarball/) instead of scanning Amazon S3
- Remove 'latest' auto-detection logic which failed on Windows Git Bash due to PCRE vs ERE regex incompatibility
- Add version format validation and clear error messages
- Update build.yml workflow_dispatch to use concrete ROCm versions (7.12.0, 7.2.1) with choice options instead of 'latest'
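A minimal sketch of the version format validation described above might look like this. The function name `validate_rocm_version` and the strict MAJOR.MINOR.PATCH rule are assumptions for illustration; the real script may accept other forms such as pre-release suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical validator; the accepted format is an assumption.
validate_rocm_version() {
    local version="$1"
    # ERE: three dot-separated numeric fields, e.g. 7.12.0 or 7.2.1.
    local re='^[0-9]+\.[0-9]+\.[0-9]+$'
    if [[ ! $version =~ $re ]]; then
        echo "error: '$version' is not a valid ROCm version (expected MAJOR.MINOR.PATCH, e.g. 7.12.0)" >&2
        return 1
    fi
}
```

Rejecting values such as `latest` up front gives a clear message at the start of the job instead of a failed download later.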
Remove the type:choice restriction so users can type any ROCm version, while keeping 7.12.0 as the default and linking to the available versions.
@copilot resolve the merge conflicts in this pull request
…rocm into geramy/rocm-build-tests-matrix

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/runner_heartbeat.yml
#	ci/run.sh

Co-authored-by: Geramy <264964+Geramy@users.noreply.github.com>
Merge conflicts have been resolved in commit cf3281c. Three files had conflicts:
Geramy
left a comment
I have already answered all of the comments in their own threads, outside of this review.
PR request into upstream: ggml-org#3757