Commit 20acfc9

Merge pull request #4 from ai-dock/arm64-support
Build arm64 tarball alongside amd64
2 parents 9d1bd0d + c4e4c96 commit 20acfc9

2 files changed

Lines changed: 52 additions & 18 deletions

.github/workflows/build-cuda.yml

Lines changed: 31 additions & 11 deletions
@@ -55,14 +55,24 @@ jobs:
   build:
     needs: check-release
     if: needs.check-release.outputs.should_build == 'true'
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.arch.runs_on }}
     strategy:
+      fail-fast: false
       matrix:
         cuda_version: ['12.8.1']
+        arch:
+          # Native runners — no QEMU. nvidia/cuda:*-cudnn-devel-ubuntu22.04 is
+          # multi-arch on Docker Hub so the same build path runs on both.
+          - { suffix: amd64, runs_on: ubuntu-latest }
+          - { suffix: arm64, runs_on: ubuntu-24.04-arm }
         include:
           - cuda_version: '12.8.1'
             cuda_version_short: '12.8'
             cuda_tag: '12.8.1-cudnn-devel-ubuntu22.04'
+            # CUDA compute capabilities target the runtime GPU, not the host
+            # CPU arch, so the same list applies to both amd64 and arm64
+            # builds. Relevant aarch64 GPU contexts (Grace Hopper, Grace
+            # Blackwell, DGX Spark) are covered by sm_90 / sm_100 / sm_120.
             architectures: '75-virtual;80-virtual;86-virtual;89-virtual;90-virtual;100-virtual;120-virtual'

     steps:
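The matrix in this hunk expands to one job per `cuda_version` and `arch` pair, so the single CUDA version now yields two jobs. As a rough illustration, the sketch below enumerates the artifact names those jobs would upload, using the `llama.cpp-cuda-<cuda>-<arch>` pattern from the workflow's upload step. The function name `list_artifact_names` and the hard-coded lists are assumptions for illustration and are not part of the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the artifact name each
# matrix job would upload. The lists below mirror the matrix in the hunk above.
list_artifact_names() {
  local cuda_short arch
  for cuda_short in "12.8"; do
    for arch in amd64 arm64; do
      # One name per (CUDA version, host arch) pair, so names never collide.
      printf 'llama.cpp-cuda-%s-%s\n' "$cuda_short" "$arch"
    done
  done
}
```

Calling `list_artifact_names` prints one distinct name per job. Without the arch suffix both jobs would produce the same name, which is the collision the workflow comment warns about.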
@@ -180,13 +190,15 @@ jobs:
       - name: Create tarball
         run: |
           cd binaries
-          tar -czf llama.cpp-${{ needs.check-release.outputs.release_tag }}-cuda-${{ matrix.cuda_version_short }}.tar.gz cuda-${{ matrix.cuda_version_short }}
+          tar -czf llama.cpp-${{ needs.check-release.outputs.release_tag }}-cuda-${{ matrix.cuda_version_short }}-${{ matrix.arch.suffix }}.tar.gz cuda-${{ matrix.cuda_version_short }}
           ls -lh *.tar.gz

       - name: Upload artifact
         uses: actions/upload-artifact@v4
         with:
-          name: llama.cpp-cuda-${{ matrix.cuda_version_short }}
+          # Arch suffix in the artifact name so the matrix jobs do not collide
+          # in actions/download-artifact later.
+          name: llama.cpp-cuda-${{ matrix.cuda_version_short }}-${{ matrix.arch.suffix }}
           path: binaries/*.tar.gz
           retention-days: 1

@@ -224,21 +236,29 @@ jobs:
             **Commit:** ${{ needs.check-release.outputs.release_hash }}

             ## CUDA Versions
-            - CUDA 12.8 - Architectures: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
-
-            ## Architecture Reference
+            - CUDA 12.8 - GPU compute capabilities: 7.5, 8.0, 8.6, 8.9, 9.0, 10.0, 12.0
+
+            ## Host architectures
+            Tarballs are published per host CPU architecture (Linux):
+            - `-amd64.tar.gz` — x86_64 (most desktops, servers, cloud VMs)
+            - `-arm64.tar.gz` — aarch64 (Grace Hopper / Grace Blackwell / DGX Spark / Ampere Altra)
+
+            ## GPU compute capability reference
             - 7.5: Tesla T4, RTX 20xx series, Quadro RTX
             - 8.0: A100
             - 8.6: RTX 3000 series
             - 8.9: RTX 4000 series, L4, L40
-            - 9.0: H100, H200
-            - 10.0: B200
+            - 9.0: H100, H200, GH200
+            - 10.0: B200, GB200
             - 12.0: RTX Pro series, RTX 50xx
-
+
             ## Usage
-            Download the appropriate tarball for your CUDA version and extract:
+            Download the tarball matching your host CPU arch and CUDA version, then extract:
             ```bash
-            tar -xzf llama.cpp-${{ needs.check-release.outputs.release_tag }}-cuda-12.8.tar.gz
+            # amd64 host
+            tar -xzf llama.cpp-${{ needs.check-release.outputs.release_tag }}-cuda-12.8-amd64.tar.gz
+            # arm64 host (e.g. Grace Blackwell)
+            tar -xzf llama.cpp-${{ needs.check-release.outputs.release_tag }}-cuda-12.8-arm64.tar.gz
             ./llama-cli --help
             ```
           files: release-assets/*
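The release notes ask users to pick the tarball that matches their host CPU architecture. A minimal sketch of how a download script might make that choice is shown here, assuming the `llama.cpp-<tag>-cuda-<cuda>-<arch>.tar.gz` naming from this commit. The function names `arch_suffix` and `tarball_name` are hypothetical, and `b1234` in the usage comment is a placeholder tag rather than a real release.

```bash
#!/usr/bin/env bash
# Map a machine name (as printed by `uname -m`) to the release suffix.
arch_suffix() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported host architecture: $1" >&2; return 1 ;;
  esac
}

# Build the tarball filename from a release tag, a CUDA short version,
# and a machine name. Fails if the architecture is not published.
tarball_name() {
  local tag="$1" cuda="$2" suffix
  suffix="$(arch_suffix "$3")" || return 1
  printf 'llama.cpp-%s-cuda-%s-%s.tar.gz\n' "$tag" "$cuda" "$suffix"
}

# Example (placeholder tag):
#   tarball_name b1234 12.8 "$(uname -m)"
```

Accepting both `x86_64`/`amd64` and `aarch64`/`arm64` is a convenience assumption, since different tools report either spelling for the same platform.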

README.md

Lines changed: 21 additions & 7 deletions
@@ -16,31 +16,45 @@ The official llama.cpp repository does not provide pre-built CUDA binaries. This
 ### CUDA Versions
 - CUDA 12.8

+### Host CPU Architectures
+
+Each release publishes one tarball per host CPU architecture:
+
+| Suffix | Linux platform | Typical hosts |
+|--------|----------------|---------------|
+| `-amd64` | x86_64 | Most desktops, servers, cloud VMs |
+| `-arm64` | aarch64 | Grace Hopper, Grace Blackwell, DGX Spark, Ampere Altra |
+
+The CUDA compute capabilities below target the runtime GPU and are the same on both host architectures.
+
 ### GPU Architectures

 | Compute Capability | GPU Examples |
-|-------------------|--------------|----------------|------------|
+|-------------------|--------------|
 | 6.1 | Titan XP, Tesla P40, GTX 10xx |
 | 7.0 | Tesla V100 |
 | 7.5 | Tesla T4, RTX 2000 series, Quadro RTX |
 | 8.0 | A100 |
 | 8.6 | RTX 3000 series |
 | 8.9 | RTX 4000 series, L4, L40 |
-| 9.0 | H100, H200 |
-| 10.0 | B200 |
+| 9.0 | H100, H200, GH200 |
+| 10.0 | B200, GB200 |
 | 12.0 | RTX Pro series, RTX 5000 series |

 ## Usage

 ### Download

 1. Go to the [Releases](../../releases) page
-2. Download the tarball (e.g., `llama.cpp-bXXXX-cuda-12.8.tar.gz`)
+2. Download the tarball matching your host CPU architecture — `-amd64` for x86_64, `-arm64` for aarch64. Filename format: `llama.cpp-bXXXX-cuda-<cuda>-<arch>.tar.gz`
 3. Extract the archive:

 ```bash
-tar -xzf llama.cpp-bXXXX-cuda-12.8.tar.gz
-cd cuda-12.6
+# x86_64 host
+tar -xzf llama.cpp-bXXXX-cuda-12.8-amd64.tar.gz
+# aarch64 host (e.g. Grace Blackwell, DGX Spark)
+tar -xzf llama.cpp-bXXXX-cuda-12.8-arm64.tar.gz
+cd cuda-12.8
 ```

 ### Run
@@ -73,7 +87,7 @@ cat VERSION.txt
 - NVIDIA GPU with compute capability 7.5 or higher
 - Appropriate NVIDIA driver for your CUDA version:
   - CUDA 12.8+: Driver >= 570.15
-- Linux x86_64 (Ubuntu 22.04 compatible)
+- Linux x86_64 or aarch64 (Ubuntu 22.04 compatible)

 ## Build Process

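The README requirements in this diff state a minimum GPU compute capability of 7.5, independent of host architecture. The sketch below shows one way a user might check a reported capability against that floor. The function name `cc_at_least` is hypothetical, and the `nvidia-smi` query in the usage comment assumes a driver recent enough to expose the `compute_cap` field.

```bash
#!/usr/bin/env bash
# Succeed when compute capability $1 (e.g. "8.6") is at least $2.
# The default floor of 7.5 comes from the README requirements.
# Both arguments are expected in major.minor form.
cc_at_least() {
  local have="$1" need="${2:-7.5}"
  local have_major="${have%%.*}" have_minor="${have#*.}"
  local need_major="${need%%.*}" need_minor="${need#*.}"
  if [ "$have_major" -ne "$need_major" ]; then
    # Different major versions: the major number alone decides.
    [ "$have_major" -gt "$need_major" ]
  else
    # Same major version: compare the minor numbers.
    [ "$have_minor" -ge "$need_minor" ]
  fi
}

# Example (assumes nvidia-smi supports the compute_cap query field):
#   cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
#   cc_at_least "$cap" || echo "GPU compute capability $cap is below 7.5" >&2
```

Comparing major and minor as integers avoids the trap of a plain string comparison, where `10.0` would sort before `7.5`.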