We provide portable builds of vLLM with AMD ROCm 7 acceleration based on TheRock. Each release is a self-contained archive containing a bundled Python environment, vLLM, PyTorch ROCm, and all required ROCm runtime libraries. Our automated pipeline targets integration with Lemonade.
> [!IMPORTANT]
> **Early development:** This project is in active development. ROCm support for consumer AMD GPUs (RDNA) in vLLM is experimental. We welcome issue reports and contributions.
| GPU Target | Architecture | Devices |
|---|---|---|
| gfx1151 | Strix Halo APU | Ryzen AI Max+ / Max+ PRO 395 |
| gfx1150 | Strix Point APU | Ryzen AI 300 series |
| gfx120X | RDNA4 GPUs | RX 9070 XT, RX 9070, RX 9060 XT, RX 9060 |
| gfx110X | RDNA3 GPUs | RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT, RX 7600 XT/7600 |
All builds include ROCm 7 runtime built-in — no separate ROCm installation required!
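If you are unsure which archive matches your GPU, the ISA name reported by `rocminfo` (e.g. `gfx1100`) maps onto the targets above. A small helper sketch (the function name and ISA groupings are assumptions based on the table, not part of the release tooling):

```shell
# Map a gfx ISA string (e.g. from: rocminfo | grep -o 'gfx[0-9a-f]*')
# to the matching release target. Helper name is illustrative only.
gfx_target() {
  case "$1" in
    gfx1151)        echo gfx1151 ;;
    gfx1150)        echo gfx1150 ;;
    gfx120[0-9a-f]) echo gfx120X ;;  # RDNA4: gfx1200, gfx1201, ...
    gfx110[0-9a-f]) echo gfx110X ;;  # RDNA3: gfx1100, gfx1101, gfx1102
    *)              echo "unsupported: $1" >&2; return 1 ;;
  esac
}

gfx_target gfx1100   # → gfx110X
```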
- Download the build for your GPU from the latest release.
- Extract the archive:

  ```shell
  mkdir -p ~/vllm-rocm   # tar -C requires an existing directory
  tar xzf vllm-b1000-ubuntu-rocm-gfx1151-x64.tar.gz -C ~/vllm-rocm
  ```

- Run the server:

  ```shell
  ~/vllm-rocm/bin/vllm-server --model meta-llama/Llama-3.2-1B --port 8000
  ```

- Test with curl:

  ```shell
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.2-1B", "prompt": "Hello", "max_tokens": 50}'
  ```
Lemonade Integration: These builds are designed to work as a backend for Lemonade, which manages downloading, launching, and routing requests to vLLM automatically.
Each release archive contains a complete, portable environment:
```
bin/
  vllm-server            # Launcher script (entry point)
  python3.11             # Bundled Python interpreter
lib/
  libamdhip64.so         # ROCm runtime (HIP)
  librocblas.so          # ROCm BLAS
  libhipblas.so          # HIP BLAS
  ...                    # All required ROCm shared libraries
  rocblas/library/       # rocBLAS kernel files
python3.11/site-packages/
  vllm/                  # vLLM package
  torch/                 # PyTorch ROCm
  ...                    # All Python dependencies
```
No external Python, PyTorch, or ROCm installation is needed.
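For the bundle to stay relocatable, the `vllm-server` entry point has to locate everything relative to its own path. A minimal sketch of that resolution (not the shipped script; names and paths are illustrative):

```shell
# Sketch of how a relocatable launcher can find its bundle root (the shipped
# vllm-server script may differ): the root is one directory up from bin/.
bundle_root() {
  # $1: absolute path of the launcher, e.g. <root>/bin/vllm-server
  dirname "$(dirname "$1")"
}

bundle_root /home/me/vllm-rocm/bin/vllm-server   # → /home/me/vllm-rocm
```

From that root, a launcher can export `LD_LIBRARY_PATH="$root/lib"` before exec'ing the bundled `python3.11`, which is what keeps any system-wide ROCm install out of the loading path.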
Our GitHub Actions workflow:
- Downloads the latest ROCm 7 nightly from TheRock
- Installs PyTorch ROCm from the official pip index
- Builds vLLM from source with architecture-specific HIP kernels
- Bundles everything with `patchelf --set-rpath` for portability
- Tests on self-hosted AMD GPU hardware before releasing
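The rpath step can be sketched as follows (the staging path is hypothetical; the real workflow runs over the release tree assembled by the CI job):

```shell
# Illustrative rpath pass over the bundled ROCm libraries.
BUNDLE="$HOME/vllm-staging"   # hypothetical staging directory
mkdir -p "$BUNDLE/lib"

# Rewrite each bundled shared object to search next to itself ($ORIGIN) and in
# the bundle's lib/ first, so the dynamic loader never falls back to a system
# ROCm installation.
find "$BUNDLE/lib" -name '*.so*' -print0 |
  xargs -0 -r patchelf --set-rpath '$ORIGIN:$ORIGIN/../lib'
```

Using `$ORIGIN` rather than absolute paths is what lets the archive be extracted anywhere and still resolve its own libraries.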
| GPU Target | Ubuntu |
|---|---|
| gfx1151 | |
| gfx1150 | |
| gfx120X | |
| gfx110X | |
Linux (gfx1150/APU): OOM despite free VRAM? Add `ttm.pages_limit=12582912` (48 GB) to the kernel command line (e.g. via GRUB), run `update-grub`, then reboot. See TheRock FAQ.
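On Ubuntu with GRUB, that change looks roughly like this (keep your existing kernel flags; the value follows the tip above):

```shell
# 12582912 pages * 4 KiB per page = 48 GiB.
# In /etc/default/grub, extend the kernel command line:
#
#   GRUB_CMDLINE_LINUX_DEFAULT="<existing flags> ttm.pages_limit=12582912"
#
# Then regenerate the bootloader config and reboot:
sudo update-grub
sudo reboot
```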
- vLLM — High-throughput LLM serving engine
- PyTorch — Tensor computation framework (ROCm build)
- ROCm (TheRock) — AMD GPU compute platform
- Ubuntu 22.04 GitHub Actions runner
- Python 3.11 from deadsnakes PPA
- CMake, Ninja, patchelf
This project is licensed under the MIT License — see the LICENSE file for details.