We provide portable builds of vLLM with AMD ROCm 7 acceleration based on TheRock. Each release is a self-contained archive containing a bundled Python environment, vLLM, PyTorch ROCm, and all required ROCm runtime libraries. Our automated pipeline targets integration with Lemonade.
> [!IMPORTANT]
> **Early development:** This project is in active development. ROCm support for consumer AMD GPUs (RDNA) in vLLM is experimental. We welcome issue reports and contributions.
| GPU Target | Architecture | Devices |
|---|---|---|
| gfx1151 | Strix Halo APU | Ryzen AI Max+ / Max+ PRO 395 |
| gfx1150 | Strix Point APU | Ryzen AI 300 series |
| gfx120X | RDNA4 GPUs | RX 9070 XT, RX 9070, RX 9060 XT, RX 9060 |
| gfx110X | RDNA3 GPUs | RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT, RX 7600 XT/7600 |
All builds include ROCm 7 runtime built-in — no separate ROCm installation required!
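If you are unsure which archive matches your GPU, the ISA name reported by `rocminfo` (e.g. `gfx1100`) maps onto the targets above. A small helper sketch (the function name and ISA groupings are assumptions based on the table, not part of the release tooling):

```shell
# Map a gfx ISA string (e.g. from: rocminfo | grep -o 'gfx[0-9a-f]*')
# to the matching release target. Helper name is illustrative only.
gfx_target() {
  case "$1" in
    gfx1151)        echo gfx1151 ;;
    gfx1150)        echo gfx1150 ;;
    gfx120[0-9a-f]) echo gfx120X ;;  # RDNA4: gfx1200, gfx1201, ...
    gfx110[0-9a-f]) echo gfx110X ;;  # RDNA3: gfx1100, gfx1101, gfx1102
    *)              echo "unsupported: $1" >&2; return 1 ;;
  esac
}

gfx_target gfx1100   # → gfx110X
```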
- Download the build for your GPU from the latest release.
- Extract the archive:

  ```shell
  mkdir -p ~/vllm-rocm   # tar -C requires an existing directory
  tar xzf vllm-b1000-ubuntu-rocm-gfx1151-x64.tar.gz -C ~/vllm-rocm
  ```

- Run the server:

  ```shell
  ~/vllm-rocm/bin/vllm-server --model meta-llama/Llama-3.2-1B --port 8000
  ```

- Test with curl:

  ```shell
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-3.2-1B", "prompt": "Hello", "max_tokens": 50}'
  ```
Lemonade Integration: These builds are designed to work as a backend for Lemonade, which manages downloading, launching, and routing requests to vLLM automatically.
Each release archive contains a complete, portable environment:
```
bin/
  vllm-server            # Launcher script (entry point)
  python3.11             # Bundled Python interpreter
lib/
  libamdhip64.so         # ROCm runtime (HIP)
  librocblas.so          # ROCm BLAS
  libhipblas.so          # HIP BLAS
  ...                    # All required ROCm shared libraries
  rocblas/library/       # rocBLAS kernel files
python3.11/site-packages/
  vllm/                  # vLLM package
  torch/                 # PyTorch ROCm
  ...                    # All Python dependencies
```
No external Python, PyTorch, or ROCm installation is needed.
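For the bundle to stay relocatable, the `vllm-server` entry point has to locate everything relative to its own path. A minimal sketch of that resolution (not the shipped script; names and paths are illustrative):

```shell
# Sketch of how a relocatable launcher can find its bundle root (the shipped
# vllm-server script may differ): the root is one directory up from bin/.
bundle_root() {
  # $1: absolute path of the launcher, e.g. <root>/bin/vllm-server
  dirname "$(dirname "$1")"
}

bundle_root /home/me/vllm-rocm/bin/vllm-server   # → /home/me/vllm-rocm
```

From that root, a launcher can export `LD_LIBRARY_PATH="$root/lib"` before exec'ing the bundled `python3.11`, which is what keeps any system-wide ROCm install out of the loading path.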
Our GitHub Actions workflow:
- Downloads the latest ROCm 7 nightly from TheRock
- Installs PyTorch ROCm from the official pip index
- Builds vLLM from source with architecture-specific HIP kernels
- Bundles everything with `patchelf --set-rpath` for portability
- Tests on self-hosted AMD GPU hardware before releasing
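The rpath step can be sketched as follows (the staging path is hypothetical; the real workflow runs over the release tree assembled by the CI job):

```shell
# Illustrative rpath pass over the bundled ROCm libraries.
BUNDLE="$HOME/vllm-staging"   # hypothetical staging directory
mkdir -p "$BUNDLE/lib"

# Rewrite each bundled shared object to search next to itself ($ORIGIN) and in
# the bundle's lib/ first, so the dynamic loader never falls back to a system
# ROCm installation.
find "$BUNDLE/lib" -name '*.so*' -print0 |
  xargs -0 -r patchelf --set-rpath '$ORIGIN:$ORIGIN/../lib'
```

Using `$ORIGIN` rather than absolute paths is what lets the archive be extracted anywhere and still resolve its own libraries.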
| GPU Target | Ubuntu |
|---|---|
| gfx1151 | |
| gfx1150 | |
| gfx120X | |
| gfx110X | |
Linux (gfx1150/APU): OOM despite free VRAM? Add `ttm.pages_limit=12582912` (48 GB) to the kernel command line (e.g. via GRUB), run `update-grub`, then reboot. See TheRock FAQ.
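On Ubuntu with GRUB, that change looks roughly like this (keep your existing kernel flags; the value follows the tip above):

```shell
# 12582912 pages * 4 KiB per page = 48 GiB.
# In /etc/default/grub, extend the kernel command line:
#
#   GRUB_CMDLINE_LINUX_DEFAULT="<existing flags> ttm.pages_limit=12582912"
#
# Then regenerate the bootloader config and reboot:
sudo update-grub
sudo reboot
```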
- vLLM — High-throughput LLM serving engine
- PyTorch — Tensor computation framework (ROCm build)
- ROCm (TheRock) — AMD GPU compute platform
- Ubuntu 22.04 GitHub Actions runner
- Python 3.11 from deadsnakes PPA
- CMake, Ninja, patchelf
This project is licensed under the MIT License — see the LICENSE file for details.