
# vllm-rocm


We provide portable builds of vLLM with AMD ROCm 7 acceleration based on TheRock. Each release is a self-contained archive containing a bundled Python environment, vLLM, PyTorch ROCm, and all required ROCm runtime libraries. Our automated pipeline targets integration with Lemonade.

> [!IMPORTANT]
> **Early development:** This project is in active development. ROCm support for consumer AMD GPUs (RDNA) in vLLM is experimental. We welcome issue reports and contributions.

## Supported Devices

| GPU Target | Architecture | Devices |
|------------|--------------|---------|
| gfx1151 | Strix Halo APU | Ryzen AI MAX+ Pro 395 |
| gfx1150 | Strix Point APU | Ryzen AI 300 |
| gfx120X | RDNA4 GPUs | RX 9070 XT, RX 9070, RX 9060 XT, RX 9060 |
| gfx110X | RDNA3 GPUs | RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT, RX 7600 XT/7600 |

All builds include the ROCm 7 runtime — no separate ROCm installation is required.
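Choosing the right archive means matching your GPU to its gfx target. A small lookup sketch mirroring the table above (the `GFX_TARGETS` mapping and `gfx_target_for` helper are illustrative, not part of this project):

```python
# gfx target -> example devices, mirroring the support table above
GFX_TARGETS = {
    "gfx1151": ["Ryzen AI MAX+ Pro 395"],
    "gfx1150": ["Ryzen AI 300"],
    "gfx120X": ["RX 9070 XT", "RX 9070", "RX 9060 XT", "RX 9060"],
    "gfx110X": ["RX 7900 XTX", "RX 7900 XT", "RX 7900 GRE",
                "RX 7800 XT", "RX 7700 XT", "RX 7600 XT", "RX 7600"],
}

def gfx_target_for(device_name):
    """Return the gfx target whose device list contains device_name, else None."""
    for target, devices in GFX_TARGETS.items():
        if device_name in devices:
            return target
    return None

print(gfx_target_for("RX 9070 XT"))  # gfx120X
```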

## Quick Start

1. Download the build for your GPU from the latest release.
2. Extract the archive:

   ```bash
   mkdir -p ~/vllm-rocm
   tar xzf vllm-b1000-ubuntu-rocm-gfx1151-x64.tar.gz -C ~/vllm-rocm
   ```

3. Run the server:

   ```bash
   ~/vllm-rocm/bin/vllm-server --model meta-llama/Llama-3.2-1B --port 8000
   ```

4. Test with curl:

   ```bash
   curl http://localhost:8000/v1/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "meta-llama/Llama-3.2-1B", "prompt": "Hello", "max_tokens": 50}'
   ```
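The same test can be made from Python using only the standard library. A minimal sketch (model name and port follow the Quick Start above; the request only succeeds if the server from step 3 is running, so the send is guarded):

```python
import json
import urllib.request

def build_completion_request(base_url, model, prompt, max_tokens=50):
    """Build an OpenAI-compatible /v1/completions request: (url, headers, body)."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return (
        f"{base_url}/v1/completions",
        {"Content-Type": "application/json"},
        json.dumps(payload).encode("utf-8"),
    )

url, headers, body = build_completion_request(
    "http://localhost:8000", "meta-llama/Llama-3.2-1B", "Hello"
)

# Sending requires a running vllm-server; guard so the sketch runs safely as-is.
try:
    req = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.loads(resp.read())["choices"][0]["text"])
except OSError as exc:
    print(f"server not reachable: {exc}")
```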

Lemonade Integration: These builds are designed to work as a backend for Lemonade, which manages downloading, launching, and routing requests to vLLM automatically.

## What's Included

Each release archive contains a complete, portable environment:

```
bin/
  vllm-server        # Launcher script (entry point)
  python3.11         # Bundled Python interpreter
lib/
  libamdhip64.so     # ROCm runtime (HIP)
  librocblas.so      # ROCm BLAS
  libhipblas.so      # HIP BLAS
  ...                # All required ROCm shared libraries
  rocblas/library/   # rocBLAS kernel files
  python3.11/site-packages/
    vllm/            # vLLM package
    torch/           # PyTorch ROCm
    ...              # All Python dependencies
```

No external Python, PyTorch, or ROCm installation is needed.
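Portable bundles like this typically work by having the launcher put the bundled `lib/` tree first on the loader path before exec'ing the bundled interpreter. A hypothetical sketch of that logic (the real `bin/vllm-server` is a launcher script; the `launcher_env` helper here is illustrative only):

```python
import os

def launcher_env(root, base_env=None):
    """Compute the environment a portable launcher would pass to the bundled
    interpreter: bundled ROCm libraries first on LD_LIBRARY_PATH."""
    env = dict(base_env if base_env is not None else os.environ)
    lib_dir = os.path.join(root, "lib")
    prior = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + prior if prior else "")
    return env

env = launcher_env("/home/user/vllm-rocm", base_env={})
print(env["LD_LIBRARY_PATH"])
# The real launcher would then exec the bundled python3.11 with these settings.
```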

## Automated Builds

Our GitHub Actions workflow:

  • Downloads the latest ROCm 7 nightly from TheRock
  • Installs PyTorch ROCm from the official pip index
  • Builds vLLM from source with architecture-specific HIP kernels
  • Bundles everything with patchelf --set-rpath for portability
  • Tests on self-hosted AMD GPU hardware before releasing
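The `patchelf --set-rpath` step rewrites each bundled shared object's RPATH to an `$ORIGIN`-relative path, so the archive resolves its own libraries wherever it is extracted. A hypothetical sketch of assembling that invocation (the helper and the example `.so` path are illustrative):

```python
def patchelf_cmd(so_path, rel_lib_dir="$ORIGIN/../lib"):
    """Build the patchelf command that makes so_path find its dependencies
    relative to its own location via the $ORIGIN dynamic-loader token."""
    return ["patchelf", "--set-rpath", rel_lib_dir, so_path]

cmd = patchelf_cmd("lib/librocblas.so")
print(" ".join(cmd))
```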
| GPU Target | Ubuntu |
|------------|--------|
| gfx1151 | Download |
| gfx1150 | Download |
| gfx120X | Download |
| gfx110X | Download |

**Linux (gfx1150/APU):** OOM despite free VRAM? Add `ttm.pages_limit=12582912` (48 GiB) to the kernel cmdline (e.g. via GRUB), run `update-grub`, then reboot. See TheRock FAQ.
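Note that `ttm.pages_limit` is a page count, not a byte count; with the standard 4 KiB page size, 12582912 pages works out to exactly 48 GiB:

```python
PAGE_SIZE = 4096        # bytes; standard x86-64 page size
pages_limit = 12582912  # value from the kernel cmdline above

gib = pages_limit * PAGE_SIZE / 2**30
print(gib)  # 48.0
```

To target a different limit, reverse the arithmetic: desired GiB × 2³⁰ ÷ 4096 gives the page count.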

## Dependencies

### Runtime

- vLLM — High-throughput LLM serving engine
- PyTorch — Tensor computation framework (ROCm build)
- ROCm (TheRock) — AMD GPU compute platform

### Build (CI only)

- Ubuntu 22.04 GitHub Actions runner
- Python 3.11 from the deadsnakes PPA
- CMake, Ninja, patchelf

## License

This project is licensed under the MIT License — see the LICENSE file for details.
