llm-d Accelerators

llm-d supports multiple accelerator vendors, and we are expanding our coverage.

Support

Maintainers for each accelerator type are listed below. See our well-lit path guides for details of deploying on each hardware type.

Vendor	Models	Maintainers	Supported Well-lit Paths
AMD	ROCm	Kenny Roche (Kenny.Roche@amd.com)	Coming soon
Google	TPU	Edwin Hernandez (@Edwinhr716), Cong Liu (@liu-cong, congliu.thu@gmail.com)	Inference Scheduling, Prefill/Decode Disaggregation
Intel	XPU	Yuan Wu (@yuanwu2017, yuan.wu@intel.com)	Inference Scheduling, Prefill/Decode Disaggregation
NVIDIA	GPU	Will Eaton (weaton@redhat.com), Greg (grpereir@redhat.com)	All

Requirements

We welcome contributions from accelerator vendors. To be referenced as a supported hardware vendor, we require, at minimum, a publicly available container image that launches vLLM in the recommended configuration.

For integration into the well-lit paths, our standard for contribution is higher, requiring:

A named maintainer responsible for keeping guide contents up to date
Manual or automated verification of the guide deployment for each release

[!NOTE] We aim to raise our requirements in a future release to include active CI coverage for all hardware guide variants.

[!NOTE] The community can assist but is not responsible for keeping hardware guide variants updated. We reserve the right to remove stale examples and documentation with regard to hardware support.

Intel XPU

Intel accelerators are supported via the well-lit paths (see the Intel row in the table above). For cluster prerequisites and image expectations, see the infrastructure prerequisites.

Accelerator Resource Management

For llm-d workloads to use accelerator hardware, the devices must be exposed to containers. Kubernetes provides two mechanisms to accomplish this: Device Plugins and Dynamic Resource Allocation (DRA).

Typically, clusters use one mechanism or the other to expose accelerator devices. While it's possible to use both mechanisms simultaneously, this requires special configuration not covered in this document.
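One way to see whether a cluster exposes accelerators through device plugins is to look at the extended resources each node advertises under `status.allocatable`. The following is a minimal sketch, not part of llm-d: it filters a node object (for example, the JSON returned by `kubectl get node <name> -o json`) for accelerator resources. The resource-name prefixes and the example node are assumptions for illustration; verify the names against your vendor's documentation.

```python
# Assumed prefixes of extended-resource names advertised by vendor device
# plugins. These are illustrative; confirm them in each vendor's docs.
ACCELERATOR_PREFIXES = (
    "amd.com/",
    "google.com/tpu",
    "gpu.intel.com/",
    "habana.ai/",
    "nvidia.com/",
)


def accelerator_resources(node: dict) -> dict:
    """Return the accelerator extended resources a node reports as allocatable."""
    allocatable = node.get("status", {}).get("allocatable", {})
    return {
        name: quantity
        for name, quantity in allocatable.items()
        if name.startswith(ACCELERATOR_PREFIXES)
    }


# Hypothetical, trimmed node object used only for illustration.
example_node = {
    "status": {
        "allocatable": {"cpu": "32", "memory": "128Gi", "nvidia.com/gpu": "8"}
    }
}

print(accelerator_resources(example_node))
```

An empty result does not by itself mean that no accelerators are present: devices exposed through DRA are not reported as extended resources in this way.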

Device Plugins

Each vendor provides Device Plugins for their accelerators. The following plugins are available by vendor:

AMD ROCm Device Plugin
Google TPU Device Plugin (automatically enabled for TPU instances)
Intel XPU Device Plugin
Intel Gaudi Device Plugin
NVIDIA GPU Device Plugin
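Once a device plugin is running, a container requests devices by naming the plugin's extended resource under `resources.limits`. The sketch below builds such a Pod manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The resource names in `RESOURCE_NAMES`, the image reference, and the helper `pod_manifest` are assumptions for illustration, not values defined by llm-d; check each vendor's device plugin documentation for the exact resource name (Intel in particular may expose a different name depending on the kernel driver).

```python
import json

# Assumed extended-resource names per vendor; verify against vendor docs.
RESOURCE_NAMES = {
    "amd": "amd.com/gpu",
    "google": "google.com/tpu",
    "intel-xpu": "gpu.intel.com/i915",
    "intel-gaudi": "habana.ai/gaudi",
    "nvidia": "nvidia.com/gpu",
}


def pod_manifest(name: str, image: str, vendor: str, count: int = 1) -> dict:
    """Build a Pod manifest that requests `count` devices from a device plugin."""
    resource = RESOURCE_NAMES[vendor]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "vllm",
                    "image": image,
                    # Extended resources are requested via limits.
                    "resources": {"limits": {resource: str(count)}},
                }
            ]
        },
    }


# Hypothetical pod name and image reference.
manifest = pod_manifest("vllm-example", "example.com/vllm:latest", "nvidia", count=2)
print(json.dumps(manifest, indent=2))
```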

Dynamic Resource Allocation

Each vendor provides DRA resource drivers for their accelerators, along with setup documentation for those drivers.

Since DRA is a newer Kubernetes feature, some feature gates may be required. Consult your vendor and cluster provider documentation for specific requirements.
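With DRA, a workload asks for devices through a resource claim rather than an extended resource. The sketch below builds a `ResourceClaimTemplate` and a Pod that references it, following the `resource.k8s.io/v1beta1` schema. The API version, the device class name, the image, and the helper `dra_manifests` are assumptions for illustration: the API version depends on your Kubernetes release (newer versions are understood to nest the device class under an `exactly` field), and the device class name is defined by the vendor's DRA driver.

```python
import json


def dra_manifests(
    claim_name: str,
    device_class: str,
    image: str,
    api_version: str = "resource.k8s.io/v1beta1",  # assumed; cluster-dependent
):
    """Build a ResourceClaimTemplate and a Pod that consumes it."""
    template = {
        "apiVersion": api_version,
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": claim_name},
        "spec": {
            "spec": {
                "devices": {
                    "requests": [
                        {"name": "accelerator", "deviceClassName": device_class}
                    ]
                }
            }
        },
    }
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "vllm-dra-example"},
        "spec": {
            # Pod-level claim, created from the template for this pod.
            "resourceClaims": [
                {"name": "accelerator", "resourceClaimTemplateName": claim_name}
            ],
            "containers": [
                {
                    "name": "vllm",
                    "image": image,
                    # The container opts in to the pod-level claim by name.
                    "resources": {"claims": [{"name": "accelerator"}]},
                }
            ],
        },
    }
    return template, pod


# Hypothetical claim name, device class, and image reference.
template, pod = dra_manifests(
    "single-accelerator", "gpu.example.com", "example.com/vllm:latest"
)
print(json.dumps([template, pod], indent=2))
```

The claim name in the container's `resources.claims` must match an entry in the pod's `resourceClaims`; that pairing is what binds the allocated device to the container.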
