A vLLM component that provides optimized custom kernels for Intel GPUs (XPU) to accelerate LLM inference.
vLLM defines and implements many custom Torch ops and kernels. This repository provides custom implementations for the Intel XPU (GPU) backend, enabling high-throughput LLM inference on Intel hardware.
Kernels are written in SYCL/DPC++ and leverage oneDNN for deep learning primitives. The library follows the PyTorch custom op registration and dispatch pattern — importing it at startup registers all ops for seamless use within vLLM.
| Category | Operations |
|---|---|
| Normalization | RMS norm, fused add-RMS norm, layer norm |
| Activation | SiLU-and-mul, mul-and-SiLU, GeLU (fast/new/quick/tanh), SwigluOAI |
| Attention | Flash attention (variable-length), GDN attention, XE2 attention variants |
| Positional Encoding | Rotary embedding (NeoX and GPT-J styles), DeepSeek scaling RoPE |
| Mixture of Experts | TopK scoring (softmax/sigmoid), grouped TopK, fused grouped TopK; MoE align sum, MoE gather, expert remapping |
| LoRA | LoRA operator support |
| Quantization | FP8, MxFP4 quantization and GEMM |
| GEMM | Grouped GEMM |
| Misc | TopK per row, memory utilities |
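As a concrete example of what the normalization row computes, here is a pure-Python reference for RMS norm. This only states the math the kernel implements; the actual implementation is a SYCL kernel, and the function name here is illustrative:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference semantics of RMS normalization (not the SYCL kernel).

    Each element is scaled by the reciprocal root-mean-square of the
    vector, then multiplied by a learned per-element weight.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]
```

The fused add-RMS-norm variant listed above folds a residual addition into the same pass, avoiding an extra trip through memory.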
- Python: 3.9 – 3.12
- PyTorch: 2.10.0+xpu
- oneAPI: 2025.3 (Base Toolkit download)
- CMake: ≥ 3.26
- Ninja build system
At startup, vLLM runs `import vllm_xpu_kernels._C`, which registers all custom ops with the PyTorch dispatcher. From that point on, XPU ops are dispatched automatically whenever vLLM runs on Intel GPU hardware — no additional code changes are required in vLLM itself.
1. Install oneAPI 2025.3
Download and install the Intel oneAPI Base Toolkit, then source the environment:
```shell
source /opt/intel/oneapi/setvars.sh
```

2. Create a virtual environment and install dependencies
```shell
python -m venv .venv
source .venv/bin/activate
git clone https://github.com/vllm-project/vllm-xpu-kernels
cd vllm-xpu-kernels
pip install -r requirements.txt
```

Development install (editable, source in current directory):
```shell
pip install --extra-index-url=https://download.pytorch.org/whl/xpu -e . -v
# Faster: skip build isolation if dependencies are already present
pip install --no-build-isolation -e . -v
```

Standard install (to site-packages):
```shell
pip install --extra-index-url=https://download.pytorch.org/whl/xpu .
# or
pip install --no-build-isolation .
```

Build a wheel (output goes to dist/):
```shell
pip wheel --extra-index-url=https://download.pytorch.org/whl/xpu .
# or
pip wheel --no-build-isolation .
```

Incremental rebuild (fastest for iterative development):
```shell
python -m build --wheel --no-isolation
```

After vLLM RFC#33214 was completed, vLLM-XPU migrated to a vllm-xpu-kernels-based implementation. Installing the latest vLLM for XPU pulls in vllm-xpu-kernels automatically as a wheel dependency — no manual integration is required.
Run the full test suite with pytest:

```shell
pytest tests/
```

Individual test modules cover activations, cache operations, attention, MoE, LoRA, quantization, and memory utilities. See the tests/ directory for the complete list.
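A typical kernel test compares the custom op against a plain reference implementation. Here is a hedged sketch of that idea using the tanh GeLU variant from the op table — the function names are illustrative, not the actual test code:

```python
import math

def ref_gelu_tanh(x):
    """Reference tanh-approximated GeLU (one of the listed variants)."""
    c = math.sqrt(2.0 / math.pi)
    return [0.5 * v * (1.0 + math.tanh(c * (v + 0.044715 * v ** 3)))
            for v in x]

def check_against_reference(kernel_fn, x, atol=1e-3):
    """Assert kernel output matches the reference within a tolerance.

    In the real tests, kernel_fn would be the registered XPU op; here
    any callable with the same signature can be checked.
    """
    ref = ref_gelu_tanh(x)
    out = kernel_fn(x)
    assert all(abs(a - b) <= atol for a, b in zip(out, ref))
```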
Benchmark scripts for individual kernels are in the benchmark/ directory:

```shell
python benchmark/benchmark_layernorm.py
python benchmark/benchmark_lora.py
python benchmark/benchmark_grouped_topk.py
# etc.
```

This project is licensed under the Apache License 2.0. See the LICENSE file for details.