vllm-xpu-kernels

A vLLM component that provides optimized custom kernels for Intel GPUs (XPU) to accelerate LLM inference.

About

vLLM defines and implements many custom PyTorch ops and kernels. This repository provides implementations of those ops for the Intel XPU (GPU) backend, enabling high-throughput LLM inference on Intel hardware.

Kernels are written in SYCL/DPC++ and use oneDNN for deep learning primitives. The library follows PyTorch's custom op registration and dispatch pattern: importing it at startup registers all ops so that vLLM can use them directly.

Supported Kernels

Category	Operations
Normalization	RMS norm, fused add-RMS norm, layer norm
Activation	SiLU-and-mul, mul-and-SiLU, GeLU (fast/new/quick/tanh), SwigluOAI
Attention	Flash attention (variable-length), GDN attention, XE2 attention variants
Positional Encoding	Rotary embedding (NeoX and GPT-J styles), DeepSeek scaling RoPE
Mixture of Experts	TopK scoring (softmax/sigmoid), grouped TopK, fused grouped TopK; MoE align sum, MoE gather, expert remapping
LoRA	LoRA operator support
Quantization	FP8, MxFP4 quantization and GEMM
GEMM	Grouped GEMM
Misc	TopK per row, memory utilities
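To make a few of the table entries concrete, here is a plain-Python reference for RMS norm, fused add-RMS norm, SiLU-and-mul, and the two rotary embedding layouts (NeoX and GPT-J). It is a sketch assuming the conventional definitions these op names usually denote in vLLM; it is not derived from this repository's SYCL kernels, whose real ops work on tensors (often in place) and take different arguments. All function names here are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def rms_norm(x: Vec, weight: Vec, eps: float = 1e-6) -> Vec:
    # y_i = x_i / sqrt(mean(x^2) + eps) * w_i
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def fused_add_rms_norm(x: Vec, residual: Vec, weight: Vec,
                       eps: float = 1e-6) -> Tuple[Vec, Vec]:
    # Add the residual, then normalize the sum. A real kernel does both in one
    # launch; here they simply run in sequence. The updated residual is
    # returned because the next layer consumes it.
    new_residual = [a + b for a, b in zip(x, residual)]
    return rms_norm(new_residual, weight, eps), new_residual


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def silu_and_mul(x: Vec) -> Vec:
    # Input is [gate | up] along the last dim; output has half the width.
    d = len(x) // 2
    return [silu(g) * u for g, u in zip(x[:d], x[d:])]


def rope_angles(rot_dim: int, pos: int, base: float = 10000.0) -> Tuple[Vec, Vec]:
    # theta_i = pos * base^(-2i / rot_dim), one angle per rotated pair.
    thetas = [pos * base ** (-2 * i / rot_dim) for i in range(rot_dim // 2)]
    return [math.cos(t) for t in thetas], [math.sin(t) for t in thetas]


def rope_neox(x: Vec, cos: Vec, sin: Vec) -> Vec:
    # NeoX style: element i is paired with element i + d/2 (the two halves rotate).
    h = len(x) // 2
    a, b = x[:h], x[h:]
    return ([a[i] * cos[i] - b[i] * sin[i] for i in range(h)]
            + [b[i] * cos[i] + a[i] * sin[i] for i in range(h)])


def rope_gptj(x: Vec, cos: Vec, sin: Vec) -> Vec:
    # GPT-J style: adjacent (interleaved) elements 2i and 2i + 1 form a pair.
    out: Vec = []
    for i in range(len(x) // 2):
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * cos[i] - b * sin[i], b * cos[i] + a * sin[i]]
    return out


if __name__ == "__main__":
    print(rms_norm([1.0, 1.0], [2.0, 3.0], eps=0.0))   # [2.0, 3.0]
    cos, sin = [0.0, 0.0], [1.0, 1.0]                  # 90-degree rotation for both pairs
    print(rope_neox([1.0, 2.0, 3.0, 4.0], cos, sin))   # [-3.0, -4.0, 1.0, 2.0]
    print(rope_gptj([1.0, 2.0, 3.0, 4.0], cos, sin))   # [-2.0, 1.0, -4.0, 3.0]
```

The two rotary layouts apply the same rotation and differ only in which elements are paired, so a kernel must match the layout the model was trained with.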

Requirements

Python: 3.9 – 3.12
PyTorch: 2.12.0+xpu
oneAPI: Intel oneAPI Base Toolkit 2025.3
CMake: ≥ 3.26
Build system: Ninja
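Before building, a quick pre-flight script can confirm the toolchain listed above. This is a hypothetical helper, not part of the repository. It assumes that sourcing setvars.sh puts the oneAPI DPC++ compiler icpx on PATH, and it only checks that PyTorch can see an XPU device (via torch.xpu.is_available()), not the exact PyTorch version.

```python
import shutil
import subprocess
import sys
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # "3.26.4" -> (3, 26, 4); stops at the first non-numeric part,
    # so "3.28.0-rc1" -> (3, 28).
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def cmake_version() -> Optional[str]:
    # The first line of "cmake --version" looks like "cmake version 3.26.4".
    if shutil.which("cmake") is None:
        return None
    out = subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout
    lines = out.splitlines()
    words = lines[0].split() if lines else []
    return words[-1] if words else None


def torch_sees_xpu() -> bool:
    try:
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())


def check_requirements() -> Dict[str, bool]:
    cm = cmake_version()
    return {
        "Python 3.9-3.12": (3, 9) <= sys.version_info[:2] <= (3, 12),
        "CMake >= 3.26": cm is not None and parse_version(cm) >= (3, 26),
        "Ninja on PATH": shutil.which("ninja") is not None,
        "icpx on PATH (setvars.sh sourced)": shutil.which("icpx") is not None,
        "PyTorch sees an XPU device": torch_sees_xpu(),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(("ok      " if ok else "MISSING ") + name)
```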

Getting Started

How It Works

At startup, vLLM runs import vllm_xpu_kernels._C, which registers all custom ops with the PyTorch dispatcher. From then on, whenever vLLM runs on Intel GPU hardware, calls to these ops are dispatched to the XPU kernels automatically; vLLM itself needs no additional code changes.
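The register-at-import, dispatch-by-device flow can be modeled with a small pure-Python registry. This is a toy sketch of the idea only, not PyTorch's dispatcher (PyTorch exposes its own Python-level API in torch.library, which this sketch does not use) and not this library's code, where registration happens in C++. The names OpRegistry, register, call, and the "scale" op are hypothetical.

```python
from typing import Callable, Dict, Tuple


class OpRegistry:
    """Toy stand-in for an op dispatcher: maps (op name, device type) to a kernel."""

    def __init__(self) -> None:
        self._kernels: Dict[Tuple[str, str], Callable] = {}

    def register(self, op_name: str, device: str, kernel: Callable) -> None:
        # Registration is a side effect performed once, e.g. at import time.
        self._kernels[(op_name, device)] = kernel

    def call(self, op_name: str, device: str, *args):
        # Callers name only the op; the device of the inputs selects the kernel.
        try:
            kernel = self._kernels[(op_name, device)]
        except KeyError:
            raise NotImplementedError(
                f"no kernel registered for {op_name!r} on {device!r}"
            ) from None
        return kernel(*args)


registry = OpRegistry()


def _register_xpu_ops() -> None:
    # Hypothetical analogue of importing the extension module: it adds XPU
    # implementations without changing how callers invoke the op.
    registry.register("scale", "xpu", lambda xs, k: [x * k for x in xs])


_register_xpu_ops()

if __name__ == "__main__":
    print(registry.call("scale", "xpu", [1.0, 2.0], 3.0))  # [3.0, 6.0]
```

The call site never mentions an implementation, which is why vLLM needs no code changes once the import has registered the XPU kernels.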

Installation

1. Install oneAPI 2025.3

Download and install the Intel oneAPI Base Toolkit, then source the environment:

source /opt/intel/oneapi/setvars.sh

2. Create a virtual environment, clone the repository, and install dependencies

python -m venv .venv
source .venv/bin/activate

git clone https://github.com/vllm-project/vllm-xpu-kernels
cd vllm-xpu-kernels

pip install -r requirements.txt

Build Options

Development install (editable, source in current directory):

pip install --extra-index-url=https://download.pytorch.org/whl/xpu -e . -v
# Faster: skip build isolation if dependencies are already present
pip install --no-build-isolation -e . -v

Standard install (to site-packages):

pip install --extra-index-url=https://download.pytorch.org/whl/xpu .
# or
pip install --no-build-isolation .

Build a wheel (output goes to dist/):

pip wheel --extra-index-url=https://download.pytorch.org/whl/xpu .
# or
pip wheel --no-build-isolation .

Incremental rebuild (fastest for iterative development):

# Requires the PyPA build package (pip install build)
python -m build --wheel --no-isolation
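After any of the installs above, a short smoke check can confirm that the extension loads. This is a hypothetical script, not shipped with the repository; it relies only on the module name vllm_xpu_kernels._C from How It Works and on torch.xpu.is_available() from PyTorch's XPU support.

```python
import importlib
from typing import Dict


def smoke_check() -> Dict[str, bool]:
    status = {"torch": False, "xpu_available": False, "vllm_xpu_kernels._C": False}
    try:
        # Import torch first: a compiled PyTorch extension typically links against libtorch.
        torch = importlib.import_module("torch")
    except ImportError:
        return status
    status["torch"] = True
    xpu = getattr(torch, "xpu", None)
    status["xpu_available"] = bool(xpu is not None and xpu.is_available())
    try:
        # Per How It Works, this import registers the custom ops with the dispatcher.
        importlib.import_module("vllm_xpu_kernels._C")
        status["vllm_xpu_kernels._C"] = True
    except (ImportError, OSError) as exc:
        print(f"vllm_xpu_kernels._C failed to import: {exc}")
    return status


if __name__ == "__main__":
    for name, ok in smoke_check().items():
        print(f"{name}: {'ok' if ok else 'not available'}")
```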

Using with vLLM

After vLLM RFC #33214 was completed, vLLM-XPU migrated to an implementation based on vllm-xpu-kernels. Installing the latest vLLM for XPU pulls in vllm-xpu-kernels automatically as a wheel dependency, so no manual integration is required.

Kernel Configuration

By default, vLLM-XPU compiles kernels for common models (Llama, Qwen, DeepSeek). To customize the set of compiled kernels, pass configuration files through environment variables at install time:

VLLM_CHUNK_PREFILL_CONFIG=chunk_prefill_full.conf VLLM_PAGED_DECODE_CONFIG=paged_decode_full.conf pip install .

See KERNEL_CONFIGURATION.md for detailed guidance on kernel configuration, presets, and troubleshooting missing kernels.
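When the build is driven from a script (for example in CI), the same install can be launched with the environment variables set explicitly. This is a hypothetical sketch: the variable names and .conf file names come from the command above, while the helper names build_env and pip_install_cmd are made up, and how the build resolves the .conf paths is covered in KERNEL_CONFIGURATION.md rather than assumed here.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def build_env(chunk_prefill_conf: Optional[str] = None,
              paged_decode_conf: Optional[str] = None) -> Dict[str, str]:
    # Copy the current environment so variables set by setvars.sh are preserved;
    # os.environ itself is left untouched.
    env = dict(os.environ)
    if chunk_prefill_conf is not None:
        env["VLLM_CHUNK_PREFILL_CONFIG"] = chunk_prefill_conf
    if paged_decode_conf is not None:
        env["VLLM_PAGED_DECODE_CONFIG"] = paged_decode_conf
    return env


def pip_install_cmd(no_build_isolation: bool = False) -> List[str]:
    # Equivalent of "pip install ." using the current interpreter's pip.
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_build_isolation:
        cmd.append("--no-build-isolation")
    return cmd + ["."]


if __name__ == "__main__":
    env = build_env("chunk_prefill_full.conf", "paged_decode_full.conf")
    subprocess.run(pip_install_cmd(), env=env, check=True)
```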

Testing

Run the full test suite with pytest:

pytest tests/

Individual test modules cover activations, cache operations, attention, MoE, LoRA, quantization, and memory utilities. See the tests/ directory for the complete list.

Benchmarks

Benchmark scripts for individual kernels are in the benchmark/ directory:

python benchmark/benchmark_layernorm.py
python benchmark/benchmark_lora.py
python benchmark/benchmark_grouped_topk.py
# etc.
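The internals of those scripts are not described here. As a generic illustration of what timing a GPU kernel involves, this hypothetical helper runs warm-up iterations, synchronizes the device after each timed call, and reports the median. Kernel launches return before the GPU finishes, so without a synchronize the clock would mostly measure launch overhead. The name bench is made up; on XPU, pass torch.xpu.synchronize as sync.

```python
import statistics
import time
from typing import Callable, Optional


def bench(fn: Callable[[], object], sync: Optional[Callable[[], None]] = None,
          warmup: int = 3, iters: int = 20) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    sync = sync or (lambda: None)
    for _ in range(warmup):  # exclude one-time costs such as compilation and caching
        fn()
    sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for asynchronous device work before stopping the clock
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)


if __name__ == "__main__":
    # On an Intel GPU (assumes torch with XPU support and a tensor x on "xpu"):
    #   ms = bench(lambda: torch.nn.functional.silu(x), sync=torch.xpu.synchronize)
    print(f"{bench(lambda: sum(range(100_000))):.3f} ms")
```

The median is less sensitive than the mean to occasional slow iterations caused by other work on the machine.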

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
