    Repositories list

    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      Updated Nov 4, 2025
    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      Updated Nov 4, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Nov 4, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      Updated Nov 4, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Updated Nov 4, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Updated Nov 4, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      Updated Nov 4, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Updated Nov 3, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      Updated Nov 3, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      Updated Nov 3, 2025
    • semantic-router

      Public
      Intelligent Router for Mixture-of-Models
      Rust
      Updated Nov 3, 2025
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      Updated Nov 3, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      Updated Nov 3, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      Updated Oct 31, 2025
    • vllm-project.github.io

      Public
      JavaScript
      Updated Oct 31, 2025
    • production-stack

      Public
      vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      Updated Oct 31, 2025
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      Updated Oct 31, 2025
    • FlashMLA

      Public
      C++
      Updated Oct 22, 2025
    • media-kit

      Public
      vLLM Logo Assets
      Updated Oct 22, 2025
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      Updated Oct 19, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      Updated Sep 29, 2025
    • vllm-openvino

      Public
      Python
      Updated Aug 18, 2025
    • rfcs

      Public
      Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      Updated Apr 26, 2024