vllm-project repositories

vllm-project.github.io

Public

JavaScript

•51•25•0•2•Updated

Dec 15, 2025

vllm-ascend

Public

Community maintained hardware plugin for vLLM on Ascend

inference transformer model-serving mlops ascend llm llmops llm-serving vllm

Python

•

Apache License 2.0

•659•1.5k•815•280•Updated

Dec 15, 2025

vllm

Public

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer openai moe llama gpt model-serving

Python

•

Apache License 2.0

•12k•65k•1.9k•1.3k•Updated

Dec 15, 2025

vllm-omni

Public

A framework for efficient model inference with omni-modality models

inference pytorch transformer image-generation diffusion model-serving multimodal video-generation audio-generation

Python

•

Apache License 2.0

•115•902•60•34•Updated

Dec 15, 2025

vllm-xpu-kernels

Public

The vLLM XPU kernels for Intel GPU

C++

•

Apache License 2.0

•15•12•1•4•Updated

Dec 15, 2025

semantic-router

Public

Intelligent Router for Mixture-of-Models

python kubernetes rust golang mcp fine-tuning envoyproxy pii-detection mixture-of-models huggingface-transformers

Go

•

Apache License 2.0

•307•2.4k•94•34•Updated

Dec 15, 2025

tpu-inference

Public

TPU inference for vLLM, with unified JAX and PyTorch support.

Python

•

Apache License 2.0

•59•194•17•75•Updated

Dec 15, 2025

vllm-gaudi

Public

Community maintained hardware plugin for vLLM on Intel Gaudi

Python

•

Apache License 2.0

•80•19•1•66•Updated

Dec 15, 2025

llm-compressor

Public

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

sparsity compression quantization

Python

•

Apache License 2.0

•316•2.4k•74•50•Updated

Dec 15, 2025

guidellm

Public

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python

•

Apache License 2.0

•107•753•44•16•Updated

Dec 15, 2025

compressed-tensors

Public

A safetensors extension to efficiently store sparse quantized tensors on disk

Python

•

Apache License 2.0

•45•215•3•14•Updated

Dec 14, 2025

ci-infra

Public

This repo hosts code for vLLM CI & Performance Benchmark infrastructure.

HCL

•

Apache License 2.0

•50•27•0•29•Updated

Dec 14, 2025

vLLM-in-PyTorch-Conference-2025

Public

0•7•0•0•Updated

Dec 14, 2025

router

Public

A high-performance and light-weight router for vLLM large scale deployment

Rust

•

Apache License 2.0

•5•25•1•5•Updated

Dec 14, 2025

aibrix

Public

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go

•

Apache License 2.0

•495•4.5k•265•28•Updated

Dec 13, 2025

vllm-spyre

Public

Community maintained hardware plugin for vLLM on Spyre

Python

•

Apache License 2.0

•30•37•4•12•Updated

Dec 12, 2025

vllm-metal

Public

Community maintained hardware plugin for vLLM on Apple Silicon

Apache License 2.0

•0•1•1•0•Updated

Dec 12, 2025

recipes

Public

Common recipes to run vLLM

Jupyter Notebook

•

Apache License 2.0

•102•278•10•26•Updated

Dec 12, 2025

speculators

Public

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python

•

Apache License 2.0

•21•156•8•10•Updated

Dec 11, 2025

flash-attention

Public

Fast and memory-efficient exact attention

Python

•

BSD 3-Clause "New" or "Revised" License

•2.2k•104•0•16•Updated

Dec 11, 2025

vllm-neuron

Public

Community maintained hardware plugin for vLLM on AWS Neuron

Python

•3•15•0•1•Updated

Dec 6, 2025

vllm-openvino

Public

Python

•

Apache License 2.0

•10•27•2•0•Updated

Dec 4, 2025

production-stack

Public

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python

•

Apache License 2.0

•337•2k•93•54•Updated

Nov 30, 2025

FlashMLA

Public

C++

•

MIT License

•914•9•0•3•Updated

Oct 22, 2025

media-kit

Public

vLLM Logo Assets

4•6•0•0•Updated

Oct 22, 2025

DeepGEMM

Public

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda

•

MIT License

•774•0•0•0•Updated

Sep 29, 2025

rfcs

Public

0•1•0•0•Updated

Jun 3, 2025

vllm-project.github.io-static

Public archive

HTML

•

MIT License

•7•9•0•1•Updated

Feb 7, 2025

vllm-nccl

Public archive

Manages vllm-nccl dependency

Python

•

Apache License 2.0

•3•17•2•0•Updated

Jun 3, 2024

dashboard

Public

vLLM performance dashboard

Python

•

Apache License 2.0

•8•40•0•0•Updated

Apr 26, 2024


30 repositories
