    Repositories list

    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      Updated Nov 4, 2025
    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      Updated Nov 4, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Nov 4, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      Updated Nov 4, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Updated Nov 4, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Updated Nov 4, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      Updated Nov 4, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Updated Nov 3, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      Updated Nov 3, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      Updated Nov 3, 2025
    • semantic-router

      Public
      Intelligent Router for Mixture-of-Models
      Rust
      Updated Nov 3, 2025
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      Updated Nov 3, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      Updated Nov 3, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      Updated Oct 31, 2025
    • vllm-project.github.io

      Public
      JavaScript
      Updated Oct 31, 2025
    • production-stack

      Public
      vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      Updated Oct 31, 2025
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      Updated Oct 31, 2025
    • FlashMLA

      Public
      C++
      Updated Oct 22, 2025
    • media-kit

      Public
      vLLM Logo Assets
      Updated Oct 22, 2025
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      Updated Oct 19, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      Updated Sep 29, 2025
    • vllm-openvino

      Public
      Python
      Updated Aug 18, 2025
    • rfcs

      Public
      Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      Updated Apr 26, 2024