    Repositories list

    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      2837516 Updated Nov 24, 2025
    • TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      411661554 Updated Nov 24, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      11k 64k 1.9k 1.2k Updated Nov 24, 2025
    • semantic-router

      Public
      Intelligent Router for Mixture-of-Models
      Rust
      2972.3k11337 Updated Nov 24, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      7117261 Updated Nov 24, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      1007184215 Updated Nov 24, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      2922.3k7144 Updated Nov 24, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      5961.4k726230 Updated Nov 24, 2025
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      1411010 Updated Nov 24, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      4864.4k24924 Updated Nov 24, 2025
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      4526023 Updated Nov 23, 2025
    • production-stack

      Public
      vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
      Python
      3262k9154 Updated Nov 21, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      15128917 Updated Nov 21, 2025
    • Fast and memory-efficient exact attention
      Python
      2.2k102018 Updated Nov 21, 2025
    • vllm-project.github.io

      Public
      JavaScript
      432401 Updated Nov 21, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      39210514 Updated Nov 19, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      01301 Updated Nov 18, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      8624276 Updated Nov 13, 2025
    • FlashMLA

      Public
      C++
      905803 Updated Oct 22, 2025
    • media-kit

      Public
      vLLM Logo Assets
      3600 Updated Oct 22, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      752000 Updated Sep 29, 2025
    • Python
      92630 Updated Aug 18, 2025
    • rfcs

      Public
      0100 Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      7901 Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      31720 Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      73800 Updated Apr 26, 2024