    Repositories list

    • vllm-project.github.io

      Public
      JavaScript
      512502 Updated Dec 15, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      6591.5k815280 Updated Dec 15, 2025
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      12k65k1.9k1.3k Updated Dec 15, 2025
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      1159026034 Updated Dec 15, 2025
    • vllm-xpu-kernels

      Public
      The vLLM XPU kernels for Intel GPU
      C++
      151214 Updated Dec 15, 2025
    • semantic-router

      Public
      Intelligent Router for Mixture-of-Models
      Go
      3072.4k9434 Updated Dec 15, 2025
    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      591941775 Updated Dec 15, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      8019166 Updated Dec 15, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      3162.4k7450 Updated Dec 15, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      1077534416 Updated Dec 15, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      45215314 Updated Dec 14, 2025
    • ci-infra

      Public
      This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
      HCL
      5027029 Updated Dec 14, 2025
    • vLLM-in-PyTorch-Conference-2025

      Public
      0700 Updated Dec 14, 2025
    • router

      Public
      A high-performance and light-weight router for vLLM large scale deployment
      Rust
      52515 Updated Dec 14, 2025
    • aibrix

      Public
      Cost-efficient and pluggable Infrastructure components for GenAI inference
      Go
      4954.5k26528 Updated Dec 13, 2025
    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      3037412 Updated Dec 12, 2025
    • vllm-metal

      Public
      Community maintained hardware plugin for vLLM on Apple Silicon
      0110 Updated Dec 12, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      1022781026 Updated Dec 12, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      21156810 Updated Dec 11, 2025
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      2.2k104016 Updated Dec 11, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      31501 Updated Dec 6, 2025
    • vllm-openvino

      Public
      Python
      102720 Updated Dec 4, 2025
    • production-stack

      Public
      vLLM’s reference system for Kubernetes-native, cluster-wide deployment with community-driven performance optimization
      Python
      3372k9354 Updated Nov 30, 2025
    • FlashMLA

      Public
      C++
      914903 Updated Oct 22, 2025
    • media-kit

      Public
      vLLM Logo Assets
      4600 Updated Oct 22, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      774000 Updated Sep 29, 2025
    • rfcs

      Public
      0100 Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      7901 Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages vllm-nccl dependency
      Python
      31720 Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      84000 Updated Apr 26, 2024