    Repositories list

    • mif

      Public
      MIF: MoAI Inference Framework
      Go
      0001 · Updated Jan 22, 2026
    • SpecForge

      Public
      Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
      Python
      143000 · Updated Jan 22, 2026
    • vllm-pcp

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      13k002 · Updated Jan 22, 2026
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      13k005 · Updated Jan 22, 2026
    • mori

      Public
      Modular RDMA Interface
      C++
      17002 · Updated Jan 22, 2026
    • Distributed KV cache coordinator
      Go
      78000 · Updated Jan 21, 2026
    • EPLB

      Public
      Expert Parallelism Load Balancer
      Python
      198000 · Updated Jan 21, 2026
    • Helm Chart Repository
      Makefile
      0000 · Updated Jan 20, 2026
    • Inference scheduler for llm-d
      Go
      114000 · Updated Jan 19, 2026
    • Gateway API Inference Extension
      Go
      222000 · Updated Jan 16, 2026
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      4.1k002 · Updated Jan 14, 2026
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      922000 · Updated Jan 1, 2026
    • GenAI inference performance benchmarking tool
      Python
      61000 · Updated Dec 18, 2025
    • Repo for MI355X benchmark (TensorWave)
      Shell
      71000 · Updated Nov 26, 2025
    • 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
      Python
      32k000 · Updated Nov 26, 2025
    • Fast and memory-efficient exact attention
      Python
      2.3k000 · Updated Nov 17, 2025
    • lws

      Public
      LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
      Go
      127000 · Updated Nov 3, 2025
    • 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
      Python
      32k200 · Updated Oct 31, 2025
    • kgateway

      Public
      The Cloud-Native API Gateway and AI Gateway
      Go
      647000 · Updated Oct 20, 2025
    • LMCache

      Public
      Redis for LLMs
      Python
      867000 · Updated Oct 13, 2025
    • gpt-oss

      Public
      Run gpt-oss inference in one file of pure C
      Python
      2.4k100 · Updated Oct 12, 2025
    • k0s

      Public
      k0s - The Zero Friction Kubernetes
      Go
      467000 · Updated Oct 2, 2025
    • Kubernetes Operator for OpenTelemetry Collector
      Go
      585000 · Updated Sep 27, 2025
    • Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
      Python
      46000 · Updated Sep 25, 2025
    • 0000 · Updated Sep 5, 2025
    • DeepEP

      Public
      DeepEP: an efficient expert-parallel communication library
      Cuda
      1.1k000 · Updated Aug 1, 2025
    • Fast and memory-efficient exact attention
      Python
      2.3k000 · Updated Jul 8, 2025
    • git pre-commit hook for automation
      Shell
      0000 · Updated Jun 17, 2025
    • Python
      1301 · Updated Jun 11, 2025
    • tt-umd

      Public
      User-Mode Driver for Tenstorrent hardware
      C++
      23000 · Updated Jun 10, 2025