    Repositories list

    • Mooncake
      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      8655.6k302206 · Updated Jun 22, 2026
    • A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
      Python
      Apache License 2.0
      1.3k17k4458 · Updated Jun 21, 2026
    • sglang
      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      6.6k11013 · Updated Jun 21, 2026
    • JavaScript
      MIT License
      91201 · Updated Jun 20, 2026
    • A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models …
      Python
      Apache License 2.0
      453000 · Updated Jun 17, 2026
    • vllm
      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      18k1501 · Updated Jun 16, 2026
    • 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-…
      Python
      Apache License 2.0
      1.4k101 · Updated May 9, 2026
    • 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference an…
      Python
      Apache License 2.0
      34k001 · Updated May 9, 2026
    • SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      6.6k200 · Updated Apr 22, 2026
    • evalscope
      Public
      A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
      Python
      Apache License 2.0
      404000 · Updated Apr 13, 2026
    • DeepEP: an efficient expert-parallel communication library that supports fault tolerance
      Cuda
      MIT License
      1.3k300 · Updated Jan 5, 2026
    • gpustack
      Public
      GPU cluster manager for optimized AI model deployment
      Python
      Apache License 2.0
      549000 · Updated Dec 7, 2025
    • TrEnv-X
      Public
      Go
      Apache License 2.0
      88700 · Updated Sep 15, 2025
    • SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      6.6k000 · Updated Aug 12, 2025
    • FlashInfer: Kernel Library for LLM Serving
      Cuda
      Apache License 2.0
      1.1k700 · Updated Jul 24, 2025