    Repositories list

    • modular

      Public
      The Modular Platform (includes MAX & Mojo)
      Mojo
      Other
      2.8k008 Updated Mar 19, 2026
    • vllm-rbln

      Public
      vLLM plugin for RBLN NPU
      Python
      Apache License 2.0
      8000 Updated Feb 26, 2026
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      15k000 Updated Feb 10, 2026
    • Python
      0000 Updated Jan 22, 2026
    • Jupyter Notebook
      2600 Updated Jan 18, 2026
    • Yetter Python Client
      Python
      0000 Updated Dec 24, 2025
    • Intel-Gaudi-Hands-on-Workshop
      0000 Updated Dec 15, 2025
    • GraLoRA

      Public
      Jupyter Notebook
      23200 Updated Nov 18, 2025
    • owlite

      Public
      OwLite is a low-code AI model compression toolkit for AI models.
      Python
      GNU Affero General Public License v3.0
      45300 Updated Nov 14, 2025
    • OwLite Examples repository offers illustrative example codes to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT eng…
      Python
      1911 Updated Nov 14, 2025
    • Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Apache License 2.0
      445000 Updated Nov 12, 2025
    • 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference an…
      Python
      Apache License 2.0
      33k000 Updated Nov 4, 2025
    • vllm-fork

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      15k000 Updated Nov 4, 2025
    • Machine Learning Engineering Open Book
      Python
      Creative Commons Attribution Share Alike 4.0 International
      1.1k100 Updated Sep 1, 2025
    • SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      4.9k000 Updated Aug 28, 2025
    • Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
      Python
      Apache License 2.0
      35500 Updated Jul 16, 2025
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      15k000 Updated Jul 14, 2025
    • LMCache

      Public
      Redis for LLMs
      Python
      Apache License 2.0
      1k000 Updated Jul 10, 2025
    • 0000 Updated Jul 9, 2025
    • TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optim…
      C++
      Apache License 2.0
      2.2k001 Updated Jun 26, 2025
    • fal-js

      Public
      The JavaScript client and utilities to fal-serverless with built-in TypeScript definitions
      TypeScript
      MIT License
      40000 Updated May 30, 2025
    • gradio

      Public
      Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
      Python
      Apache License 2.0
      3.3k000 Updated Jan 13, 2025
    • Python
      Apache License 2.0
      49000 Updated Nov 22, 2024
    • Intel Neural Compressor
      Python
      Apache License 2.0
      0000 Updated Oct 22, 2024
    • Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA container toolkit. Useful for deployi…
      Dockerfile
      Mozilla Public License 2.0
      19000 Updated Aug 27, 2024
    • C++
      Apache License 2.0
      1001 Updated Jul 23, 2024
    • .github

      Public
      0000 Updated Jul 22, 2024
    • QUICK

      Public
      QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
      Python
      MIT License
      512060 Updated Mar 6, 2024