
    Repositories list

    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Updated Dec 8, 2025
    • llama.cpp

      Public
      LLM inference in C/C++
      C++
      Updated Nov 28, 2025
    • Docker Model Runner
      Go
      Updated Oct 29, 2025
    • MAD

      Public
      MAD (Model Automation and Dashboarding)
      Shell
      Updated Oct 28, 2025
    • gpustack

      Public
      Manage GPU clusters for running LLMs
      Python
      Updated Aug 4, 2025
    • ramalama

      Public
      Ramalama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
      Python
      Updated Jul 28, 2025
    • cozeloop

      Public
      Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.
      Go
      Updated Jul 26, 2025
    • octotools

      Public
      OctoTools: An agentic framework with extensible tools for complex reasoning
      Python
      Updated Jul 24, 2025
    • llama-box

      Public
      LLM inference server implementation based on llama.cpp.
      C++
      Updated Jul 24, 2025
    • stable-diffusion.cpp

      Public
      Stable Diffusion and Flux in pure C/C++
      C++
      Updated Jul 24, 2025
    • whisper.cpp

      Public
      Port of OpenAI's Whisper model in C/C++
      C++
      Updated Jul 24, 2025
    • jax

      Public
      Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
      Python
      Updated Jul 21, 2025
    • ollama

      Public
      Get up and running with Llama 3, Mistral, Gemma, and other large language models.
      Go
      Updated Jun 18, 2025
    • ktransformers

      Public
      A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
      Python
      Updated Mar 20, 2025
    • exo

      Public
      Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
      Python
      Updated Nov 28, 2024
    • llama-cpp-python

      Public
      Python bindings for llama.cpp
      Python
      Updated Nov 26, 2024
    • fastfetch

      Public
      Like neofetch, but much faster because it is written mostly in C.
      C
      Updated Nov 19, 2024
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Oct 16, 2024
    • k8sgpt

      Public
      Giving Kubernetes Superpowers to everyone
      Go
      Updated Sep 24, 2024
    • k8sgpt-operator

      Public
      Automatic SRE Superpowers within your Kubernetes cluster
      Go
      Updated Jul 31, 2024
    • llm.c

      Public
      LLM training in simple, raw C/CUDA
      Cuda
      Updated Jul 22, 2024
    • A proxy that allows you to host ollama images in your local environment
      Go
      Updated Jul 2, 2024
    • LLM Benchmark for Throughput via Ollama (Local LLMs)
      Python
      Updated Jun 11, 2024
    • makllama

      Public
      MaK(Mac+Kubernetes)llama - Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes.
      Go
      Updated May 22, 2024
    • containerd

      Public
      An open and reliable container runtime
      Go
      Updated May 22, 2024
    • cri

      Public
      Go
      Updated May 21, 2024