
    Repositories list

    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Updated Dec 8, 2025
    • llama.cpp

      Public
      LLM inference in C/C++
      C++
      Updated Nov 28, 2025
    • Docker Model Runner
      Go
      Updated Oct 29, 2025
    • MAD

      Public
      MAD (Model Automation and Dashboarding)
      Shell
      Updated Oct 28, 2025
    • gpustack

      Public
      Manage GPU clusters for running LLMs
      Python
      Updated Aug 4, 2025
    • ramalama

      Public
      Ramalama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
      Python
      Updated Jul 28, 2025
    • cozeloop

      Public
      Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.
      Go
      Updated Jul 26, 2025
    • octotools

      Public
      OctoTools: An agentic framework with extensible tools for complex reasoning
      Python
      Updated Jul 24, 2025
    • llama-box

      Public
      LLM inference server implementation based on llama.cpp.
      C++
      Updated Jul 24, 2025
    • stable-diffusion.cpp

      Public
      Stable Diffusion and Flux in pure C/C++
      C++
      Updated Jul 24, 2025
    • whisper.cpp

      Public
      Port of OpenAI's Whisper model in C/C++
      C++
      Updated Jul 24, 2025
    • jax

      Public
      Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
      Python
      Updated Jul 21, 2025
    • ollama

      Public
      Get up and running with Llama 3, Mistral, Gemma, and other large language models.
      Go
      Updated Jun 18, 2025
    • ktransformers

      Public
      A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
      Python
      Updated Mar 20, 2025
    • exo

      Public
      Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
      Python
      Updated Nov 28, 2024
    • llama-cpp-python

      Public
      Python bindings for llama.cpp
      Python
      Updated Nov 26, 2024
    • fastfetch

      Public
      Like neofetch, but much faster because it is written mostly in C.
      C
      Updated Nov 19, 2024
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Oct 16, 2024
    • k8sgpt

      Public
      Giving Kubernetes Superpowers to everyone
      Go
      Updated Sep 24, 2024
    • k8sgpt-operator

      Public
      Automatic SRE Superpowers within your Kubernetes cluster
      Go
      Updated Jul 31, 2024
    • llm.c

      Public
      LLM training in simple, raw C/CUDA
      Cuda
      Updated Jul 22, 2024
    • A proxy that allows you to host ollama images in your local environment
      Go
      Updated Jul 2, 2024
    • LLM Benchmark for Throughput via Ollama (Local LLMs)
      Python
      Updated Jun 11, 2024
    • makllama

      Public
      MaK(Mac+Kubernetes)llama - Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes.
      Go
      Updated May 22, 2024
    • containerd

      Public
      An open and reliable container runtime
      Go
      Updated May 22, 2024
    • cri

      Public
      Go
      Updated May 21, 2024