Skip to content
@makllama

MaKLlama

MaK(Mac+Kubernetes)llama: running and orchestrating large language models (LLMs) on Kubernetes with Mac nodes.

MaKllama Organization

The following video demonstrates the below steps:

  1. Add a Mac node with Apple-Silicon chip to a Kubernetes cluster (in seconds!).
  2. Manually start Bronze Willow (BW) on the Mac node (top-right terminal).
  3. Deploy tinyllama with 2 replicas.
  4. Access the OpenAI API-compatible endpoint through mods.

Demo

Popular repositories Loading

  1. makllama makllama Public

    MaK(Mac+Kubernetes)llama - Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes.

    Go 40 3

  2. llama.cpp llama.cpp Public

    Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    C++ 3

  3. containerd containerd Public

    Forked from containerd/containerd

    An open and reliable container runtime

    Go 1

  4. cri cri Public

    Forked from virtual-kubelet/cri

    Go 1 1

  5. ktransformers ktransformers Public

    Forked from kvcache-ai/ktransformers

    A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

    Python 1

  6. .github .github Public

Repositories

Showing 10 of 21 repositories
  • ramalama Public Forked from containers/ramalama

    Ramalama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

    makllama/ramalama’s past year of commit activity
    Python 0 MIT 188 0 0 Updated May 15, 2025
  • llama.cpp Public Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    makllama/llama.cpp’s past year of commit activity
    C++ 3 MIT 12,078 0 0 Updated May 12, 2025
  • ollama Public Forked from ollama/ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

    makllama/ollama’s past year of commit activity
    Go 0 MIT 12,111 0 0 Updated May 6, 2025
  • whisper.cpp Public Forked from ggml-org/whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    makllama/whisper.cpp’s past year of commit activity
    C++ 0 MIT 4,357 0 0 Updated Apr 24, 2025
  • gpustack Public Forked from gpustack/gpustack

    Manage GPU clusters for running LLMs

    makllama/gpustack’s past year of commit activity
    Python 0 Apache-2.0 272 0 0 Updated Apr 23, 2025
  • stable-diffusion.cpp Public Forked from leejet/stable-diffusion.cpp

    Stable Diffusion and Flux in pure C/C++

    makllama/stable-diffusion.cpp’s past year of commit activity
    C++ 0 MIT 387 0 0 Updated Apr 22, 2025
  • ktransformers Public Forked from kvcache-ai/ktransformers

    A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

    makllama/ktransformers’s past year of commit activity
    Python 1 Apache-2.0 989 0 0 Updated Mar 20, 2025
  • llama-box Public Forked from gpustack/llama-box

    LLM inference server implementation based on llama.cpp.

    makllama/llama-box’s past year of commit activity
    C++ 0 MIT 16 0 0 Updated Feb 16, 2025
  • exo Public Forked from exo-explore/exo

    Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

    makllama/exo’s past year of commit activity
    Python 0 GPL-3.0 1,773 0 0 Updated Nov 28, 2024
  • llama-cpp-python Public Forked from abetlen/llama-cpp-python

    Python bindings for llama.cpp

    makllama/llama-cpp-python’s past year of commit activity
    Python 0 MIT 1,193 0 0 Updated Nov 26, 2024

Top languages

Loading…

Most used topics

Loading…