macOS Metal: bundled llama.cpp b8460 predates Gemma 4 support (b8637), so Gemma-4-E4B-it-GGUF fails to load on Mac #1741

@kovtcharov

TL;DR

On macOS arm64, Lemonade v10.1.0+ advertises Gemma 4 support and ships the catalog entry for Gemma-4-E4B-it-GGUF, but the bundled llama.cpp for the llamacpp/metal backend is pinned to b8460. Mainline gemma4 architecture support landed in ggml-org/llama.cpp#21309 and first shipped in b8637. The result: pulling the model succeeds, but every attempt to load it fails with unknown model architecture: 'gemma4'.

This breaks the documented Gemma 4 path on Mac and, from the user's point of view, looks like a corrupted model: the pull/load endpoints both report "success", the loader then returns 500, and downstream tools time out.

Reproducer

# Lemonade v10.2.0 on macOS arm64, Apple M4 Pro
lemonade pull Gemma-4-E4B-it-GGUF      # succeeds, ~5.97 GB on disk
lemonade load Gemma-4-E4B-it-GGUF      # → 500: model_load_error: llama-server failed to start

The actual error from the spawned llama-server is:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
common_init_from_params: failed to load model
srv    load_model: failed to load model
main: exiting due to model loading error

This is reproducible directly:

"/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/llama-server" \
  -m "/Library/Application Support/Lemonade/hub/models--unsloth--gemma-4-E4B-it-GGUF/snapshots/<hash>/gemma-4-E4B-it-Q4_K_M.gguf" \
  --ctx-size 32768 --port 49995 --jinja --keep 16 --reasoning-format auto -ngl 99
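
Since the architecture name only appears in the llama-server output, a small filter can pull it out of a captured log. This is a minimal sketch: `unsupported_arch` is a hypothetical helper name, not part of Lemonade or llama.cpp, and it assumes the error line keeps the wording shown above.

```shell
# unsupported_arch: hypothetical helper, not part of Lemonade or llama.cpp.
# Reads log text on stdin and prints the architecture name from the first
# "unknown model architecture: '<name>'" line; prints nothing if none is found.
unsupported_arch() {
  sed -n "s/.*unknown model architecture: '\([^']*\)'.*/\1/p" | head -n 1
}
```

Appending `2>&1 | unsupported_arch` to the llama-server command above should print `gemma4` on an affected install.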

Evidence: the bundled build is pinned to b8460

GET /api/v1/system-info reports the llamacpp/metal recipe at version b8460 with state: installed:

"metal": {
  "download_filename": "llama-b8460-bin-macos-arm64.tar.gz",
  "release_url": "https://github.com/ggml-org/llama.cpp/releases/tag/b8460",
  "state": "installed",
  "version": "b8460"
}

lemonade backends install llamacpp:metal --force reinstalls the same b8460 rather than picking up a newer build, so this is hardcoded in the v10.2.0 release config (likely src/cpp/resources/server_models.json or the recipes config).
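
For a quick look at the pinned builds without reading the whole response, the `version` fields can be filtered out of the system-info JSON. This is a rough sketch: `recipe_versions` is a hypothetical helper, it matches text rather than parsing JSON, and it assumes the fields look like the fragment above.

```shell
# recipe_versions: hypothetical helper. Reads system-info JSON on stdin and
# prints every llama.cpp-style build tag found in a "version" field, one per line.
# It is a plain-text match, so it needs no JSON tooling, but it does not say
# which backend each tag belongs to.
recipe_versions() {
  grep -o '"version"[[:space:]]*:[[:space:]]*"b[0-9][0-9]*"' |
    sed 's/.*"\(b[0-9][0-9]*\)"$/\1/'
}
```

Usage would be along the lines of `curl -s "$LEMONADE_URL/api/v1/system-info" | recipe_versions`, where `LEMONADE_URL` is an assumed variable holding the base URL of your running server.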

The disk-side version file confirms it:

$ cat "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
b8460
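
The check against the Gemma 4 minimum can be scripted from that file. A minimal sketch, assuming release tags always have the form `b<digits>` and compare numerically; `build_number` and `build_at_least` are hypothetical helper names.

```shell
# build_number TAG: print the numeric part of a llama.cpp release tag
# (b8460 -> 8460). Fails if TAG is not "b" followed by digits only.
build_number() {
  local n="${1#b}"
  case "$1" in b*) ;; *) return 1 ;; esac
  case "$n" in ''|*[!0-9]*) return 1 ;; esac
  printf '%s\n' "$n"
}

# build_at_least CURRENT MINIMUM: succeeds when CURRENT >= MINIMUM.
# Returns 2 if either tag is malformed.
build_at_least() {
  local cur min
  cur="$(build_number "$1")" || return 2
  min="$(build_number "$2")" || return 2
  [ "$cur" -ge "$min" ]
}
```

For example, `build_at_least "$(cat "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt")" b8637 || echo "bundled build predates gemma4"` would print the warning on an affected install.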

Verification of fix

Drop-in replacement with b8934 (latest llama.cpp release as of this report) loads the model cleanly:

print_info: arch                  = gemma4
print_info: model type            = E4B
print_info: model params          = 7.52 B
print_info: file size             = 4.62 GiB (5.28 BPW)
load_tensors: offloaded 43/43 layers to GPU
main: server is listening on http://127.0.0.1:49995

Manual workaround:

# Backup current
sudo cp -R "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal" \
           "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal.b8460.bak"
# Drop in b8934 (or any build >= b8637)
curl -sSL -o /tmp/llama.tar.gz \
  "https://github.com/ggml-org/llama.cpp/releases/download/b8934/llama-b8934-bin-macos-arm64.tar.gz"
tar -xzf /tmp/llama.tar.gz -C /tmp/
sudo cp /tmp/llama-b8934/* "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/"
echo "b8934" | sudo tee "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
lemonade stop && lemond &

Suggested fix

Bump the macOS-arm64 (and probably Linux/CPU) llamacpp pin to a build ≥ b8637 in the next patch release. Worth checking ROCm/Vulkan pins for the same drift since the gemma4 handler change is in libllama, not platform-specific. The catalog metadata for Gemma-4-E4B-it-GGUF is fine — only the bundled binary needs the bump.

If a coordinated minimum-build constraint exists in the recipes config, surfacing a clearer error than "llama-server failed to start" when the bundled build is too old for a model's arch would also be a meaningful UX improvement (right now it presents as a generic process-spawn failure).
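
To illustrate the kind of message meant here, the sketch below maps an architecture to its minimum build and prints a specific hint when the bundled build is older. Everything in it is hypothetical: the function names, the lookup table, and the message wording are not Lemonade's. Only the gemma4 entry (b8637) comes from this report.

```shell
# min_build_for_arch ARCH: hypothetical lookup of the first llama.cpp build
# that supports a GGUF architecture. Only the gemma4 entry is from this report.
min_build_for_arch() {
  case "$1" in
    gemma4) echo "b8637" ;;
    *) return 1 ;;
  esac
}

# arch_load_hint ARCH BUNDLED_TAG: print a specific message when the bundled
# build is older than the minimum known for ARCH; print nothing otherwise.
arch_load_hint() {
  local arch="$1" bundled="$2" min
  min="$(min_build_for_arch "$arch")" || return 0   # unknown arch: no hint
  if [ "${bundled#b}" -lt "${min#b}" ]; then
    echo "bundled llama.cpp ${bundled} is too old for architecture '${arch}' (needs >= ${min})"
  fi
}
```

With the values from this issue, `arch_load_hint gemma4 b8460` reports that b8460 is too old and names b8637 as the minimum, while `arch_load_hint gemma4 b8934` prints nothing.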

Environment

  • OS: macOS 26.2 (Apple M4 Pro, 24 GB)
  • Lemonade: v10.2.0 (latest stable as of this report)
  • Bundled llama-server: b8460 (Mar 20)
  • Required for gemma4: ≥ b8637 (Apr 2)
  • Verified working with: b8934 (latest at time of filing)
  • Affected model: Gemma-4-E4B-it-GGUF (Unsloth conversion, GGUF arch field = gemma4)

Filed downstream from amd/gaia where this surfaced as a chat-hang for users running the universal Gemma 4 default introduced in amd/gaia#865.
