TL;DR
On macOS arm64, Lemonade v10.1.0+ advertises Gemma 4 support and ships the catalog entry for Gemma-4-E4B-it-GGUF, but the bundled llama.cpp for the llamacpp/metal backend is pinned to b8460. Mainline gemma4 architecture support landed in ggml-org/llama.cpp#21309 and first shipped in b8637. The result: pulling the model succeeds, but every attempt to load it fails with unknown model architecture: 'gemma4'.
This breaks the documented Gemma 4 path on Mac and silently looks like a model-corruption issue from the user's POV (the pull/load endpoints both report "success" → the loader returns 500 → downstream tools time out).
Reproducer
# Lemonade v10.2.0 on macOS arm64, Apple M4 Pro
lemonade pull Gemma-4-E4B-it-GGUF # succeeds, ~5.97 GB on disk
lemonade load Gemma-4-E4B-it-GGUF # → 500: model_load_error: llama-server failed to start
The actual error from the spawned llama-server is:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
common_init_from_params: failed to load model
srv load_model: failed to load model
main: exiting due to model loading error
This is reproducible directly:
"/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/llama-server" \
-m "/Library/Application Support/Lemonade/hub/models--unsloth--gemma-4-E4B-it-GGUF/snapshots/<hash>/gemma-4-E4B-it-Q4_K_M.gguf" \
--ctx-size 32768 --port 49995 --jinja --keep 16 --reasoning-format auto -ngl 99
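Lemonade reports this only as `llama-server failed to start`, so it helps to pull the architecture name out of the llama-server log directly. The following is a minimal bash sketch. It assumes the log contains a line in the `unknown model architecture: '<arch>'` form shown above. `unknown_arch_from_log` is a hypothetical helper name and is not part of Lemonade or llama.cpp.

```shell
# Hypothetical helper (not part of Lemonade or llama.cpp): read llama-server
# log text on stdin and print the architecture name from an
# "unknown model architecture: '<arch>'" error line, if one is present.
# Returns 0 when such a line is found, 1 otherwise.
unknown_arch_from_log() {
  local arch
  # Keep only the text between the quotes; take the first match.
  arch=$(sed -n "s/.*unknown model architecture: '\([^']*\)'.*/\1/p" | head -n 1)
  if [[ -n $arch ]]; then
    printf '%s\n' "$arch"
    return 0
  fi
  return 1
}
```

For example, capture the direct invocation's output with `2>&1 | tee llama-server.log`, then run `unknown_arch_from_log < llama-server.log`. For the failure above, it prints `gemma4`.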
Evidence: the bundled build is pinned to b8460
GET /api/v1/system-info reports the llamacpp/metal recipe at version b8460 with state: installed:
"metal": {
  "download_filename": "llama-b8460-bin-macos-arm64.tar.gz",
  "release_url": "https://github.com/ggml-org/llama.cpp/releases/tag/b8460",
  "state": "installed",
  "version": "b8460"
}
lemonade backends install llamacpp:metal --force reinstalls the same b8460 rather than picking up a newer build, so this is hardcoded in the v10.2.0 release config (likely src/cpp/resources/server_models.json or the recipes config).
The disk-side version file confirms it:
$ cat "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
b8460
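llama.cpp release tags are `b` followed by an increasing build number. That means checking whether an installed `version.txt` is new enough for `gemma4` comes down to an integer comparison. The following is a minimal bash sketch. It assumes `version.txt` holds a single tag such as `b8460`. The function name is hypothetical.

```shell
# Hypothetical helper: succeed (return 0) if the llama.cpp build tag in the
# given version file is at least the given minimum tag, e.g. b8637.
llamacpp_build_at_least() {
  local version_file=$1 min_tag=$2 tag
  # Read the first line of the file and drop any whitespace/newline.
  tag=$(head -n 1 "$version_file" 2>/dev/null | tr -d '[:space:]')
  # Both tags must look like b<digits>; anything else fails the check.
  [[ $tag =~ ^b[0-9]+$ && $min_tag =~ ^b[0-9]+$ ]] || return 1
  # Strip the leading "b" and compare the build numbers as integers.
  (( ${tag#b} >= ${min_tag#b} ))
}
```

For example, `llamacpp_build_at_least "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt" b8637` fails against the bundled b8460 and succeeds after the b8934 drop-in.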
Verification of fix
Drop-in replacement with b8934 (latest llama.cpp release as of this report) loads the model cleanly:
print_info: arch = gemma4
print_info: model type = E4B
print_info: model params = 7.52 B
print_info: file size = 4.62 GiB (5.28 BPW)
load_tensors: offloaded 43/43 layers to GPU
main: server is listening on http://127.0.0.1:49995
Manual workaround:
# Back up the current backend
sudo cp -R "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal" \
  "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal.b8460.bak"
# Drop in b8934 (or any build >= b8637)
curl -sSL -o /tmp/llama.tar.gz \
  "https://github.com/ggml-org/llama.cpp/releases/download/b8934/llama-b8934-bin-macos-arm64.tar.gz"
tar -xzf /tmp/llama.tar.gz -C /tmp/
sudo cp /tmp/llama-b8934/* "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/"
echo "b8934" | sudo tee "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
lemonade stop && lemond &
Suggested fix
Bump the macOS-arm64 (and probably Linux/CPU) llamacpp pin to a build ≥ b8637 in the next patch release. Worth checking ROCm/Vulkan pins for the same drift since the gemma4 handler change is in libllama, not platform-specific. The catalog metadata for Gemma-4-E4B-it-GGUF is fine — only the bundled binary needs the bump.
If a coordinated minimum-build constraint exists in the recipes config, surfacing a clearer error than "llama-server failed to start" when the bundled build is too old for a model's arch would also be a meaningful UX improvement (right now it presents as a generic process-spawn failure).
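The minimum-build constraint could be sketched roughly as follows. This bash version only illustrates the logic. It is not Lemonade's recipes schema or code, and `min_build_for_arch` and `check_arch_supported` are hypothetical names. The only table entry, `gemma4` → `b8637`, comes from this report.

```shell
# Hypothetical sketch (not Lemonade's actual recipes schema or code): look up
# the oldest llama.cpp build known to support a GGUF architecture, and fail
# with a specific message instead of a generic "llama-server failed to start".
# Only the gemma4 -> b8637 entry comes from this report.
min_build_for_arch() {
  case $1 in
    gemma4) echo b8637 ;;
    *) echo "" ;;   # not in the table: no recorded constraint
  esac
}

# Usage: check_arch_supported <arch> <installed build tag, e.g. b8460>
check_arch_supported() {
  local arch=$1 installed_tag=$2 min_tag
  min_tag=$(min_build_for_arch "$arch")
  # No recorded minimum: defer to llama-server itself.
  [[ -z $min_tag ]] && return 0
  # Reject malformed tags, or builds older than the recorded minimum.
  if [[ ! $installed_tag =~ ^b[0-9]+$ ]] || (( ${installed_tag#b} < ${min_tag#b} )); then
    printf "model architecture '%s' requires llama.cpp >= %s, but the bundled build is %s\n" \
      "$arch" "$min_tag" "$installed_tag" >&2
    return 1
  fi
}
```

A launcher would run this check before spawning llama-server. It would pass the architecture read from the GGUF metadata and the tag from `version.txt`. For example, `check_arch_supported gemma4 b8460` prints the mismatch to stderr and returns 1.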
Environment
- OS: macOS 26.2 (Apple M4 Pro, 24 GB)
- Lemonade: v10.2.0 (latest stable as of this report)
- Bundled llama-server: b8460 (Mar 20)
- Required for gemma4: ≥ b8637 (Apr 2)
- Verified working with: b8934 (latest at time of filing)
- Affected model: Gemma-4-E4B-it-GGUF (Unsloth conversion, GGUF arch field = gemma4)
Filed from the downstream project amd/gaia, where this surfaced as a chat hang for users running the universal Gemma 4 default introduced in amd/gaia#865.