macOS Metal: bundled llama.cpp b8460 predates Gemma 4 support (b8637), so Gemma-4-E4B-it-GGUF fails to load on Mac

## TL;DR

On macOS arm64, Lemonade v10.1.0+ advertises Gemma 4 support and ships the catalog entry for `Gemma-4-E4B-it-GGUF`, but the bundled `llama.cpp` for the `llamacpp/metal` backend is pinned to **b8460**. Mainline `gemma4` architecture support landed in [`ggml-org/llama.cpp#21309`](https://github.com/ggml-org/llama.cpp/pull/21309) and first shipped in **b8637**. The result: pulling the model succeeds, but every attempt to load it fails with `unknown model architecture: 'gemma4'`.

This breaks the documented Gemma 4 path on Mac and, from the user's point of view, looks like a model-corruption issue (the pull/load endpoints both report "success" → the loader returns 500 → downstream tools time out).

## Reproducer

```bash
# Lemonade v10.2.0 on macOS arm64, Apple M4 Pro
lemonade pull Gemma-4-E4B-it-GGUF      # succeeds, ~5.97 GB on disk
lemonade load Gemma-4-E4B-it-GGUF      # → 500: model_load_error: llama-server failed to start
```

The actual error from the spawned `llama-server` is:

```
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
common_init_from_params: failed to load model
srv    load_model: failed to load model
main: exiting due to model loading error
```

This is reproducible directly:

```bash
"/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/llama-server" \
  -m "/Library/Application Support/Lemonade/hub/models--unsloth--gemma-4-E4B-it-GGUF/snapshots/<hash>/gemma-4-E4B-it-Q4_K_M.gguf" \
  --ctx-size 32768 --port 49995 --jinja --keep 16 --reasoning-format auto -ngl 99
```

## Evidence: the bundled build is pinned to b8460

`GET /api/v1/system-info` reports the `llamacpp/metal` recipe at version **b8460** with `state: installed`:

```json
"metal": {
  "download_filename": "llama-b8460-bin-macos-arm64.tar.gz",
  "release_url": "https://github.com/ggml-org/llama.cpp/releases/tag/b8460",
  "state": "installed",
  "version": "b8460"
}
```

`lemonade backends install llamacpp:metal --force` reinstalls **the same b8460** rather than picking up a newer build, so the pin appears to be hardcoded in the v10.2.0 release config (likely `src/cpp/resources/server_models.json` or the recipes config).

The disk-side version file confirms it:

```bash
$ cat "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
b8460
```
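The same check can be scripted. The following is a minimal sketch, assuming build tags always have the form `b<digits>` and that `version.txt` contains only the tag; `MIN_BUILD`, `build_number`, and `build_at_least` are illustrative names, not part of Lemonade or llama.cpp.

```bash
#!/usr/bin/env bash
# Sketch: compare an installed llama.cpp build tag (e.g. "b8460") with a minimum.
# MIN_BUILD and both helper names are illustrative, not part of Lemonade.

MIN_BUILD="b8637"   # first release with gemma4 support, per this report

# Print the numeric part of a tag like "b8460"; return 1 on unexpected input.
build_number() {
  local tag="$1" num
  case "$tag" in
    b[0-9]*) ;;
    *) return 1 ;;
  esac
  num="${tag#b}"
  case "$num" in
    *[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$num"
}

# Return 0 if tag $1 >= tag $2, 1 if it is older, 2 if either tag is malformed.
build_at_least() {
  local have want
  have="$(build_number "$1")" || return 2
  want="$(build_number "$2")" || return 2
  [ "$have" -ge "$want" ]
}

VERSION_FILE="/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
if [ -r "$VERSION_FILE" ]; then
  installed="$(tr -d '[:space:]' < "$VERSION_FILE")"
  rc=0
  build_at_least "$installed" "$MIN_BUILD" || rc=$?
  case "$rc" in
    0) echo "ok: $installed >= $MIN_BUILD" ;;
    1) echo "too old: $installed < $MIN_BUILD (gemma4 will not load)" ;;
    *) echo "unrecognized build tag: $installed" ;;
  esac
fi
```

The comparison is purely numeric on the part after `b`, so it only makes sense for tags from the same release series.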

## Verification of fix

A drop-in replacement with **b8934** (latest llama.cpp release as of this report) loads the model cleanly:

```
print_info: arch                  = gemma4
print_info: model type            = E4B
print_info: model params          = 7.52 B
print_info: file size             = 4.62 GiB (5.28 BPW)
load_tensors: offloaded 43/43 layers to GPU
main: server is listening on http://127.0.0.1:49995
```

Manual workaround:

```bash
# Backup current
sudo cp -R "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal" \
           "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal.b8460.bak"
# Drop in b8934 (or any build >= b8637)
curl -sSL -o /tmp/llama.tar.gz \
  "https://github.com/ggml-org/llama.cpp/releases/download/b8934/llama-b8934-bin-macos-arm64.tar.gz"
tar -xzf /tmp/llama.tar.gz -C /tmp/
sudo cp /tmp/llama-b8934/* "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/"
echo "b8934" | sudo tee "/Library/Application Support/Lemonade/.cache/bin/llamacpp/metal/version.txt"
lemonade stop && lemond &
```

## Suggested fix

Bump the macOS-arm64 (and probably Linux/CPU) `llamacpp` pin to a build **≥ b8637** in the next patch release. Worth checking ROCm/Vulkan pins for the same drift since the gemma4 handler change is in `libllama`, not platform-specific. The catalog metadata for `Gemma-4-E4B-it-GGUF` is fine — only the bundled binary needs the bump.

If a coordinated minimum-build constraint exists in the recipes config, surfacing a clearer error than "llama-server failed to start" when the bundled build is too old for a model's arch would also be a meaningful UX improvement (right now it presents as a generic process-spawn failure).
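As a rough illustration of the clearer error, the `llama-server` output shown in the reproducer already carries the architecture name, so it could be pulled out and reported. This is only a sketch in shell: `explain_load_failure` is a hypothetical helper, the wording of the message is made up, and Lemonade's actual error handling lives in its own code rather than in a script like this.

```bash
#!/usr/bin/env bash
# Sketch: map llama-server log output to a more specific error message.
# explain_load_failure is a hypothetical helper, not an existing Lemonade function.

explain_load_failure() {
  local log="$1" arch
  # Extract the name from a line like: unknown model architecture: 'gemma4'
  arch="$(printf '%s\n' "$log" \
    | sed -n "s/.*unknown model architecture: '\([^']*\)'.*/\1/p" \
    | head -n 1)"
  if [ -n "$arch" ]; then
    echo "bundled llama.cpp does not support model architecture '$arch'; a newer llama.cpp build is required"
  else
    # Fall back to the existing generic message when the pattern is absent.
    echo "llama-server failed to start"
  fi
}

# Usage with the log line from the reproducer above.
explain_load_failure "llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'"
```

Matching on log text is brittle if upstream changes the wording, which is why the fallback keeps the current generic message.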

## Environment

- **OS:** macOS 26.2 (Apple M4 Pro, 24 GB)
- **Lemonade:** v10.2.0 (latest stable as of this report)
- **Bundled llama-server:** b8460 (Mar 20)
- **Required for gemma4:** ≥ b8637 (Apr 2)
- **Verified working with:** b8934 (latest at time of filing)
- **Affected model:** `Gemma-4-E4B-it-GGUF` (Unsloth conversion, GGUF arch field = `gemma4`)

Filed downstream from [amd/gaia](https://github.com/amd/gaia) where this surfaced as a chat-hang for users running the universal Gemma 4 default introduced in [amd/gaia#865](https://github.com/amd/gaia/pull/865).

macOS Metal: bundled llama.cpp b8460 predates Gemma 4 support (b8637), so Gemma-4-E4B-it-GGUF fails to load on Mac #1741