feat(llm): add Gemma 4 E4B as default and native tool_calls priority (#865)
## Summary
Gemma-4-E4B-it-GGUF becomes GAIA's default model for all roles (LLM,
VLM, installer profiles, CLI, Agent UI, eval, EMR). Simultaneously
inverts the tool-call priority chain so native OpenAI `tool_calls` is
the primary path, with embedded-JSON format falling back only for legacy
non-tool-calling models. Also bumps the minimum Lemonade version to
v10.1.0 (which moved its default port from 8000 → 13305 and is where
Gemma 4 support was added).
This ships on top of the existing UI model-resolution fixes (#841,
#842). Resolves #863.
## What changed and why
- **Universal Gemma default** — Gemma 4 E4B is natively multimodal
(~4.5B effective params, 128K context, Apache 2.0), making it the right
single default across the LLM/VLM split that previously required two
different models. Footprint drops 19.7 GB → 5 GB.
- **Native tool_calls path** (Lemonade v10.1.0+ `--jinja`) — GAIA now
passes `tools=[...]` to Lemonade for tool-capable models. The response
comes back as native `tool_calls`; `LemonadeProvider.chat()` encodes
them as a sentinel JSON string (`{"__tool_calls__": ...}`) so no callers
need a type change. `_parse_llm_response` detects the sentinel and
returns the unified `{"tool": ..., "tool_args": ...}` dict.
- **System-prompt gating** — The embedded-JSON format block
(`_PLANNING_FORMAT`/`_CONVERSATIONAL_FORMAT`) is excluded from the
composed system prompt for tool-calling models; it actively prevented
native `tool_calls` in prior testing.
- **Startup validator** — `_validate_profile_model_registry()` raises at
import time if any `AGENT_PROFILES` entry references a model key not in
`MODELS`.
- **Lemonade v10.1.0+ / port 13305** — `DEFAULT_PORT` flipped from 8000
to 13305 (Lemonade's [spring-cleaning
release](https://github.com/lemonade-sdk/lemonade/wiki/Migration#v10x---v101)
changed the default). 75 files updated (agents, UI, MCP bridge, RAG SDK,
VLM, CLI, tests, docs). `min_lemonade_version = 10.1.0` everywhere
`INIT_PROFILES` is declared.
- **Eval baselines** — Pre-swap Qwen3.5-35B baseline at commit
`3b51ca92` and post-swap Gemma-4-E4B baseline both committed under
`tests/fixtures/eval_baselines/`; Gemma outperforms Qwen 14/15 vs 13/15
(see comment below for per-scenario breakdown).
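A minimal sketch of the sentinel round trip described in the native tool_calls bullet above. Only the `__tool_calls__` key and the `{"tool": ..., "tool_args": ...}` result shape come from this PR; the function names, the OpenAI-style payload layout, and the choice to use only the first call are assumptions for illustration, not the actual `LemonadeProvider.chat()` or `_parse_llm_response` code.

```python
import json

SENTINEL_KEY = "__tool_calls__"  # key named in this PR; everything else is assumed


def encode_tool_calls(tool_calls):
    """Wrap native tool_calls in a JSON string so callers keep receiving a str."""
    return json.dumps({SENTINEL_KEY: tool_calls})


def parse_sentinel(text):
    """Return {"tool", "tool_args"} for a sentinel string, or None otherwise."""
    try:
        payload = json.loads(text)
    except (TypeError, ValueError):
        return None  # plain text: not a sentinel
    if not isinstance(payload, dict) or SENTINEL_KEY not in payload:
        return None  # valid JSON, but no sentinel key
    calls = payload[SENTINEL_KEY]
    if not isinstance(calls, list) or not calls:
        return None
    function = calls[0]["function"]  # assumption: only the first call is used
    args = function.get("arguments") or {}
    if isinstance(args, str):
        # OpenAI-style responses carry arguments as a JSON-encoded string.
        args = json.loads(args)
    return {"tool": function["name"], "tool_args": args}
```

Keeping the return type a string is what lets existing callers stay unchanged; only the parser needs to know about the sentinel.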
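The startup validator bullet above can be sketched roughly as follows. The dictionary contents, the `"model"` field name, and the exception type are hypothetical; the real `_validate_profile_model_registry()` may be structured differently.

```python
# Hypothetical registries; the real MODELS / AGENT_PROFILES have more fields.
MODELS = {"gemma-4-e4b": {"ctx_size": 131072}}
AGENT_PROFILES = {"chat": {"model": "gemma-4-e4b"}}


def validate_profile_model_registry(profiles, models):
    """Raise if any profile references a model key that is not registered."""
    missing = {
        name: profile["model"]
        for name, profile in profiles.items()
        if profile["model"] not in models
    }
    if missing:
        raise ValueError(f"Profiles reference unknown model keys: {missing}")


# Called at module level so a bad entry fails at import time, not mid-run.
validate_profile_model_registry(AGENT_PROFILES, MODELS)
```

Failing at import time turns a typo in a profile into an immediate, obvious error rather than a model-resolution failure during a run.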
## Test plan
- [x] `python -m pytest tests/unit/ --ignore=tests/unit/chat/ui/ -q` →
928 passed, 16 skipped
- [x] `python -m pytest tests/unit/test_tool_call_priority.py -v` → 23
passed (sentinel detection, native branch parsing, edge cases, prompt
gating, startup validator)
- [x] `python util/lint.py --black --isort --flake8` → all pass
- [x] Eval against Gemma-4-E4B on Lemonade v10.2.0, Sonnet judge → 14/15
scenarios pass, beats Qwen baseline (see comment)
- [x] Verified `claude -p --model claude-sonnet-4-6` was actually the
judge (not Opus) via `modelUsage` in test subprocess
## Open follow-ups (not blockers for this PR)
- `tool_selection/known_path_read` regression: Gemma doesn't discover
the indexed-internal-copy fallback path in Turn 1 after an Access-Denied
error on the original path. Prompt-engineering candidate.
- `/api/system/status` reports the catalog `ctx_size` even when Lemonade
loaded the model with a smaller window. Surface a warning when they
diverge; a whole eval run was wasted because the reported value masked
the mismatch.
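One possible shape for the `ctx_size` divergence warning proposed in the last follow-up, as a sketch only. The function name and parameters are hypothetical, and how the loaded window size is obtained from Lemonade is deliberately left out.

```python
import warnings


def warn_on_ctx_mismatch(model, catalog_ctx, loaded_ctx):
    """Emit a warning and return True when catalog and loaded windows differ."""
    if loaded_ctx != catalog_ctx:
        warnings.warn(
            f"{model}: catalog ctx_size is {catalog_ctx}, "
            f"but the model was loaded with {loaded_ctx}",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False
```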
---------
Co-authored-by: Tomasz Iniewicz <tomasz.iniewicz@amd.com>
Co-authored-by: Kalin Ovtcharov <kalin@extropolis.ai>
cpp/README.md: 5 additions & 5 deletions
@@ -28,23 +28,23 @@ Included demos:

 The agent connects to an OpenAI-compatible LLM server at `http://localhost:8000/api/v1` by default. The reference backend is [Lemonade Server](https://github.com/lemonade-sdk/lemonade), which runs models locally on AMD hardware.

-Download and install Lemonade Server v10.0.0, then start it:
+Download and install Lemonade Server v10.2.0, then start it: