
Commit 5d37771

itomek, itomek-amd, and kovtcharov authored
feat(llm): add Gemma 4 E4B as default and native tool_calls priority (#865)
## Summary

Gemma-4-E4B-it-GGUF becomes GAIA's default model for all roles (LLM, VLM, installer profiles, CLI, Agent UI, eval, EMR). The commit simultaneously inverts the tool-call priority chain so that native OpenAI `tool_calls` is the primary path, with the embedded-JSON format used only as a fallback for legacy non-tool-calling models. It also bumps the minimum Lemonade version to v10.1.0 (which moved its default port from 8000 → 13305 and is where Gemma 4 support was added).

This ships on top of the existing UI model-resolution fixes (#841, #842). Resolves #863.

## What changed and why

- **Universal Gemma default**: Gemma 4 E4B is natively multimodal (~4.5B effective params, 128K context, Apache 2.0), making it the right single default across the LLM/VLM split that previously required two different models. Footprint drops 19.7 GB → 5 GB.
- **Native tool_calls path** (Lemonade v10.1.0+ `--jinja`): GAIA now passes `tools=[...]` to Lemonade for tool-capable models. The response comes back as native `tool_calls`; `LemonadeProvider.chat()` encodes them as a sentinel JSON string (`{"__tool_calls__": ...}`) so no callers need a type change. `_parse_llm_response` detects the sentinel and returns the unified `{"tool": ..., "tool_args": ...}` dict.
- **System-prompt gating**: The embedded-JSON format block (`_PLANNING_FORMAT`/`_CONVERSATIONAL_FORMAT`) is excluded from the composed system prompt for tool-calling models; it actively prevented native `tool_calls` in prior testing.
- **Startup validator**: `_validate_profile_model_registry()` raises at import time if any `AGENT_PROFILES` entry references a model key not in `MODELS`.
- **Lemonade v10.1.0+ / port 13305**: `DEFAULT_PORT` flipped from 8000 to 13305 (Lemonade's [spring-cleaning release](https://github.com/lemonade-sdk/lemonade/wiki/Migration#v10x---v101) changed the default). 75 files updated (agents, UI, MCP bridge, RAG SDK, VLM, CLI, tests, docs). `min_lemonade_version = 10.1.0` everywhere `INIT_PROFILES` is declared.
- **Eval baselines**: Pre-swap Qwen3.5-35B baseline at commit `3b51ca92` and post-swap Gemma-4-E4B baseline are both committed under `tests/fixtures/eval_baselines/`; Gemma outperforms Qwen, 14/15 vs 13/15 (see comment below for per-scenario breakdown).

## Test plan

- [x] `python -m pytest tests/unit/ --ignore=tests/unit/chat/ui/ -q` → 928 passed, 16 skipped
- [x] `python -m pytest tests/unit/test_tool_call_priority.py -v` → 23 passed (sentinel detection, native branch parsing, edge cases, prompt gating, startup validator)
- [x] `python util/lint.py --black --isort --flake8` → all pass
- [x] Eval against Gemma-4-E4B on Lemonade v10.2.0, Sonnet judge → 14/15 scenarios pass, beats Qwen baseline (see comment)
- [x] Verified `claude -p --model claude-sonnet-4-6` was actually the judge (not Opus) via `modelUsage` in test subprocess

## Open follow-ups (not blockers for this PR)

- `tool_selection/known_path_read` regression: Gemma doesn't discover the indexed-internal-copy fallback path in Turn 1 after an Access-Denied error on the original path. Prompt-engineering candidate.
- `/api/system/status` reports the catalog `ctx_size` even when Lemonade loaded the model with a smaller window. Surface a warning when they diverge; a whole eval run was wasted because this mismatch was masked.

---------

Co-authored-by: Tomasz Iniewicz <tomasz.iniewicz@amd.com>
Co-authored-by: Kalin Ovtcharov <kalin@extropolis.ai>
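The sentinel round trip described under "Native tool_calls path" can be sketched roughly as follows. This is a minimal illustration, not GAIA's actual code: only the `__tool_calls__` key and the `{"tool": ..., "tool_args": ...}` result shape come from the commit message. The function names, the `{"answer": ...}` fallback, and the choice to use only the first tool call are assumptions, and the real embedded-JSON fallback parsing is omitted.

```python
import json
from typing import Any, Dict, List

SENTINEL_KEY = "__tool_calls__"  # key named in the commit message


def encode_tool_calls(tool_calls: List[Dict[str, Any]]) -> str:
    """Wrap native tool_calls in a JSON string so callers still receive a str."""
    return json.dumps({SENTINEL_KEY: tool_calls})


def parse_llm_response(text: str) -> Dict[str, Any]:
    """Hypothetical stand-in for _parse_llm_response: check the sentinel first."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: plain text is returned as an answer; the real code would
        # try the legacy embedded-JSON format here.
        return {"answer": text}
    if not isinstance(payload, dict) or not payload.get(SENTINEL_KEY):
        return {"answer": text}
    # Assumption: only the first tool call is used.
    function = payload[SENTINEL_KEY][0]["function"]
    # OpenAI-style tool calls carry their arguments as a JSON-encoded string.
    args = function.get("arguments") or "{}"
    if isinstance(args, str):
        args = json.loads(args)
    return {"tool": function["name"], "tool_args": args}
```

Because the sentinel is an ordinary string, a provider's `chat()` can keep returning `str` while still carrying structured tool calls, which is the stated reason no callers need a type change.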
1 parent ac437e5 commit 5d37771
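The startup validator mentioned in the commit message could look something like the sketch below. The registry contents, the `"model"` field name, and the exception type are assumptions made for illustration; only the names `AGENT_PROFILES` and `MODELS` and the fail-at-import behavior come from the commit message.

```python
from typing import Any, Mapping

# Hypothetical registries; the real ones live in GAIA's configuration modules.
MODELS = {"gemma-4-e4b": {"name": "Gemma-4-E4B-it-GGUF"}}
AGENT_PROFILES = {"chat": {"model": "gemma-4-e4b"}}


def validate_profile_model_registry(
    profiles: Mapping[str, Mapping[str, Any]],
    models: Mapping[str, Any],
) -> None:
    """Raise if any profile references a model key that is not registered."""
    missing = {
        name: profile["model"]
        for name, profile in profiles.items()
        if profile["model"] not in models
    }
    if missing:
        raise ValueError(f"Profiles reference unknown model keys: {missing}")


# Running the check at module level makes a bad profile fail at import time.
validate_profile_model_registry(AGENT_PROFILES, MODELS)
```

Failing at import keeps a mistyped model key from surfacing later as a confusing runtime error on first use of the profile.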

106 files changed

Lines changed: 4441 additions & 307 deletions
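The system-prompt gating from the commit message amounts to leaving the embedded-JSON format block out when the model supports native tool calls. A minimal sketch, assuming a hypothetical `compose_system_prompt` helper and a placeholder format block (the real `_PLANNING_FORMAT` text is not shown in this commit view):

```python
# Placeholder text; the real _PLANNING_FORMAT block is not reproduced here.
PLANNING_FORMAT = 'Respond with JSON: {"tool": "<name>", "tool_args": {...}}'


def compose_system_prompt(
    base_prompt: str,
    supports_native_tools: bool,
    format_block: str = PLANNING_FORMAT,
) -> str:
    """Append the embedded-JSON format block only for legacy models."""
    parts = [base_prompt.strip()]
    if not supports_native_tools:
        # Legacy path: the model must be told how to embed tool calls as JSON.
        parts.append(format_block.strip())
    return "\n\n".join(parts)
```

For a tool-capable model, `compose_system_prompt("You are GAIA.", True)` returns only the base prompt, so nothing in the prompt steers the model away from native `tool_calls`.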


.github/workflows/build_cpp.yml

Lines changed: 4 additions & 4 deletions
@@ -293,21 +293,21 @@ jobs:
 timeout-minutes: 30
 env:
 GAIA_CPP_TEST_MODEL: Qwen3-4B-Instruct-2507-GGUF
-GAIA_CPP_BASE_URL: http://localhost:8000/api/v1
+GAIA_CPP_BASE_URL: http://localhost:13305/api/v1
 run: |
 try {
 # Start Lemonade with Qwen3-4B-GGUF
-.\installer\scripts\start-lemonade.ps1 -ModelName "Qwen3-4B-Instruct-2507-GGUF" -Port 8000 -CtxSize 16384 -InitWaitTime 15
+.\installer\scripts\start-lemonade.ps1 -ModelName "Qwen3-4B-Instruct-2507-GGUF" -Port 13305 -CtxSize 16384 -InitWaitTime 15
 
 # Verify health
-$health = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 10
+$health = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 10
 if ($health.status -ne "ok") { throw "Lemonade health check failed" }
 Write-Host "[OK] Lemonade Server ready with Qwen3-4B-Instruct-2507-GGUF"
 
 # Run all C++ integration tests (LLM + MCP + WiFi + Health)
 Write-Host "=== Running C++ Integration Tests (LLM + MCP + WiFi + Health) ==="
 $env:GAIA_CPP_TEST_MODEL = "Qwen3-4B-Instruct-2507-GGUF"
-$env:GAIA_CPP_BASE_URL = "http://localhost:8000/api/v1"
+$env:GAIA_CPP_BASE_URL = "http://localhost:13305/api/v1"
 # -j 1: run tests sequentially so they don't compete for the single LLM server
 ctest --test-dir cpp/build-integration -C Release --output-on-failure -j 1
 if ($LASTEXITCODE -ne 0) { throw "C++ integration tests failed" }

.github/workflows/test_agent_sdk.yml

Lines changed: 4 additions & 2 deletions
@@ -83,7 +83,7 @@ jobs:
 # Start the server in the background as a process (not PowerShell job)
 Write-Host "Starting lemonade-server in background..."
 # Start the server as a background process
-$serverProcess = Start-Process -FilePath "lemonade-server" -ArgumentList "serve", "--no-tray" -PassThru -WindowStyle Hidden
+$serverProcess = Start-Process -FilePath "lemonade-server" -ArgumentList "serve", "--no-tray", "--port", "13305" -PassThru -WindowStyle Hidden
 Write-Host "Started lemonade-server process with ID: $($serverProcess.Id)"
 
 # Wait for server to start up
@@ -97,7 +97,7 @@ jobs:
 $waitTime += 2
 
 try {
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 5
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 5
 Write-Host "Server is ready and responding to health checks"
 $serverReady = $true
 } catch {
@@ -145,6 +145,8 @@ jobs:
 
 REM Run the comprehensive integration test suite
 set PYTHONIOENCODING=utf-8
+REM Use the model that was pulled above (overrides DEFAULT_MODEL_NAME=Gemma-4-E4B)
+set GAIA_TEST_MODEL=Llama-3.2-3B-Instruct-Hybrid
 python tests\test_agent_sdk.py
 set integration_exit=%ERRORLEVEL%
 
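The workflows in this commit all wait for Lemonade by polling the health endpoint every two seconds. For readers who want the same readiness check outside PowerShell, here is a rough Python equivalent. The function names, the 60-second budget, and the injectable `fetch`/`sleep` parameters are assumptions for illustration; the URL and the `status == "ok"` check mirror what the workflows do.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:13305/api/v1/health"  # new default port


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_health(
    url: str = HEALTH_URL,
    max_wait: float = 60.0,
    interval: float = 2.0,
    fetch: Callable[[str], dict] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it reports status "ok" or time runs out."""
    waited = 0.0
    while waited < max_wait:
        try:
            if fetch(url).get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused or a non-JSON body: the server is not ready yet.
            pass
        sleep(interval)
        waited += interval
    return False
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running server.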

.github/workflows/test_api.yml

Lines changed: 5 additions & 5 deletions
@@ -77,7 +77,7 @@ jobs:
 $serverJob = Start-Job -ScriptBlock {
 # Workaround for Issue #612: Disable Vulkan cooperative matrix optimization
 $env:GGML_VK_DISABLE_COOPMAT = "1"
-& lemonade-server serve --ctx-size 8192 --host localhost --port 8000 --no-tray 2>&1
+& lemonade-server serve --ctx-size 8192 --host localhost --port 13305 --no-tray 2>&1
 }
 Write-Host "Started Lemonade server job with ID: $($serverJob.Id)"
 $env:LEMONADE_JOB_ID = $serverJob.Id
@@ -93,7 +93,7 @@ jobs:
 $waitTime += 2
 
 try {
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 5
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 5
 Write-Host "[OK] Lemonade server is ready"
 Write-Host "Health response: $($response | ConvertTo-Json -Compress)"
 $serverReady = $true
@@ -112,7 +112,7 @@ jobs:
 Write-Host "Pulling Qwen3-0.6B-GGUF..."
 try {
 $body = @{ model_name = "Qwen3-0.6B-GGUF" } | ConvertTo-Json
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/pull" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/pull" `
 -Method POST -ContentType "application/json" -Body $body -TimeoutSec 600
 Write-Host " [OK] Qwen3-0.6B-GGUF pull initiated"
 } catch {
@@ -128,7 +128,7 @@ jobs:
 try {
 $loadRequest = @{ model_name = "Qwen3-0.6B-GGUF" } | ConvertTo-Json
 Write-Host "Loading model: Qwen3-0.6B-GGUF"
-$loadResponse = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/load" `
+$loadResponse = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/load" `
 -Method POST -Body $loadRequest -ContentType "application/json" -TimeoutSec 120
 Write-Host "[OK] Model loaded successfully: $($loadResponse | ConvertTo-Json -Compress)"
 } catch {
@@ -144,7 +144,7 @@ jobs:
 
 # Verify models
 try {
-$models = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/models" -Method GET
+$models = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/models" -Method GET
 Write-Host "`n[OK] Available models:"
 $models.data | ForEach-Object { Write-Host " - $($_.id)" }
 } catch {

.github/workflows/test_embeddings.yml

Lines changed: 7 additions & 7 deletions
@@ -68,7 +68,7 @@ jobs:
 $serverJob = Start-Job -ScriptBlock {
 # Workaround for Issue #612: Disable Vulkan cooperative matrix optimization
 $env:GGML_VK_DISABLE_COOPMAT = "1"
-& lemonade-server serve --host localhost --port 8000 --no-tray 2>&1
+& lemonade-server serve --host localhost --port 13305 --no-tray 2>&1
 }
 Write-Host "Started Lemonade server job with ID: $($serverJob.Id)"
 $env:LEMONADE_JOB_ID = $serverJob.Id
@@ -84,7 +84,7 @@ jobs:
 $waitTime += 2
 
 try {
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 5
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 5
 Write-Host "[OK] Lemonade server is ready"
 Write-Host "Health response: $($response | ConvertTo-Json -Compress)"
 $serverReady = $true
@@ -115,7 +115,7 @@ jobs:
 Write-Host "Pulling nomic-embed-text-v2-moe-GGUF..."
 try {
 $body = @{ model_name = "nomic-embed-text-v2-moe-GGUF" } | ConvertTo-Json
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/pull" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/pull" `
 -Method POST -ContentType "application/json" -Body $body -TimeoutSec 600
 Write-Host " [OK] Model pull initiated"
 } catch {
@@ -130,7 +130,7 @@ jobs:
 } | ConvertTo-Json
 
 Write-Host "Loading model: nomic-embed-text-v2-moe-GGUF"
-$loadResponse = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/load" `
+$loadResponse = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/load" `
 -Method POST -Body $loadRequest -ContentType "application/json" -TimeoutSec 60
 Write-Host "[OK] Model loaded successfully: $($loadResponse | ConvertTo-Json -Compress)"
 } catch {
@@ -147,7 +147,7 @@ jobs:
 
 # Verify model is available
 try {
-$models = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/models" -Method GET
+$models = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/models" -Method GET
 Write-Host "`n[OK] Available models:"
 $models.data | ForEach-Object { Write-Host " - $($_.id)" }
 } catch {
@@ -157,7 +157,7 @@ jobs:
 # Verify server is still responding before embeddings test
 Write-Host "`n=== Verifying Server Health ==="
 try {
-$health = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 10
+$health = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 10
 Write-Host "[OK] Server responding: $($health | ConvertTo-Json -Compress)"
 } catch {
 Write-Host "[ERROR] Server health check failed: $($_.Exception.Message)"
@@ -179,7 +179,7 @@ jobs:
 try {
 $testBody = @{ input = @("test embedding"); model = "nomic-embed-text-v2-moe-GGUF" } | ConvertTo-Json
 # Use localhost consistently and increased timeout for first embedding request
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/embeddings" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/embeddings" `
 -Method POST -ContentType "application/json" -Body $testBody -TimeoutSec 300
 Write-Host "[OK] Embedding model verified successfully"
 $modelReady = $true
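The embeddings smoke test above posts a JSON body with `input` and `model` fields to `/api/v1/embeddings`. The same request can be assembled in Python with the standard library, as in the sketch below. The helper name is hypothetical; the endpoint path, port, field names, and model name are taken from the workflow. The request is only built here, not sent, so the snippet does not need a running server.

```python
import json
import urllib.request
from typing import Iterable

BASE_URL = "http://localhost:13305/api/v1"  # matches the updated workflow


def build_embeddings_request(
    texts: Iterable[str],
    model: str = "nomic-embed-text-v2-moe-GGUF",
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the embeddings endpoint."""
    body = json.dumps({"input": list(texts), "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=300)`; the workflow uses a similarly generous timeout because the first embedding request may be slow.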

.github/workflows/test_gaia_cli_linux.yml

Lines changed: 5 additions & 1 deletion
@@ -138,6 +138,9 @@ jobs:
 echo "=== Listing Available Models ==="
 curl -s http://localhost:8000/api/v1/models | jq '.' || echo "Could not list models"
 
+# Python lemonade-server-dev runs on port 8000; tell GAIA CLI where to connect
+export LEMONADE_BASE_URL=http://localhost:8000/api/v1
+
 echo "=== Testing Core GAIA CLI Commands with Lemonade ==="
 
 # Test chat command with Qwen model (should now work with Lemonade)
@@ -191,7 +194,8 @@ jobs:
 echo "Testing LemonadeClient API with running server"
 
 # Run the lemonade client integration tests (skip hybrid NPU test - no NPU on Linux)
-GAIA_TEST_MODEL="Qwen3-0.6B-GGUF" python -m pytest tests/test_lemonade_client.py -vs --tb=short -k "Integration and not hybrid" || LEMONADE_TEST_EXIT=$?
+# LEMONADE_PORT=8000: lemonade-server-dev always binds to 8000 (no --port flag)
+LEMONADE_PORT=8000 GAIA_TEST_MODEL="Qwen3-0.6B-GGUF" python -m pytest tests/test_lemonade_client.py -vs --tb=short -k "Integration and not hybrid" || LEMONADE_TEST_EXIT=$?
 
 if [ "${LEMONADE_TEST_EXIT:-0}" -eq 0 ]; then
 echo "✅ Lemonade client integration tests passed successfully!"
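This Linux workflow keeps the server on port 8000 and points the client at it through `LEMONADE_BASE_URL` and `LEMONADE_PORT`. One plausible way a client could resolve its endpoint from those variables is sketched below. The precedence order (full URL first, then port, then the 13305 default) and the helper name are assumptions; the commit view does not show how GAIA actually reads them.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 13305  # Lemonade v10.1.0+ default, per the commit message


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the Lemonade base URL from the environment, else use the default."""
    env = os.environ if env is None else env
    explicit = env.get("LEMONADE_BASE_URL")
    if explicit:
        # Assumed precedence: a full URL wins over a bare port override.
        return explicit.rstrip("/")
    port = int(env.get("LEMONADE_PORT", DEFAULT_PORT))
    return f"http://localhost:{port}/api/v1"
```

With no overrides this resolves to `http://localhost:13305/api/v1`, while setting `LEMONADE_PORT=8000`, as the workflow does, yields `http://localhost:8000/api/v1`.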

.github/workflows/test_gaia_cli_windows.yml

Lines changed: 2 additions & 2 deletions
@@ -92,7 +92,7 @@ jobs:
 # Start the server in the background as a process (not PowerShell job)
 Write-Host "Starting lemonade-server in background..."
 # Start the server as a background process
-$serverProcess = Start-Process -FilePath "lemonade-server" -ArgumentList "serve", "--no-tray" -PassThru -WindowStyle Hidden
+$serverProcess = Start-Process -FilePath "lemonade-server" -ArgumentList "serve", "--no-tray", "--port", "13305" -PassThru -WindowStyle Hidden
 Write-Host "Started lemonade-server process with ID: $($serverProcess.Id)"
 
 # Wait for server to start up
@@ -106,7 +106,7 @@ jobs:
 $waitTime += 2
 
 try {
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 5
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 5
 Write-Host "Server is ready and responding to health checks"
 $serverReady = $true
 } catch {

.github/workflows/test_lemonade_server.yml

Lines changed: 3 additions & 3 deletions
@@ -53,11 +53,11 @@ jobs:
 run: |
 try {
 # Start server and load model (all in one session)
-.\installer\scripts\start-lemonade.ps1 -ModelName "Qwen3-4B-Instruct-2507-GGUF" -Port 8000 -CtxSize 32768 -InitWaitTime 10
+.\installer\scripts\start-lemonade.ps1 -ModelName "Qwen3-4B-Instruct-2507-GGUF" -Port 13305 -CtxSize 32768 -InitWaitTime 10
 
 # Verify health endpoint
 Write-Host "=== Verifying Health Endpoint ==="
-$health = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 10
+$health = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 10
 Write-Host "Health response: $($health | ConvertTo-Json -Compress)"
 
 if ($health.status -ne "ok") {
@@ -93,7 +93,7 @@ jobs:
 max_tokens = 10
 } | ConvertTo-Json
 
-$completion = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/completions" `
+$completion = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/completions" `
 -Method POST -ContentType "application/json" -Body $testBody -TimeoutSec 30
 
 Write-Host "[OK] Completion successful"

.github/workflows/test_rag.yml

Lines changed: 9 additions & 9 deletions
@@ -110,7 +110,7 @@ jobs:
 $serverJob = Start-Job -ScriptBlock {
 # Workaround for Issue #612: Disable Vulkan cooperative matrix optimization
 $env:GGML_VK_DISABLE_COOPMAT = "1"
-& lemonade-server serve --host localhost --port 8000 --ctx-size 8192 --no-tray 2>&1
+& lemonade-server serve --host localhost --port 13305 --ctx-size 8192 --no-tray 2>&1
 }
 Write-Host "Started Lemonade server job with ID: $($serverJob.Id)"
 $env:LEMONADE_JOB_ID = $serverJob.Id
@@ -126,7 +126,7 @@ jobs:
 $waitTime += 2
 
 try {
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 5
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 5
 Write-Host "[OK] Lemonade server is ready"
 Write-Host "Health response: $($response | ConvertTo-Json -Compress)"
 $serverReady = $true
@@ -159,7 +159,7 @@ jobs:
 Write-Host "Pulling Qwen3-4B-Instruct-2507-GGUF..."
 try {
 $body = @{ model_name = "Qwen3-4B-Instruct-2507-GGUF" } | ConvertTo-Json
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/pull" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/pull" `
 -Method POST -ContentType "application/json" -Body $body -TimeoutSec 600
 Write-Host " [OK] Qwen3-4B-Instruct-2507-GGUF pull initiated"
 } catch {
@@ -170,7 +170,7 @@ jobs:
 Write-Host "Pulling nomic-embed-text-v2-moe-GGUF..."
 try {
 $body = @{ model_name = "nomic-embed-text-v2-moe-GGUF" } | ConvertTo-Json
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/pull" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/pull" `
 -Method POST -ContentType "application/json" -Body $body -TimeoutSec 600
 Write-Host " [OK] nomic-embed-text-v2-moe-GGUF pull initiated"
 } catch {
@@ -181,7 +181,7 @@ jobs:
 Write-Host "Pulling Qwen3-VL-4B-Instruct-GGUF..."
 try {
 $body = @{ model_name = "Qwen3-VL-4B-Instruct-GGUF" } | ConvertTo-Json
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/pull" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/pull" `
 -Method POST -ContentType "application/json" -Body $body -TimeoutSec 1200
 Write-Host " [OK] Qwen3-VL-4B-Instruct-GGUF pull initiated"
 } catch {
@@ -196,7 +196,7 @@ jobs:
 } | ConvertTo-Json
 
 Write-Host "Loading model: nomic-embed-text-v2-moe-GGUF"
-$loadResponse = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/load" `
+$loadResponse = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/load" `
 -Method POST -Body $loadRequest -ContentType "application/json" -TimeoutSec 60
 Write-Host "[OK] Model loaded successfully: $($loadResponse | ConvertTo-Json -Compress)"
 } catch {
@@ -213,7 +213,7 @@ jobs:
 
 # Verify models
 try {
-$models = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/models" -Method GET
+$models = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/models" -Method GET
 Write-Host "`n[OK] Available models:"
 $models.data | ForEach-Object { Write-Host " - $($_.id)" }
 } catch {
@@ -223,7 +223,7 @@ jobs:
 # Verify server is still responding before embeddings test
 Write-Host "`n=== Verifying Server Health ==="
 try {
-$health = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/health" -Method GET -TimeoutSec 10
+$health = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/health" -Method GET -TimeoutSec 10
 Write-Host "[OK] Server responding: $($health | ConvertTo-Json -Compress)"
 } catch {
 Write-Host "[ERROR] Server health check failed: $($_.Exception.Message)"
@@ -245,7 +245,7 @@ jobs:
 try {
 $testBody = @{ input = @("test embedding"); model = "nomic-embed-text-v2-moe-GGUF" } | ConvertTo-Json
 # Use localhost consistently and increased timeout for first embedding request
-$response = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/embeddings" `
+$response = Invoke-RestMethod -Uri "http://localhost:13305/api/v1/embeddings" `
 -Method POST -ContentType "application/json" -Body $testBody -TimeoutSec 300
 Write-Host "[OK] Embedding model verified successfully"
 $modelReady = $true

cpp/README.md

Lines changed: 5 additions & 5 deletions
@@ -28,23 +28,23 @@ Included demos:
 
 The agent connects to an OpenAI-compatible LLM server at `http://localhost:8000/api/v1` by default. The reference backend is [Lemonade Server](https://github.com/lemonade-sdk/lemonade), which runs models locally on AMD hardware.
 
-Download and install Lemonade Server v10.0.0, then start it:
+Download and install Lemonade Server v10.2.0, then start it:
 
 **Windows:**
 ```powershell
 # Download and run the MSI installer
-curl -L -o lemonade-server-minimal.msi https://github.com/lemonade-sdk/lemonade/releases/download/v10.0.0/lemonade-server-minimal.msi
+curl -L -o lemonade-server-minimal.msi https://github.com/lemonade-sdk/lemonade/releases/download/v10.2.0/lemonade-server-minimal.msi
 msiexec /i lemonade-server-minimal.msi
 ```
 
 **Linux:**
 ```bash
 # Download and install the .deb package
-curl -L -o lemonade-server_10.0.0_amd64.deb https://github.com/lemonade-sdk/lemonade/releases/download/v10.0.0/lemonade-server_10.0.0_amd64.deb
-sudo dpkg -i lemonade-server_10.0.0_amd64.deb
+curl -L -o lemonade-server_10.2.0_amd64.deb https://github.com/lemonade-sdk/lemonade/releases/download/v10.2.0/lemonade-server_10.2.0_amd64.deb
+sudo dpkg -i lemonade-server_10.2.0_amd64.deb
 ```
 
-Or download directly from the [Lemonade v10.0.0 release page](https://github.com/lemonade-sdk/lemonade/releases/tag/v10.0.0).
+Or download directly from the [Lemonade v10.2.0 release page](https://github.com/lemonade-sdk/lemonade/releases/tag/v10.2.0).
 
 After installation, start the server:
 ```bash
