Commit 9ab2974

fix(setup): start genie-ai-runtime before memory-heavy services (#76)
Fixes #75. Issue evidence on a Jetson Orin Nano 8 GB shows that the runtime's auto-clamp behavior is order-sensitive: the same `-c 4096` request fits only ~1.7k context when the full stack is already resident, but fits 4k / 6k / 8k cleanly when `genie-ai-runtime` loads first. This PR pins the startup order so the runtime claims its KV cache before memory-heavy services occupy DRAM.

Three coordinated changes:

1. `Before=genie-whisper.service genie-whisper-warmup.service homeassistant.service genie-core.service` on `genie-ai-runtime.service`. This systemd ordering directive places the LLM unit's startup ahead of the other memory-heavy services. Combined with PR #72's existing `After=genie-ai-runtime.service genie-llm.service` on `genie-core`, the ordering is now declared on both sides. `Before=` is a no-op for units that aren't installed on this host, so this is safe for installs that don't ship homeassistant or whisper.
2. `GENIEPOD_AI_RUNTIME_CONTEXT` default bumped from `2048` to `8192`, the largest context the issue verified loads cleanly with `--int8-kv` on Orin Nano 8 GB. The env knob stays settable via systemd drop-in for smaller Jetsons.
3. `deploy/scripts/start_all.sh` reorders the `UNITS=(...)` array so the configured LLM unit and its warmup run before `homeassistant`, `genie-whisper`, and `genie-whisper-warmup` in the manual lifecycle path too, mirroring the systemd `Before=`.

Tests added to `tool_dispatch_test.rs` lock both invariants:

- `start_all_uses_configured_llm_backend` asserts that `$configured_llm_unit` appears before `homeassistant.service` and `genie-whisper.service` in the `UNITS=` array.
- `genie_ai_runtime_service_preserves_model_page_cache` asserts `GENIEPOD_AI_RUNTIME_CONTEXT=8192` and that the new `Before=` clause is present.

Compatibility with PR #70 (warm page cache across restart) is preserved: `Before=` only affects boot-time ordering, not `systemctl restart genie-ai-runtime` alone. End-user verified on the same Jetson the issue was filed against.

Worth a follow-up: PR #74's `GENIE_RUNTIME_MAX_BODY_BYTES = 4 KB` body-compaction threshold now leaves performance on the table at the new 8192-token runtime context, because the client compacts prompts that the runtime could now handle. The right path is to make the threshold a function of `GENIEPOD_AI_RUNTIME_CONTEXT` or to probe runtime capacity at connection time. Not blocking this PR.

All 7 CI checks green on `c1cae29` (fmt, clippy, test, aarch64 cross-compile, shellcheck, ruff, `--no-default-features`).
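
The follow-up idea in the commit message, deriving the compaction threshold from `GENIEPOD_AI_RUNTIME_CONTEXT` instead of a fixed 4 KB, could look roughly like the sketch below. Nothing here is from the repository: the function names, the `reserved_tokens` headroom, and the `bytes_per_token` ratio are all hypothetical, and a real implementation would need a measured bytes-per-token figure for the model's tokenizer.

```rust
/// Hypothetical helper: derive the client's body-compaction threshold in
/// bytes from the runtime context size in tokens.
///
/// ASSUMPTIONS (not from the source): `reserved_tokens` is headroom kept for
/// the model's reply, and `bytes_per_token` is a rough average for prompt text.
pub fn max_body_bytes(
    context_tokens: usize,
    reserved_tokens: usize,
    bytes_per_token: usize,
) -> usize {
    // saturating_sub keeps the result at 0 when the headroom exceeds the context.
    context_tokens
        .saturating_sub(reserved_tokens)
        .saturating_mul(bytes_per_token)
}

/// Hypothetical helper: parse a context size from an environment-style value,
/// falling back to `default_tokens` when the value is unset or not a number.
pub fn context_from_env(value: Option<&str>, default_tokens: usize) -> usize {
    value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(default_tokens)
}
```

Under these assumptions, a caller could pass `std::env::var("GENIEPOD_AI_RUNTIME_CONTEXT").ok().as_deref()` to `context_from_env`. With an 8192-token context, 1024 reserved tokens, and 3 bytes per token, `max_body_bytes` returns 21504 bytes.
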
1 parent 8204be0 commit 9ab2974

3 files changed

Lines changed: 38 additions & 10 deletions

crates/genie-core/tests/tool_dispatch_test.rs

Lines changed: 26 additions & 2 deletions
@@ -264,6 +264,24 @@ fn start_all_uses_configured_llm_backend() {
         contents.contains("is_warmup_unit") && contents.contains("start --no-block"),
         "start_all should queue warmup units without blocking the lifecycle script"
     );
+    let units = contents
+        .split("UNITS=(")
+        .nth(1)
+        .and_then(|s| s.split(")").next())
+        .expect("start_all should declare ordered units");
+    let llm_pos = units
+        .find("\"$configured_llm_unit\"")
+        .expect("start_all should include the configured LLM unit");
+    let homeassistant_pos = units
+        .find("homeassistant.service")
+        .expect("start_all should include Home Assistant");
+    let whisper_pos = units
+        .find("genie-whisper.service")
+        .expect("start_all should include Whisper");
+    assert!(
+        llm_pos < homeassistant_pos && llm_pos < whisper_pos,
+        "start_all should start the configured LLM before memory-heavy services"
+    );
 }
 
 /// Verify genie-ai-runtime service preserves warm GGUF pages across restarts.
@@ -285,8 +303,14 @@ fn genie_ai_runtime_service_preserves_model_page_cache() {
         "genie-ai-runtime.service should use INT8 KV to fit enough context under memory pressure"
     );
     assert!(
-        contents.contains("GENIEPOD_AI_RUNTIME_CONTEXT=2048"),
-        "genie-ai-runtime.service should request the GenieClaw web-chat context size"
+        contents.contains("GENIEPOD_AI_RUNTIME_CONTEXT=8192"),
+        "genie-ai-runtime.service should request the Jetson-tested 8k context size"
+    );
+    assert!(
+        contents.contains(
+            "Before=genie-whisper.service genie-whisper-warmup.service homeassistant.service genie-core.service"
+        ),
+        "genie-ai-runtime.service should reserve KV cache before memory-heavy services"
     );
 }
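
The ordering check added to `start_all_uses_configured_llm_backend` works by extracting the text of the `UNITS=(...)` array and comparing substring positions. A minimal standalone sketch of that technique is shown here; `units_block` and `starts_before` are hypothetical helper names introduced for illustration, not functions in the repository.

```rust
/// Returns the text between `UNITS=(` and the next `)`, if the array exists.
pub fn units_block(script: &str) -> Option<&str> {
    script
        .split("UNITS=(")
        .nth(1)
        .and_then(|s| s.split(')').next())
}

/// True when `first` occurs in `block` before every name in `later`.
/// A missing `first` or a missing `later` entry counts as a failure.
pub fn starts_before(block: &str, first: &str, later: &[&str]) -> bool {
    match block.find(first) {
        Some(pos) => later
            .iter()
            .all(|&name| matches!(block.find(name), Some(p) if pos < p)),
        None => false,
    }
}
```

Because the comparison is purely textual, it relies on the array body containing no `)` and on no unit name being a substring of an earlier one. That holds for the unit names in this commit, but it is worth keeping in mind if the array grows.
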

deploy/scripts/start_all.sh

Lines changed: 3 additions & 3 deletions
@@ -118,12 +118,12 @@ configured_llm_unit="$(normalize_unit "$raw_llm_unit")"
 configured_warmup_unit="$(warmup_unit_for "$configured_llm_unit")"
 
 UNITS=(
-  homeassistant.service
   genie-audio.service
-  genie-whisper.service
-  genie-whisper-warmup.service
   "$configured_llm_unit"
   "$configured_warmup_unit"
+  homeassistant.service
+  genie-whisper.service
+  genie-whisper-warmup.service
   genie-core.service
   genie-governor.service
   genie-health.service

deploy/systemd/genie-ai-runtime.service

Lines changed: 9 additions & 5 deletions
@@ -2,6 +2,11 @@
 Description=GeniePod AI Runtime (Jetson-tuned LLM, OpenAI-compatible)
 Documentation=https://github.com/GeniePod/genie-ai-runtime
 After=network.target
+# Claim the LLM KV cache before memory-heavy voice/container services start.
+# Jetson testing for issue #75 showed the same `-c 4096` request fitting only
+# ~1.7k ctx after the full stack was resident, but fitting 4k/6k/8k ctx when
+# genie-ai-runtime loaded first.
+Before=genie-whisper.service genie-whisper-warmup.service homeassistant.service genie-core.service
 ConditionPathExists=/opt/geniepod/bin/jetson-llm-server
 # Conflicts with genie-llm.service: both bind :8080. systemd will refuse
 # to start the second one while the first is running, so a misconfigured
@@ -12,17 +17,16 @@ Conflicts=genie-llm.service
 Type=simple
 # Keep the GGUF in page cache across restarts when the kernel can. Clearing
 # VM caches here made every runtime restart cold-load Qwen3 again (issue #69).
-# Use INT8 KV so the Jetson service reliably gets enough context for
-# GenieClaw's web prompt even under memory pressure. `-c` is still clamped
-# by runtime memory budget, but INT8 KV roughly doubles the fitted context
-# versus the server's FP16 default.
+# Use INT8 KV so the Jetson service can reserve an 8k context on Orin Nano
+# when systemd starts it before memory-heavy services. `-c` is still clamped
+# by runtime memory budget, so boot/start ordering matters.
 ExecStart=/opt/geniepod/bin/jetson-llm-server \
     -m ${GENIEPOD_LLM_MODEL} \
     -p 8080 \
     -c ${GENIEPOD_AI_RUNTIME_CONTEXT} \
     --int8-kv
 Environment=GENIEPOD_LLM_MODEL=/opt/geniepod/models/Qwen3-4B-Q4_K_M.gguf
-Environment=GENIEPOD_AI_RUNTIME_CONTEXT=2048
+Environment=GENIEPOD_AI_RUNTIME_CONTEXT=8192
 Restart=on-failure
 RestartSec=5
 TimeoutStartSec=120
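
The unit comments describe `-c` being clamped by the runtime's memory budget, and the removed comment noted that INT8 KV roughly doubles the fitted context versus FP16. The arithmetic behind that can be sketched as below. This is an illustration only: `KvShape`, `kv_cache_bytes`, and `clamp_context` are hypothetical names, the model shape must be supplied by the caller, and the real `jetson-llm-server` clamp logic is not shown in this commit and may account for overheads (quantization scales, allocator padding) that this sketch ignores.

```rust
/// Model shape relevant to KV cache sizing. Values are supplied by the
/// caller; this sketch does NOT read them from a GGUF file.
pub struct KvShape {
    pub layers: u64,
    pub kv_heads: u64,
    pub head_dim: u64,
}

/// Bytes for keys plus values (hence the factor of 2) across all layers
/// at `ctx` tokens, with `bytes_per_elem` bytes per cached element.
pub fn kv_cache_bytes(shape: &KvShape, ctx: u64, bytes_per_elem: u64) -> u64 {
    2 * shape.layers * shape.kv_heads * shape.head_dim * ctx * bytes_per_elem
}

/// Largest context whose KV cache fits in `budget_bytes`, capped at
/// `requested`. ASSUMPTION: a simple floor division, not the runtime's
/// actual clamping policy.
pub fn clamp_context(
    shape: &KvShape,
    requested: u64,
    budget_bytes: u64,
    bytes_per_elem: u64,
) -> u64 {
    let per_token = kv_cache_bytes(shape, 1, bytes_per_elem);
    if per_token == 0 {
        return requested; // degenerate shape: nothing to clamp against
    }
    requested.min(budget_bytes / per_token)
}
```

For an illustrative shape of 36 layers, 8 KV heads, and a head dimension of 128, an 8192-token cache needs 603,979,776 bytes at 1 byte per element and twice that at 2 bytes per element. With a budget of exactly 603,979,776 bytes, a request for 8192 tokens fits in full with 1-byte elements but is clamped to 4096 with 2-byte elements, which is why less free DRAM at startup translates directly into a smaller fitted context.
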
