docs+config: close out #32 non-functional cleanup items

ai-hpc · ai-hpc · commit 339f84f62130 · 2026-05-16T22:11:58.000+10:00
Functional LLM backend abstraction landed across #35, #38, #39, #40, #43.
This commit closes the strictly-textual remainder of the original issue:

- deploy/config/geniepod.toml: add commented `backend = "..."` examples
  under [services.llm] with both canonical and hyphenated alias forms.
- deploy/config/geniepod.dev.toml: add the same two canonical examples,
  with a note that dev machines typically stay on llama.cpp.
- README.md: replace the "transitional llama.cpp" framing with the current
  reality: llama.cpp is the default backend, and genie-ai-runtime is
  selectable per-deployment via [services.llm].backend. Update the
  How-It-Works diagram comment and the How-It-Fits-Together prose.
- ARCHITECTURE.md: rewrite the Current Transitional Adapters row for the
  LLM client to reflect that the boundary IS now resolved via the
  LlmClient facade; update the process-topology diagram comment; update
  the Refactor Direction step to name "LLM backends" instead of
  "llama.cpp".
- crates/genie-core/Cargo.toml: declare `llama-cpp` and
  `genie-ai-runtime` features (both default-on, so builds are
  byte-identical today). A comment notes that per-backend `#[cfg]` gating
  is incremental and will land alongside the CI matrix from #34, so a
  --no-default-features regression is caught automatically rather than
  silently bit-rotting.

Direct push to main per maintainer call: the changes are documentation
and commented config examples, and the feature flags are no-op
declarations that don't change build behavior.

Refs #32.
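For reviewers, here is a minimal sketch of the selection rule that the new config comments describe. An absent `backend` key means llama.cpp, and the canonical snake_case values and their hyphenated aliases both resolve to the same backend. `LlmBackend`, `from_config`, `canonical_name` and `UnknownBackend` are hypothetical names used for illustration. They are not the actual types in crates/genie-core/src/llm/. Rejecting unrecognised values with an error is also an assumption, since this commit does not specify that behavior.

```rust
use std::fmt;

/// Hypothetical mirror of the `[services.llm].backend` values
/// (illustrative only; not the real genie-core type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LlmBackend {
    LlamaCpp,
    GenieAiRuntime,
}

/// Hypothetical error for a `backend` string that matches no known backend.
#[derive(Debug, PartialEq, Eq)]
struct UnknownBackend(String);

impl fmt::Display for UnknownBackend {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown [services.llm].backend value {:?}", self.0)
    }
}

impl LlmBackend {
    /// The canonical snake_case spelling used in geniepod.toml.
    fn canonical_name(self) -> &'static str {
        match self {
            LlmBackend::LlamaCpp => "llama_cpp",
            LlmBackend::GenieAiRuntime => "genie_ai_runtime",
        }
    }

    /// Resolve the optional `backend` key. A missing key falls back to the
    /// llama.cpp default, and hyphenated aliases map to the same variant.
    /// Erroring on anything else is an assumption of this sketch.
    fn from_config(value: Option<&str>) -> Result<Self, UnknownBackend> {
        match value {
            None => Ok(LlmBackend::LlamaCpp),
            Some("llama_cpp") | Some("llama-cpp") => Ok(LlmBackend::LlamaCpp),
            Some("genie_ai_runtime") | Some("genie-ai-runtime") => Ok(LlmBackend::GenieAiRuntime),
            Some(other) => Err(UnknownBackend(other.to_string())),
        }
    }
}

fn main() {
    let samples = [None, Some("llama-cpp"), Some("genie_ai_runtime"), Some("not-a-backend")];
    for raw in samples.iter() {
        match LlmBackend::from_config(*raw) {
            Ok(backend) => println!("{:?} -> {}", raw, backend.canonical_name()),
            Err(err) => println!("{:?} -> error: {}", raw, err),
        }
    }
}
```

In this sketch, normalising every alias to one canonical spelling means any identity string derived from the resolved variant stays stable, whichever alias the operator typed.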
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -78,7 +78,7 @@ The current repo still contains pragmatic adapters used to ship on Jetson now.
 
 | Current adapter | Long-term replacement | Notes |
 | --- | --- | --- |
-| `llama.cpp` OpenAI-compatible client | `genie-ai-runtime` client | Keep the client boundary narrow and model/runtime assumptions explicit. |
+| `llama.cpp` OpenAI-compatible client (default) | `genie-ai-runtime` client (opt-in) | Both backends ship behind the `LlmClient` facade; per-deployment selection via `[services.llm].backend` in `geniepod.toml`. Backend identity surfaces in `/api/health`, startup logs, and `genie-ctl status`. |
 | Home Assistant provider | `genie-home-runtime` MCP/API client | Keep HA-specific behavior behind `ha/` and tools/home boundaries. |
 | Actuation safety in `genie-core` | final safety in `genie-home-runtime` | Keep current safety as an agent-side guard and confirmation layer. |
 | `genie-api` dashboard | application layer | Keep it operational and lightweight; avoid making it the long-term product app. |
@@ -116,7 +116,8 @@ Code in memory, voice, prompt, and channels should not learn Home Assistant inte
 Current Jetson deployment:
 
 ```text
-llama-server (:8080)
+LLM backend (:8080)         # llama-server by default; genie-ai-runtime if
+                            # [services.llm].backend = "genie_ai_runtime"
         ^
         |
 genie-core (:3000) <---- genie-ctl
@@ -223,7 +224,7 @@ Long-term direction:
 The clean architecture path is incremental:
 
 1. Make boundary language consistent in docs and config.
-2. Keep Home Assistant and llama.cpp behind narrow adapter traits.
+2. Keep Home Assistant and LLM backends behind narrow adapter traits (LLM side resolved via the `LlmClient` facade in `crates/genie-core/src/llm/`).
 3. Move physical actuation authority downward into `genie-home-runtime` when it exists.
 4. Move Jetson model-server specialization downward into `genie-ai-runtime`.
 5. Keep GenieClaw focused on voice, memory, skills, tools, channels, and household interaction.
diff --git a/README.md b/README.md
@@ -29,9 +29,9 @@ Voice in, voice out, controls Home Assistant, no cloud.**
    └────────┘   └────────┘   └──────┬───────┘   └───────┘
                                     │
                        memory ◄─────┼─────► local LLM
-                       (SQLite)     │       (llama.cpp today;
-                                    │        genie-ai-runtime
-                                    │        replacing it)
+                       (SQLite)     │       (llama.cpp by default,
+                                    │        genie-ai-runtime opt-in
+                                    │        via [services.llm].backend)
                                     ▼
                           Home Assistant
                           (rate-limited, audited)
@@ -65,7 +65,7 @@ This repo is the Rust agent runtime for a very specific product shape:
 - a local household memory system
 - safe handoff to a home-control runtime
 - transitional Home Assistant support while `genie-home-runtime` is not yet split out
-- transitional `llama.cpp` support while `genie-ai-runtime` is not yet split out
+- pluggable local LLM backend (`llama.cpp` default; `genie-ai-runtime` selectable via `[services.llm].backend = "genie_ai_runtime"`)
 - a privacy-first and security-first system
 - a memory-footprint-conscious runtime built for constrained edge hardware
 - a household trust model that exposes redacted posture, not raw config files
@@ -149,8 +149,11 @@ logic, response style, channels, and skill routing.
 
 At a high level:
 
-1. Today, `llama.cpp` provides the local model server. Longer term,
-   `genie-ai-runtime` should provide the Jetson-only inference service.
+1. The local model server is `llama.cpp` by default; the
+   `genie-ai-runtime` Jetson-tuned runtime is selectable per-deployment
+   via `[services.llm].backend = "genie_ai_runtime"` in `geniepod.toml`.
+   Backend identity flows through `LlmClient::backend_name()` into
+   logs, `/api/health`, and `genie-ctl status` for operator visibility.
 2. `genie-core` handles prompts, tool calls, memory, chat, and voice orchestration.
 3. Today, Home Assistant can provide device state and service execution. Longer term,
    `genie-home-runtime` should provide that boundary and the final actuation safety layer.
diff --git a/crates/genie-core/Cargo.toml b/crates/genie-core/Cargo.toml
@@ -14,8 +14,15 @@ name = "genie-core"
 path = "src/main.rs"
 
 [features]
-default = ["telegram"]
+default = ["telegram", "llama-cpp", "genie-ai-runtime"]
 telegram = []
+# LLM backend feature surface (declared per issue #32 closeout).
+# Both default-on so today's builds are byte-identical.
+# Per-backend `#[cfg]` gating of the modules and constructors will land
+# alongside the CI matrix from issue #34, so a `--no-default-features`
+# regression is caught automatically rather than silently bit-rotting.
+llama-cpp = []
+genie-ai-runtime = []
 
 [dependencies]
 genie-common = { workspace = true }
diff --git a/deploy/config/geniepod.dev.toml b/deploy/config/geniepod.dev.toml
@@ -85,6 +85,8 @@ systemd_unit = "genie-core.service"
 [services.llm]
 url = "http://127.0.0.1:8080/health"
 systemd_unit = "genie-llm.service"
+# backend = "llama_cpp"             # Default. Local llama.cpp `llama-server` on :8080.
+# backend = "genie_ai_runtime"      # Jetson-tuned runtime (dev machines typically stay on llama.cpp).
 
 # Uncomment to enable Home Assistant integration during development:
 # [services.homeassistant]
diff --git a/deploy/config/geniepod.toml b/deploy/config/geniepod.toml
@@ -199,6 +199,12 @@ systemd_unit = "genie-core.service"
 [services.llm]
 url = "http://127.0.0.1:8080/health"
 systemd_unit = "genie-llm.service"
+# backend = "llama_cpp"             # Default. Local llama.cpp `llama-server` on :8080.
+# backend = "genie_ai_runtime"      # Jetson-tuned runtime (see GeniePod/genie-ai-runtime).
+                                    # Set systemd_unit = "genie-ai-runtime.service"
+                                    # and ensure the binary + a compatible model are
+                                    # installed before flipping this.
+                                    # Accepts hyphenated aliases too: "llama-cpp" / "genie-ai-runtime".
 
 # Uncomment to enable Home Assistant integration:
 # [services.homeassistant]