Commit 8204be0

fix(core): shape genie-ai-runtime chat requests + compact oversized prompts (#74)
Closes #73. GenieClaw's web chat through `/api/chat` used to return `{"error":"chat: genie-ai-runtime 0: "}` because the OpenAI-compatible adapter sent a multi-kB prompt (tool manifest + memory + history) into a Jetson runtime whose Qwen3 context auto-capped to ~327 tokens under memory pressure. The runtime then crashed; the client saw a dropped socket and surfaced an empty status.

The fix has four coordinated parts:

1. **Request profile** (`RequestProfile { Generic, GenieAiRuntime }` in `openai_compat.rs`). Generic keeps today's wire format byte-for-byte so the llama.cpp path is unaffected. GenieAiRuntime sends `model: "jetson-llm"` (matches the model name the Jetson runtime registers) and `think: false` (matches the working direct-curl baseline). The `#[serde(skip_serializing_if = "Option::is_none")]` on the new `think` field means llama.cpp never sees it.
2. **Body compaction for the Jetson profile.** When the serialized body exceeds 4 KB, replace messages with a single user-role turn containing a concise system blurb plus the latest user content. Pairs with the genie-ai-runtime PR #81 prompt-size guard (which started returning HTTP 400 instead of crashing); compaction sidesteps the rejection entirely.
3. **Honest error surfacing.** When the runtime closes the socket before sending an HTTP status (`status == 0 && body.is_empty()`), report `{backend} closed connection before HTTP status (request_bytes=N)` instead of `genie-ai-runtime 0: `. The SSE path gets a symmetric guard via `if !headers_done` + initial `status = 0`.
4. **Service unit tightening.** `genie-ai-runtime.service` now starts `jetson-llm-server` with `--int8-kv` + `-c 2048` (via the `GENIEPOD_AI_RUNTIME_CONTEXT=2048` env knob). INT8 KV roughly doubles fittable context vs FP16, so the Jetson runtime reliably gets the 2048 tokens GenieClaw's web prompt is sized for. A test in `tool_dispatch_test.rs` pins both invariants so they can't silently regress.
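The error-surfacing rule in part 3 can be sketched as a small pure function. This is an illustration only: `describe_failure` and its signature are hypothetical and are not taken from `openai_compat.rs`. It assumes, as the message states, that `status == 0` means no HTTP status line was ever parsed.

```rust
/// Hypothetical helper (not the real code): build the error text for a failed
/// chat request. `status == 0` stands for "the socket closed before any HTTP
/// status line arrived".
fn describe_failure(backend: &str, status: u16, body: &str, request_bytes: usize) -> String {
    if status == 0 && body.is_empty() {
        // Nothing came back at all: say so, and include the request size so an
        // oversized prompt is easy to spot in logs.
        format!("{backend} closed connection before HTTP status (request_bytes={request_bytes})")
    } else {
        // A real HTTP response exists: keep the older "<backend> <status>: <body>" shape.
        format!("{backend} {status}: {body}")
    }
}
```

Calling `describe_failure("genie-ai-runtime", 0, "", 6144)` yields the new message, while a genuine HTTP 400 with a body still reports its status and body.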
Three unit tests cover the compaction surface: Generic-unchanged, GenieAiRuntime-compacts-when-large (asserts model=jetson-llm, think=false, content preserved, history dropped), and the fallback-to-latest-non-system path for the case where the last message isn't a user role.

Acceptance follow-ups deferred to separate issues:

- **Web-chat history preservation**: current compaction replaces ALL prior turns with the latest user message + system blurb. Web chat returns a working response but doesn't remember previous turns. Should evolve toward "system blurb + recent N user/assistant pairs" once the runtime's context capacity is more reliable.
- **Dashboard `llm Connection refused (os error 111)` (issue #73 AC #3)**: that row comes from `genie-health`'s separate probe path, not the chat client. Needs a separate issue against `genie-health`'s probe target / URL.
- **Static `GENIE_RUNTIME_MAX_BODY_BYTES = 4 KB`**: ideally derived from a runtime-capacity probe rather than hardcoded; defensible static value for an alpha.

All 5 CI checks green on `5c55f62` (fmt, clippy, test, aarch64 cross-compile, `--no-default-features`).
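The compaction behaviour that these tests exercise might look roughly like the sketch below. It is not the code from this commit: `Message`, `approx_body_bytes`, `compact`, and the blurb text are stand-ins, the byte count is a crude substitute for the real serialized-body size, and the fallback rule is one reading of "latest non-system message when the last message isn't a user role".

```rust
#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Assumed threshold, mirroring the 4 KB value named in the commit message.
const MAX_BODY_BYTES: usize = 4 * 1024;

/// Placeholder blurb; the real text is not shown on this page.
const SYSTEM_BLURB: &str = "You are a concise home assistant.";

/// Crude stand-in for the serialized body size: role bytes plus content bytes.
fn approx_body_bytes(messages: &[Message]) -> usize {
    messages.iter().map(|m| m.role.len() + m.content.len()).sum()
}

/// Return the messages untouched when they fit; otherwise collapse them into a
/// single user-role turn made of the blurb plus the most relevant recent content.
fn compact(messages: &[Message]) -> Vec<Message> {
    if approx_body_bytes(messages) <= MAX_BODY_BYTES {
        return messages.to_vec();
    }
    // Use the last message when it is a user turn; otherwise fall back to the
    // latest message that is not a system message.
    let latest = messages
        .last()
        .filter(|m| m.role == "user")
        .or_else(|| messages.iter().rev().find(|m| m.role != "system"));
    let content = match latest {
        Some(m) => format!("{}\n\n{}", SYSTEM_BLURB, m.content),
        None => SYSTEM_BLURB.to_string(),
    };
    vec![Message { role: "user".to_string(), content }]
}
```

Dropping every earlier turn is exactly what the history-preservation follow-up describes; keeping the last N user/assistant pairs would change only the selection step in `compact`.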
1 parent 2c945a9 commit 8204be0

4 files changed

Lines changed: 281 additions & 29 deletions

crates/genie-core/src/llm/genie_ai_runtime.rs

Lines changed: 12 additions & 3 deletions
@@ -1,7 +1,7 @@
 use anyhow::Result;
 use async_trait::async_trait;
 
-use super::openai_compat::OpenAiCompatClient;
+use super::openai_compat::{OpenAiCompatClient, RequestProfile};
 use super::{LlmBackendClient, Message, ResponseFormat};
 
 /// Adapter for the `genie-ai-runtime` OpenAI-compatible chat API surface.
@@ -12,13 +12,22 @@ pub struct GenieAiRuntimeBackend {
 impl GenieAiRuntimeBackend {
     pub fn new(host: &str, port: u16) -> Self {
         Self {
-            inner: OpenAiCompatClient::new("genie-ai-runtime", host, port),
+            inner: OpenAiCompatClient::new_with_profile(
+                "genie-ai-runtime",
+                host,
+                port,
+                RequestProfile::genie_ai_runtime(),
+            ),
         }
     }
 
     pub fn from_url(url: &str) -> Self {
         Self {
-            inner: OpenAiCompatClient::from_url("genie-ai-runtime", url),
+            inner: OpenAiCompatClient::from_url_with_profile(
+                "genie-ai-runtime",
+                url,
+                RequestProfile::genie_ai_runtime(),
+            ),
         }
     }
 }
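
The diff above only shows the adapter selecting a profile; `openai_compat.rs` itself is not included on this page. As a rough, std-only illustration of how a profile could shape the request, the sketch below hand-rolls the first two JSON fields in place of the serde derive the commit message mentions. `model`, `think`, and `render_header` are hypothetical names, and the Generic branch passing the caller's model through is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum RequestProfile {
    Generic,
    GenieAiRuntime,
}

impl RequestProfile {
    /// Constructor with the same name the adapter calls in the diff.
    fn genie_ai_runtime() -> Self {
        Self::GenieAiRuntime
    }

    /// Model name to send. Assumption: Generic passes the caller's value through.
    fn model<'a>(self, requested: &'a str) -> &'a str {
        match self {
            Self::Generic => requested,
            Self::GenieAiRuntime => "jetson-llm",
        }
    }

    /// `None` means the `think` field is left out of the request entirely.
    fn think(self) -> Option<bool> {
        match self {
            Self::Generic => None,
            Self::GenieAiRuntime => Some(false),
        }
    }
}

/// Hand-rolled stand-in for a serde struct with `skip_serializing_if` on `think`.
/// No JSON string escaping is done, so this is only safe for plain model names.
fn render_header(profile: RequestProfile, requested_model: &str) -> String {
    let mut out = format!("{{\"model\":\"{}\"", profile.model(requested_model));
    if let Some(think) = profile.think() {
        out.push_str(&format!(",\"think\":{think}"));
    }
    out.push('}');
    out
}
```

With this shape the Generic profile never emits a `think` key, which is the property the commit message relies on to leave the llama.cpp path unchanged.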
