Commit 5e58f75

ianbmacdonald, claude, glm-5.2, and codex committed
fix(server): echo persisted recipe_options in the /load response
A save_options:true load *replaces* the model's stored recipe options with the set sent in the request (omitting a key clears it). That contract is documented but invisible to a client driving the API programmatically: the /load response reports a bare success, so a caller incrementally tuning a recipe cannot tell that an option it did not resend was dropped without a follow-up /models query or reading the API docs.

Echo the model's persisted recipe_options in the /load success response (both the per-model and collection branches), mirroring the existing /models representation (model_info_to_json already serializes recipe_options.to_json()). The replace-on-save semantics are unchanged; the resulting persisted state is simply made observable. The echoed set is the persisted one (== /models), not the per-request effective options; this is documented in docs/api/lemonade.md.

Adds test_012m_load_response_echoes_recipe_options: asserts the response carries recipe_options and that a subsequent partial save surfaces the cleared key's absence.

Validated on ai3 (RX 7900 XT, llamacpp/vulkan): recipe-options suite 6/6 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: glm-5.2 <noreply@zhipuai.cn>
Co-Authored-By: gpt-5.5 <noreply@openai.com>
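As a rough illustration of what the echoed field makes possible, the sketch below compares the options a caller believed were persisted with the `recipe_options` returned by a later `/load`. The helper name `cleared_keys` and the sample dictionaries are hypothetical; they are not part of the Lemonade API or of this commit.

```python
def cleared_keys(previously_saved: dict, echoed: dict) -> set:
    """Return keys that were persisted earlier but are absent from the echo."""
    return set(previously_saved) - set(echoed)


# Hypothetical values mirroring the scenario in the commit message:
# a first save stored ctx_size, a second save sent only llamacpp_args.
before = {"ctx_size": 5120}
after = {"llamacpp_args": "--top-k 20"}
print(sorted(cleared_keys(before, after)))  # prints ['ctx_size']
```

A caller tuning a recipe incrementally could run this check after every `save_options` load and resend any key it did not intend to drop.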
1 parent 31cdc92 commit 5e58f75

3 files changed

Lines changed: 107 additions & 6 deletions

docs/api/lemonade.md

Lines changed: 7 additions & 2 deletions
@@ -554,11 +554,16 @@ curl -X POST http://localhost:13305/v1/load \

 ```json
 {
-  "status":"success",
-  "message":"Loaded model: Qwen3-0.6B-GGUF"
+  "status": "success",
+  "model_name": "Qwen3-0.6B-GGUF",
+  "checkpoint": "Qwen/Qwen3-0.6B-GGUF",
+  "recipe": "llamacpp",
+  "recipe_options": {"ctx_size": 8192}
 }
 ```

+`recipe_options` echoes the model's **persisted** recipe options — the same set returned by `GET /v1/models`, i.e. what a future option-free load will recall. When the request set `save_options: true`, this reflects the just-saved set, so a client can confirm what was persisted (including that any key it did not resend was cleared by the replace-on-save behavior of `save_options`). It is the persisted set, not the per-request effective options: runtime options sent without `save_options` are applied for that load but are not reflected here. Collection loads return their own (typically empty) persisted `recipe_options`; per-model options are not forwarded to components.
+
 In case of an error, the status will be `error` and the message will contain the error message.

 ## `POST /v1/unload`
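A client consuming the documented response shape might look roughly like the following. This is a minimal sketch that works on the raw response text only; `LoadError` and `persisted_options` are hypothetical names, and returning `None` when the key is missing (for example, from a server that predates this change) is an assumption of the sketch, not documented behavior.

```python
import json


class LoadError(RuntimeError):
    """Raised when a /load response reports a non-success status."""


def persisted_options(raw_body: str):
    """Extract the echoed recipe_options from a /load response body.

    Relies on the documented shape: 'status' is 'success' or 'error',
    and error responses carry a 'message'. Returns None if the server
    did not include 'recipe_options' at all.
    """
    body = json.loads(raw_body)
    if body.get("status") != "success":
        raise LoadError(body.get("message", "unknown error"))
    return body.get("recipe_options")


# Sample body copied from the documentation example above.
sample = """{
  "status": "success",
  "model_name": "Qwen3-0.6B-GGUF",
  "checkpoint": "Qwen/Qwen3-0.6B-GGUF",
  "recipe": "llamacpp",
  "recipe_options": {"ctx_size": 8192}
}"""
print(persisted_options(sample))  # prints {'ctx_size': 8192}
```

Keeping "key missing" (`None`) distinct from "nothing persisted" (`{}`) lets a caller tell an older server apart from a model with no saved options.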

src/cpp/server/server.cpp

Lines changed: 20 additions & 4 deletions
@@ -3520,7 +3520,12 @@ void Server::handle_load(const httplib::Request& req, httplib::Response& res) {
     LOG(INFO, "Server") << " " << options.to_log_string(false);
     LOG(INFO, "Server") << std::endl;

-    // Persist request options to model info if requested
+    // Persist request options to model info if requested. Per the documented
+    // contract, a save *replaces* the model's stored options with the set sent in
+    // this request (omitting a key clears it). That semantics is intentional, but
+    // it is invisible to a client driving the API programmatically — the success
+    // response below echoes the resulting recipe_options so a caller can observe
+    // exactly what is now persisted/effective instead of inferring it from the docs.
     if (save_options) {
         info.recipe_options = options;
         model_manager_->save_model_options(info);
@@ -3539,10 +3544,15 @@ void Server::handle_load(const httplib::Request& req, httplib::Response& res) {
     if (is_collection_recipe(info.recipe) && !info.components.empty()) {
         ensure_collection_loaded(info);

+        // Echo recipe_options here too so the /load response shape does not vary
+        // by recipe type (matches the non-collection branch below). Per-model
+        // options are not forwarded to components, so this is the collection's own
+        // persisted set (typically empty).
         nlohmann::json response = {
             {"status", "success"},
             {"model_name", model_name},
-            {"recipe", info.recipe}
+            {"recipe", info.recipe},
+            {"recipe_options", info.recipe_options.to_json()}
         };
         res.set_content(response.dump(), "application/json");
     } else {
@@ -3553,12 +3563,18 @@ void Server::handle_load(const httplib::Request& req, httplib::Response& res) {
             /*allow_reload_on_option_change=*/true,
             pinned_opt);

-        // Return success response
+        // Return success response. Echo the model's recipe_options — the persisted
+        // set that a future option-free load will recall — so a client (e.g. an
+        // agent iteratively tuning a recipe) can observe what a save_options request
+        // persisted, including that the replace-on-save contract cleared any key it
+        // did not resend, without a follow-up /models query or reading the API docs.
+        // Mirrors model_info_to_json(), which already serializes recipe_options.
         nlohmann::json response = {
             {"status", "success"},
             {"model_name", model_name},
             {"checkpoint", info.checkpoint()},
-            {"recipe", info.recipe}
+            {"recipe", info.recipe},
+            {"recipe_options", info.recipe_options.to_json()}
         };
         res.set_content(response.dump(), "application/json");
     }
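The handler's behavior can be approximated by a small in-memory model, which may help when reasoning about the difference between persisted and per-request options. This is a toy sketch only: `ToyModelStore` is a hypothetical class, not the server's implementation, and it ignores actual model loading, collections, and error handling.

```python
class ToyModelStore:
    """In-memory stand-in for the persisted-options behavior of /load."""

    def __init__(self):
        self.persisted = {}

    def load(self, options: dict, save_options: bool = False) -> dict:
        if save_options:
            # Replace, not merge: keys omitted from this request are cleared.
            self.persisted = dict(options)
        # The response echoes the persisted set, not the per-request options.
        return {"status": "success", "recipe_options": dict(self.persisted)}


store = ToyModelStore()
first = store.load({"ctx_size": 5120}, save_options=True)
second = store.load({"llamacpp_args": "--top-k 20"}, save_options=True)
third = store.load({"ctx_size": 2048})  # runtime-only, nothing persisted

print(first["recipe_options"])   # prints {'ctx_size': 5120}
print(second["recipe_options"])  # prints {'llamacpp_args': '--top-k 20'}
print(third["recipe_options"])   # prints {'llamacpp_args': '--top-k 20'}
```

The third call shows why the echo is described as the persisted set: a runtime-only `ctx_size` does not appear in the response.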

test/server_endpoints.py

Lines changed: 80 additions & 0 deletions
@@ -629,6 +629,86 @@ def test_012b_load_reloads_on_option_change(self):
             f"(ctx_size={custom_ctx})"
         )

+    def test_012m_load_response_echoes_recipe_options(self):
+        """The POST /load response body echoes the model's persisted recipe_options.
+
+        A client driving the API programmatically (e.g. an agent iteratively tuning a
+        recipe) can then observe what a save_options request persisted directly from the
+        load response — and, because a save replaces the stored options with exactly the
+        set sent, can see that a key it did not resend was cleared — without a follow-up
+        /models query or having to learn the replace-on-save contract from the docs.
+
+        Teeth: against a server that does not echo, the response JSON has no
+        'recipe_options' key, so the first assertion fails."""
+        # This test persists recipe options for the shared test model; reset them
+        # afterward (an empty save_options clears the stored set) so the leaked state
+        # does not bleed into later tests.
+        self.addCleanup(
+            lambda: requests.post(
+                f"{self.base_url}/load",
+                json={"model_name": ENDPOINT_TEST_MODEL, "save_options": True},
+                timeout=TIMEOUT_MODEL_OPERATION,
+            )
+        )
+
+        # 1. Save an option set containing ctx_size; the load response echoes it back.
+        first_ctx = 5120
+        resp = requests.post(
+            f"{self.base_url}/load",
+            json={
+                "model_name": ENDPOINT_TEST_MODEL,
+                "ctx_size": first_ctx,
+                "save_options": True,
+            },
+            timeout=TIMEOUT_MODEL_OPERATION,
+        )
+        self.assertEqual(resp.status_code, 200)
+        body = resp.json()
+        self.assertIn(
+            "recipe_options",
+            body,
+            "POST /load response must echo recipe_options so a client can observe "
+            "the effective/persisted set without a separate /models request",
+        )
+        self.assertEqual(
+            body["recipe_options"].get("ctx_size"),
+            first_ctx,
+            "Echoed recipe_options should reflect the just-saved ctx_size",
+        )
+
+        # 2. Save a DIFFERENT partial set (no ctx_size). The documented contract is that
+        # a save *replaces* the stored options, so ctx_size is cleared. The response
+        # must make that observable rather than reporting a bare success.
+        resp2 = requests.post(
+            f"{self.base_url}/load",
+            json={
+                "model_name": ENDPOINT_TEST_MODEL,
+                "llamacpp_args": "--top-k 20",
+                "save_options": True,
+            },
+            timeout=TIMEOUT_MODEL_OPERATION,
+        )
+        self.assertEqual(resp2.status_code, 200)
+        echoed = resp2.json().get("recipe_options", {})
+        self.assertEqual(
+            echoed.get("llamacpp_args"),
+            "--top-k 20",
+            "Echoed recipe_options should reflect the newly-saved llamacpp_args",
+        )
+        # This relies on RecipeOptions.to_json() returning only explicitly-set keys
+        # (not resolving/injecting defaults, unlike to_log_string(resolve_defaults=True)).
+        # If that serialization ever changes to bake in defaults, revisit this assertion.
+        self.assertNotIn(
+            "ctx_size",
+            echoed,
+            "replace-on-save cleared ctx_size; the /load response must surface its "
+            "absence so a client can observe the contract instead of inferring it",
+        )
+        print(
+            "[OK] /load response echoes effective recipe_options "
+            "(replace-on-save is observable)"
+        )
+
     def test_012c_load_noop_when_already_loaded_by_inference(self):
         """Regression test for #1603: /load after an inference-triggered
         auto-load should no-op, not evict and reload the model.
