fix(server): echo persisted recipe_options in the /load response

ianbmacdonald · claude · glm-5.2 · ianbmacdonald · commit 5e58f750a7ea · 2026-06-19T17:30:24.000-04:00
A save_options:true load *replaces* the model's stored recipe options with
the set sent in the request (omitting a key clears it). That contract is
documented but invisible to a client driving the API programmatically: the
/load response reports a bare success, so a caller incrementally tuning a
recipe cannot tell that an option it did not resend was dropped without a
follow-up /models query or reading the API docs.

Echo the model's persisted recipe_options in the /load success response
(both the per-model and collection branches), mirroring the existing
/models representation (model_info_to_json already serializes
recipe_options.to_json()). The replace-on-save semantics are unchanged; the
resulting persisted state is simply made observable. The echoed set is the
persisted one (== /models), not the per-request effective options; this is
documented in docs/api/lemonade.md.
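
For illustration, a minimal client-side sketch of how the echoed set can be
used to spot keys cleared by replace-on-save. The helper name
dropped_options and the two response bodies are hypothetical (shaped like
the example in docs/api/lemonade.md); no server call is made here.

```python
from typing import Any, Mapping, Set


def dropped_options(previous: Mapping[str, Any],
                    load_response: Mapping[str, Any]) -> Set[str]:
    """Return keys that were persisted before but are absent from the echo.

    `previous` is the recipe_options echoed by an earlier /load response;
    `load_response` is the parsed JSON body of the latest one.
    """
    if load_response.get("status") != "success":
        raise ValueError("load did not succeed")
    if "recipe_options" not in load_response:
        # Assumption: a server without the echo omits the key entirely, in
        # which case the caller would need to fall back to GET /models.
        raise KeyError("recipe_options")
    return set(previous) - set(load_response["recipe_options"])


# Hypothetical bodies for two consecutive save_options:true loads.
first = {
    "status": "success",
    "model_name": "Qwen3-0.6B-GGUF",
    "recipe": "llamacpp",
    "recipe_options": {"ctx_size": 8192},
}
second = {
    "status": "success",
    "model_name": "Qwen3-0.6B-GGUF",
    "recipe": "llamacpp",
    "recipe_options": {"llamacpp_args": "--top-k 20"},
}

print(dropped_options(first["recipe_options"], second))  # {'ctx_size'}
```

The comparison is against the persisted set only; runtime options sent
without save_options would not appear in either echo.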

Adds test_012m_load_response_echoes_recipe_options: asserts the response
carries recipe_options and that a subsequent partial save surfaces the
cleared key's absence. Validated on ai3 (RX 7900 XT, llamacpp/vulkan):
recipe-options suite 6/6 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: glm-5.2 <noreply@zhipuai.cn>
Co-Authored-By: gpt-5.5 <noreply@openai.com>
diff --git a/docs/api/lemonade.md b/docs/api/lemonade.md
@@ -554,11 +554,16 @@ curl -X POST http://localhost:13305/v1/load \
 
 ```json
 {
-  "status":"success",
-  "message":"Loaded model: Qwen3-0.6B-GGUF"
+  "status": "success",
+  "model_name": "Qwen3-0.6B-GGUF",
+  "checkpoint": "Qwen/Qwen3-0.6B-GGUF",
+  "recipe": "llamacpp",
+  "recipe_options": {"ctx_size": 8192}
 }
 ```
 
+`recipe_options` echoes the model's **persisted** recipe options — the same set returned by `GET /v1/models`, i.e. what a future option-free load will recall. When the request set `save_options: true`, this reflects the just-saved set, so a client can confirm what was persisted (including that any key it did not resend was cleared by the replace-on-save behavior of `save_options`). It is the persisted set, not the per-request effective options: runtime options sent without `save_options` are applied for that load but are not reflected here. Collection loads return their own (typically empty) persisted `recipe_options`; per-model options are not forwarded to components.
+
 In case of an error, the status will be `error` and the message will contain the error message.
 
 ## `POST /v1/unload`
diff --git a/src/cpp/server/server.cpp b/src/cpp/server/server.cpp
@@ -3520,7 +3520,12 @@ void Server::handle_load(const httplib::Request& req, httplib::Response& res) {
         LOG(INFO, "Server") << " " << options.to_log_string(false);
         LOG(INFO, "Server") << std::endl;
 
-        // Persist request options to model info if requested
+        // Persist request options to model info if requested. Per the documented
+        // contract, a save *replaces* the model's stored options with the set sent in
+        // this request (omitting a key clears it). Those semantics are intentional, but
+        // they are invisible to a client driving the API programmatically, so the success
+        // response below echoes the resulting recipe_options and a caller can observe
+        // exactly what is now persisted instead of inferring it from the docs.
         if (save_options) {
             info.recipe_options = options;
             model_manager_->save_model_options(info);
@@ -3539,10 +3544,15 @@ void Server::handle_load(const httplib::Request& req, httplib::Response& res) {
         if (is_collection_recipe(info.recipe) && !info.components.empty()) {
             ensure_collection_loaded(info);
 
+            // Echo recipe_options here too so the /load response shape does not vary
+            // by recipe type (matches the non-collection branch below). Per-model
+            // options are not forwarded to components, so this is the collection's own
+            // persisted set (typically empty).
             nlohmann::json response = {
                 {"status", "success"},
                 {"model_name", model_name},
-                {"recipe", info.recipe}
+                {"recipe", info.recipe},
+                {"recipe_options", info.recipe_options.to_json()}
             };
             res.set_content(response.dump(), "application/json");
         } else {
@@ -3553,12 +3563,18 @@ void Server::handle_load(const httplib::Request& req, httplib::Response& res) {
                                 /*allow_reload_on_option_change=*/true,
                                 pinned_opt);
 
-            // Return success response
+            // Return success response. Echo the model's recipe_options — the persisted
+            // set that a future option-free load will recall — so a client (e.g. an
+            // agent iteratively tuning a recipe) can observe what a save_options request
+            // persisted, including that the replace-on-save contract cleared any key it
+            // did not resend, without a follow-up /models query or reading the API docs.
+            // Mirrors model_info_to_json(), which already serializes recipe_options.
             nlohmann::json response = {
                 {"status", "success"},
                 {"model_name", model_name},
                 {"checkpoint", info.checkpoint()},
-                {"recipe", info.recipe}
+                {"recipe", info.recipe},
+                {"recipe_options", info.recipe_options.to_json()}
             };
             res.set_content(response.dump(), "application/json");
         }
diff --git a/test/server_endpoints.py b/test/server_endpoints.py
@@ -629,6 +629,86 @@ def test_012b_load_reloads_on_option_change(self):
             f"(ctx_size={custom_ctx})"
         )
 
+    def test_012m_load_response_echoes_recipe_options(self):
+        """The POST /load response body echoes the model's persisted recipe_options.
+
+        A client driving the API programmatically (e.g. an agent iteratively tuning a
+        recipe) can then observe what a save_options request persisted directly from the
+        load response — and, because a save replaces the stored options with exactly the
+        set sent, can see that a key it did not resend was cleared — without a follow-up
+        /models query or having to learn the replace-on-save contract from the docs.
+
+        Teeth: against a server that does not echo, the response JSON has no
+        'recipe_options' key, so the first assertion fails."""
+        # This test persists recipe options for the shared test model; reset them
+        # afterward (a save_options load that sends no options clears the stored set)
+        # so the persisted state does not leak into later tests.
+        self.addCleanup(
+            lambda: requests.post(
+                f"{self.base_url}/load",
+                json={"model_name": ENDPOINT_TEST_MODEL, "save_options": True},
+                timeout=TIMEOUT_MODEL_OPERATION,
+            )
+        )
+
+        # 1. Save an option set containing ctx_size; the load response echoes it back.
+        first_ctx = 5120
+        resp = requests.post(
+            f"{self.base_url}/load",
+            json={
+                "model_name": ENDPOINT_TEST_MODEL,
+                "ctx_size": first_ctx,
+                "save_options": True,
+            },
+            timeout=TIMEOUT_MODEL_OPERATION,
+        )
+        self.assertEqual(resp.status_code, 200)
+        body = resp.json()
+        self.assertIn(
+            "recipe_options",
+            body,
+            "POST /load response must echo recipe_options so a client can observe "
+            "the persisted set without a separate /models request",
+        )
+        self.assertEqual(
+            body["recipe_options"].get("ctx_size"),
+            first_ctx,
+            "Echoed recipe_options should reflect the just-saved ctx_size",
+        )
+
+        # 2. Save a DIFFERENT partial set (no ctx_size). The documented contract is that
+        #    a save *replaces* the stored options, so ctx_size is cleared. The response
+        #    must make that observable rather than reporting a bare success.
+        resp2 = requests.post(
+            f"{self.base_url}/load",
+            json={
+                "model_name": ENDPOINT_TEST_MODEL,
+                "llamacpp_args": "--top-k 20",
+                "save_options": True,
+            },
+            timeout=TIMEOUT_MODEL_OPERATION,
+        )
+        self.assertEqual(resp2.status_code, 200)
+        echoed = resp2.json().get("recipe_options", {})
+        self.assertEqual(
+            echoed.get("llamacpp_args"),
+            "--top-k 20",
+            "Echoed recipe_options should reflect the newly-saved llamacpp_args",
+        )
+        # This relies on RecipeOptions.to_json() returning only explicitly-set keys
+        # (not resolving/injecting defaults, unlike to_log_string(resolve_defaults=True)).
+        # If that serialization ever changes to bake in defaults, revisit this assertion.
+        self.assertNotIn(
+            "ctx_size",
+            echoed,
+            "replace-on-save cleared ctx_size; the /load response must surface its "
+            "absence so a client can observe the contract instead of inferring it",
+        )
+        print(
+            "[OK] /load response echoes persisted recipe_options "
+            "(replace-on-save is observable)"
+        )
+
     def test_012c_load_noop_when_already_loaded_by_inference(self):
         """Regression test for #1603: /load after an inference-triggered
         auto-load should no-op, not evict and reload the model.