Commit 9cb2bb8

fix(daemon): auto-respawn on config mismatch
1 parent 5411e6a commit 9cb2bb8

8 files changed

Lines changed: 362 additions & 85 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `seeklink get PATH:LINE -C N` prints a grep-style context window around a search hit, returning `N` lines before and after the requested line while preserving direct filesystem reads and path-escape protection.
 - `seeklink search --json` and `seeklink status --json` emit stable machine-readable stdout for agents that should not scrape the human text format.
 
+### Fixed
+- `seeklink search` and `seeklink index` now auto-restart a stale daemon when its vault, embedder, or reranker config no longer matches the caller, avoiding repeated cold-start fallbacks after switching vaults or model settings.
+
 ## [0.3.2] - 2026-04-23
 
 Repository cleanup pass. No code changes affecting runtime behavior

README.md

Lines changed: 2 additions & 2 deletions
@@ -117,7 +117,7 @@ seeklink search "machine learning"
 # Warm reranker-disabled path: ~10ms per query.
 ```
 
-The daemon stays resident across terminal sessions until you `kill` it or restart. `seeklink search` and `seeklink index` auto-spawn it when missing; `seeklink status` is always cold-start (it only reads SQLite stats, no model load) and `seeklink get` is a direct filesystem read (no daemon involved either).
+The daemon stays resident across terminal sessions until you `kill` it, restart, or switch the default vault/model config. `seeklink search` and `seeklink index` auto-spawn it when missing and auto-restart it if a stale daemon is bound to the wrong vault or model settings; `seeklink status` is always cold-start (it only reads SQLite stats, no model load) and `seeklink get` is a direct filesystem read (no daemon involved either).
 
 ## For agents
 
@@ -158,7 +158,7 @@ Options:
 
 Starts a Unix-socket daemon that keeps the embedding model (and reranker, if enabled) resident in memory. First query after startup takes ~2s for model warmup; warm queries return in ~1-2s with the reranker on (default) or ~10ms with `SEEKLINK_RERANKER_MODEL=""`.
 
-**You almost never run this directly.** `seeklink search` and `seeklink index` auto-spawn a daemon on cold machines when `--vault` is not passed. `seeklink status` is always cold-start (no model load). `seeklink get` is a direct filesystem read (no daemon). The daemon uses `SEEKLINK_VAULT` (or cwd) as its vault and never auto-exits — kill it with `kill` or restart your machine.
+**You almost never run this directly.** `seeklink search` and `seeklink index` auto-spawn a daemon on cold machines when `--vault` is not passed. If an existing daemon was started for a different vault or model config, the CLI shuts it down and starts a matching one. `seeklink status` is always cold-start (no model load). `seeklink get` is a direct filesystem read (no daemon). The daemon uses `SEEKLINK_VAULT` (or cwd) as its vault.
 
 Passing `--vault` always uses cold-start instead of the daemon, because the daemon binds to a single vault at startup.
 

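The README hunks describe which subcommands go through the daemon. As a reading aid, those rules can be restated as a small decision function. This is a hypothetical sketch: `Route` and `route_for` are illustrative names that do not exist in seeklink, and the function encodes only what the README states (`get` reads files directly, `status` is always cold-start, and `search`/`index` use the daemon unless `--vault` is passed).

```python
from enum import Enum


class Route(Enum):
    """Execution paths described in the README (illustrative names only)."""

    DAEMON = "daemon"          # resident daemon, auto-spawned or restarted
    COLD_START = "cold_start"  # in-process run that loads what it needs
    FILESYSTEM = "filesystem"  # direct file read, no daemon involved


def route_for(cmd: str, vault_flag_passed: bool) -> Route:
    """Hypothetical helper restating the documented routing rules."""
    if cmd == "get":
        return Route.FILESYSTEM
    if cmd == "status":
        # Only reads SQLite stats, so it never needs the daemon.
        return Route.COLD_START
    if cmd in ("search", "index"):
        # The daemon is bound to one vault at startup, so an explicit
        # --vault always bypasses it.
        return Route.COLD_START if vault_flag_passed else Route.DAEMON
    raise ValueError(f"unknown subcommand: {cmd!r}")
```

Keeping `--vault` on the cold-start path follows from the single-vault binding: the daemon cannot serve a vault other than the one it started with, so an explicit flag bypasses it instead of triggering a restart.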
TODOS.md

Lines changed: 0 additions & 8 deletions
@@ -48,14 +48,6 @@ indexed files have drifted on disk. The daemon path does not propagate
 those warnings back to clients. Adding a `warnings` field to the daemon
 JSON response would let `cli_client` surface them in the same shape.
 
-### Daemon auto-respawn on config mismatch
-`cli_client.call()` refuses to reuse a daemon bound to a different vault
-or started with a different embedder / reranker (correctness), but falls
-back to cold-start on every subsequent CLI call after a switch until the
-user manually kills the stale daemon. Add a `shutdown` command to the
-daemon protocol so the client can shut down and respawn on mismatch,
-keeping the auto-spawn workflow intact across vault / model switches.
-
 ### Multi-vault daemon support
 The daemon binds to a single socket (`~/.rhizome/seeklink.sock`) regardless
 of vault. For multiple vaults to run concurrent daemons, hash the vault

llms.txt

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ seeklink get PATH:LINE -l N # read window around a hit
 seeklink get PATH:LINE -C N # read N lines before/after a hit
 ```
 
-Set `SEEKLINK_VAULT=<path>` once to omit `--vault` on every call and route through the resident daemon (first call after boot: ~2s; warm: ~1-2s with reranker, ~10ms without).
+Set `SEEKLINK_VAULT=<path>` once to omit `--vault` on every call and route through the resident daemon (first call after boot: ~2s; warm: ~1-2s with reranker, ~10ms without). If that env/model config changes, `search` and `index` auto-restart a stale daemon instead of silently serving the old vault.
 
 ### Output contract
 
@@ -79,5 +79,5 @@ No other codes.
 ### Common failure modes
 
 - Empty results on a fresh vault → index not built yet. Run `seeklink index --vault PATH`.
-- Daemon won't auto-spawn → `--vault` was passed (cold-start mandatory in that case), or another daemon is bound to a different vault. Run `status` to see `vault` / `embedder` / `reranker` and respawn if needed.
+- Daemon won't auto-spawn → `--vault` was passed, which intentionally forces cold-start. Without `--vault`, `search` / `index` should auto-spawn and auto-restart stale daemons when `vault` / `embedder` / `reranker` no longer match.
 - Line numbers look wrong → file was edited after indexing. Re-index. `status` prints a freshness warning on cold-start.

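The llms.txt hunk says `search` and `index` restart a stale daemon when the env/model config changes. The sketch below shows one way a caller could derive its expected vault and reranker from the environment before comparing them with the daemon's `status` result. It is an illustration under assumptions, not seeklink's code: `ExpectedConfig`, `expected_config`, and `same_vault` are hypothetical names. The `SEEKLINK_VAULT`-or-cwd rule, an empty `SEEKLINK_RERANKER_MODEL` meaning `"disabled"`, and the `.resolve()` comparison come from this commit; treating an unset variable as "use the default model" (with the default passed in as a parameter, since its name is not shown) and treating any other value as a model name are assumptions. The embedder is omitted because its configuration source is not visible here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class ExpectedConfig:
    vault: Path
    reranker: str


def expected_config(
    env: Mapping[str, str], cwd: Path, default_reranker: str
) -> ExpectedConfig:
    # Vault: SEEKLINK_VAULT when set and non-empty, otherwise the cwd.
    vault = Path(env["SEEKLINK_VAULT"]) if env.get("SEEKLINK_VAULT") else cwd
    raw = env.get("SEEKLINK_RERANKER_MODEL")
    if raw is None:
        reranker = default_reranker  # unset: assume the default model
    elif raw == "":
        reranker = "disabled"  # explicitly empty: reranker off
    else:
        reranker = raw  # assumption: any other value names a model
    return ExpectedConfig(vault=vault.resolve(), reranker=reranker)


def same_vault(daemon_vault: str, expected: Path) -> bool:
    # Compare resolved paths so a symlink or relative path to the same
    # directory does not register as a false mismatch.
    return Path(daemon_vault).resolve() == expected.resolve()
```

A caller would typically pass `os.environ` and `Path.cwd()`. Resolving both sides matters when the vault is reached through a symlink or a relative path: the strings differ, but `Path.resolve()` maps them to the same directory, which is why the diff resolves both the daemon's and the caller's vault before comparing.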
seeklink/cli_client.py

Lines changed: 129 additions & 70 deletions
@@ -7,11 +7,10 @@
 fall back to a cold-start in-process execution if they prefer.
 
 Vault-binding guard: the daemon is single-vault (bound at startup via
-SEEKLINK_VAULT or cwd). If a caller passes ``expected_vault`` to
-``call()``, the client first probes the daemon's reported vault via a
-``status`` request and refuses to reuse a daemon bound to a different
-vault — preventing a stale daemon from silently serving (or mutating)
-the wrong database when the user changes SEEKLINK_VAULT or cwd.
+SEEKLINK_VAULT or cwd). If a caller passes expected config to ``call()``,
+the client first probes the daemon's reported status. On mismatch it asks
+the stale daemon to shut down, waits for the socket to clear, and then
+auto-spawns a daemon under the caller's current env/cwd.
 """
 
 from __future__ import annotations
@@ -33,6 +32,7 @@
 # Deferred until multi-vault becomes a real use case.
 SOCKET_PATH = Path.home() / ".rhizome" / "seeklink.sock"
 SPAWN_WAIT_SECONDS = 60.0 # cold start includes model load, give it time
+SHUTDOWN_WAIT_SECONDS = 10.0
 
 
 def call(
@@ -46,21 +46,19 @@ def call(
 """Send a command to the daemon. Auto-spawns daemon if unreachable.
 
 If any ``expected_*`` is provided, probes the daemon's reported
-status first and returns a failure response if the daemon's vault
-or model config does not match what the caller expects. Callers
-should pass these whenever the intended config is inferred from
-env/cwd rather than an explicit flag — this prevents a stale
-daemon from:
+status first and restarts the daemon if its vault or model config
+does not match what the caller expects. Callers should pass these
+whenever the intended config is inferred from env/cwd rather than
+an explicit flag — this prevents a stale daemon from:
 
 - serving or mutating the wrong vault's database (vault mismatch), or
 - returning results computed with the wrong embedder/reranker model
 (config mismatch), which can silently corrupt rankings and, if
 the embedder's vector width changes, make ``search_vec`` fail.
 
-On mismatch, callers are expected to fall back to an in-process
-cold-start (which will pick up the current env). A follow-up that
-auto-respawns the daemon instead of falling back is tracked in
-TODOS.md.
+On mismatch, this function requests daemon shutdown and retries
+through the normal spawn path. If restart fails, callers can still
+fall back to an in-process cold-start.
 
 Returns the daemon's JSON response as a dict. Never raises — on
 failure returns ``{"ok": False, "error": "..."}``.
@@ -76,62 +74,14 @@ def call(
 return probe # daemon unreachable / spawn failed
 result = probe.get("result") or {}
 
-if expected_vault is not None:
-daemon_vault_raw = result.get("vault")
-if daemon_vault_raw is None:
-return {"ok": False, "error": "daemon status returned no vault"}
-try:
-daemon_vault = Path(daemon_vault_raw).resolve()
-want_vault = expected_vault.resolve()
-except OSError as e:
-return {"ok": False, "error": f"vault resolution failed: {e}"}
-if daemon_vault != want_vault:
-return {
-"ok": False,
-"error": (
-f"daemon is bound to vault {daemon_vault}, but caller "
-f"expects {want_vault} — restart the daemon or run "
-f"with --vault"
-),
-}
-
-if expected_embedder is not None:
-daemon_embedder = result.get("embedder")
-if daemon_embedder != expected_embedder:
-return {
-"ok": False,
-"error": (
-f"daemon embedder is {daemon_embedder!r}, caller "
-f"expects {expected_embedder!r} — restart the daemon"
-),
-}
-
-if expected_reranker is not None:
-daemon_reranker = result.get("reranker")
-# Accept the daemon's self-disabled state even if we expected a
-# real model name. `run_daemon()` downgrades the reranker to
-# "disabled" at warmup time on platforms where mlx_lm cannot
-# load (Linux, Intel macOS). Rejecting that would break the
-# daemon-first workflow on those supported setups. But if the
-# caller explicitly asked for "disabled" (via empty
-# `SEEKLINK_RERANKER_MODEL`), a running daemon with a real
-# reranker IS a mismatch — the user asked for raw RRF scores
-# and would silently get reranked ones.
-is_mismatch = (
-daemon_reranker != expected_reranker
-and not (
-expected_reranker != "disabled"
-and daemon_reranker == "disabled"
-)
-)
-if is_mismatch:
-return {
-"ok": False,
-"error": (
-f"daemon reranker is {daemon_reranker!r}, caller "
-f"expects {expected_reranker!r} — restart the daemon"
-),
-}
+mismatch = _config_mismatch_error(
+result,
+expected_vault=expected_vault,
+expected_embedder=expected_embedder,
+expected_reranker=expected_reranker,
+)
+if mismatch is not None:
+return _restart_and_retry(cmd, args, mismatch)
 
 # All checks passed — send the real command without re-spawning.
 try:
@@ -142,6 +92,98 @@ def call(
 return _call_once_with_spawn(cmd, args)
 
 
+def _config_mismatch_error(
+status: dict[str, Any],
+*,
+expected_vault: Path | None,
+expected_embedder: str | None,
+expected_reranker: str | None,
+) -> str | None:
+"""Return a human-readable mismatch reason, or None if config matches."""
+if expected_vault is not None:
+daemon_vault_raw = status.get("vault")
+if daemon_vault_raw is None:
+return "daemon status returned no vault"
+try:
+daemon_vault = Path(daemon_vault_raw).resolve()
+want_vault = expected_vault.resolve()
+except OSError as e:
+return f"vault resolution failed: {e}"
+if daemon_vault != want_vault:
+return (
+f"daemon is bound to vault {daemon_vault}, but caller "
+f"expects {want_vault}"
+)
+
+if expected_embedder is not None:
+daemon_embedder = status.get("embedder")
+if daemon_embedder != expected_embedder:
+return (
+f"daemon embedder is {daemon_embedder!r}, caller "
+f"expects {expected_embedder!r}"
+)
+
+if expected_reranker is not None:
+daemon_reranker = status.get("reranker")
+# Accept the daemon's self-disabled state even if we expected a
+# real model name. `run_daemon()` downgrades the reranker to
+# "disabled" at warmup time on platforms where mlx_lm cannot
+# load (Linux, Intel macOS). Rejecting that would break the
+# daemon-first workflow on those supported setups. But if the
+# caller explicitly asked for "disabled" (via empty
+# `SEEKLINK_RERANKER_MODEL`), a running daemon with a real
+# reranker IS a mismatch — the user asked for raw RRF scores
+# and would silently get reranked ones.
+is_mismatch = (
+daemon_reranker != expected_reranker
+and not (
+expected_reranker != "disabled"
+and daemon_reranker == "disabled"
+)
+)
+if is_mismatch:
+return (
+f"daemon reranker is {daemon_reranker!r}, caller "
+f"expects {expected_reranker!r}"
+)
+
+return None
+
+
+def _restart_and_retry(
+cmd: str, args: dict[str, Any], mismatch: str
+) -> dict[str, Any]:
+logger.info("Daemon config mismatch; restarting: %s", mismatch)
+shutdown = _shutdown_daemon()
+if not shutdown.get("ok"):
+return {
+"ok": False,
+"error": (
+f"{mismatch}; failed to shut down stale daemon: "
+f"{shutdown.get('error', 'unknown error')}"
+),
+}
+
+if not _wait_for_socket_shutdown(SHUTDOWN_WAIT_SECONDS):
+return {
+"ok": False,
+"error": (
+f"{mismatch}; stale daemon did not stop within "
+f"{SHUTDOWN_WAIT_SECONDS}s"
+),
+}
+
+return _call_once_with_spawn(cmd, args)
+
+
+def _shutdown_daemon() -> dict[str, Any]:
+"""Ask the currently running daemon to exit. Never spawns a daemon."""
+try:
+return _connect_and_send("shutdown", {})
+except Exception as e:
+return {"ok": False, "error": str(e)}
+
+
 def _call_once_with_spawn(cmd: str, args: dict[str, Any]) -> dict[str, Any]:
 """Try the daemon, spawn + retry once if unreachable."""
 try:
@@ -231,3 +273,20 @@ def _wait_for_socket(timeout: float) -> bool:
 pass
 time.sleep(0.2)
 return False
+
+
+def _wait_for_socket_shutdown(timeout: float) -> bool:
+"""Wait until the socket is gone or no longer accepts connections."""
+deadline = time.time() + timeout
+while time.time() < deadline:
+if not SOCKET_PATH.exists():
+return True
+try:
+probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+probe.settimeout(0.5)
+probe.connect(str(SOCKET_PATH))
+probe.close()
+except (ConnectionRefusedError, OSError):
+return True
+time.sleep(0.1)
+return False
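
The new `call()` path probes `status`, and on a config mismatch it asks the stale daemon to shut down, waits for the socket to clear, and then sends the real command through the spawn path. The sketch below models that control flow with injected callables so it can run without a daemon. It is hypothetical: `call_with_restart`, `reranker_mismatch`, and the in-memory `FakeHost` are illustrative names; only the reranker check is modeled (the vault and embedder checks follow the same pattern); and `send` is assumed to auto-spawn a daemon with the caller's config when none is running, as `_call_once_with_spawn` does.

```python
from __future__ import annotations

from typing import Any, Callable

Response = dict[str, Any]


def reranker_mismatch(daemon_reranker: Any, expected_reranker: str) -> bool:
    # Same rule as the reranker check in the diff: a daemon that downgraded
    # itself to "disabled" is accepted unless the caller asked for "disabled".
    return daemon_reranker != expected_reranker and not (
        expected_reranker != "disabled" and daemon_reranker == "disabled"
    )


def call_with_restart(
    cmd: str,
    args: dict[str, Any],
    expected_reranker: str,
    send: Callable[[str, dict[str, Any]], Response],
    shutdown: Callable[[], Response],
    wait_gone: Callable[[], bool],
) -> Response:
    probe = send("status", {})  # assumed to spawn a daemon if none is up
    if not probe.get("ok"):
        return probe
    status = probe.get("result") or {}
    if reranker_mismatch(status.get("reranker"), expected_reranker):
        if not shutdown().get("ok"):
            return {"ok": False, "error": "failed to shut down stale daemon"}
        if not wait_gone():
            return {"ok": False, "error": "stale daemon did not stop"}
    # Either the config matched, or the stale daemon is gone and `send`
    # spawns a fresh one under the caller's config.
    return send(cmd, args)


class FakeHost:
    """In-memory stand-in for 'at most one daemon on the socket'."""

    def __init__(self, running: str | None, spawn_with: str) -> None:
        self.running = running        # reranker of the live daemon, or None
        self.spawn_with = spawn_with  # reranker a newly spawned daemon gets
        self.spawns = 0

    def send(self, cmd: str, args: dict[str, Any]) -> Response:
        if self.running is None:
            self.running = self.spawn_with
            self.spawns += 1
        return {"ok": True, "result": {"cmd": cmd, "reranker": self.running}}

    def shutdown(self) -> Response:
        self.running = None
        return {"ok": True, "result": {"status": "shutting_down"}}

    def wait_gone(self) -> bool:
        return self.running is None
```

For example, with `FakeHost(running="model-a", spawn_with="disabled")` and an expected reranker of `"disabled"`, the probe reports a mismatch, the fake daemon is shut down, and the command runs on a freshly spawned daemon whose reranker is `"disabled"`. With the roles reversed (a daemon that self-downgraded to `"disabled"`, a caller expecting a model), no restart happens, matching the tolerance rule in `_config_mismatch_error`.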

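`_wait_for_socket_shutdown` treats any `OSError` raised by `connect()` as "stopped". Because `socket.timeout` is a subclass of `OSError`, a connect attempt that merely times out also counts as stopped, and a probe socket whose `connect()` fails is not closed explicitly. The sketch below is one possible variant, not the committed code: `wait_for_socket_shutdown` is a hypothetical name that takes the socket path as a parameter instead of reading `SOCKET_PATH`; it keeps polling on a connect timeout, closes every probe with a `with` block, and measures the deadline with `time.monotonic()` so wall-clock adjustments cannot stretch or cut the wait.

```python
import socket
import time
from pathlib import Path


def wait_for_socket_shutdown(path: Path, timeout: float, poll: float = 0.1) -> bool:
    """Return True once `path` is gone or refuses connections, else False."""
    deadline = time.monotonic() + timeout  # unaffected by wall-clock changes
    while time.monotonic() < deadline:
        if not path.exists():
            return True
        # The `with` block closes the probe socket on every exit path.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as probe:
            probe.settimeout(0.5)
            try:
                probe.connect(str(path))
            except (ConnectionRefusedError, FileNotFoundError):
                # No listener behind the file, or it vanished mid-check.
                return True
            except socket.timeout:
                # A listener exists but did not answer in time: still running.
                pass
        time.sleep(poll)
    return False
```

A stale socket file left behind by a crashed daemon still yields `True`, because connecting to a path with no listener raises `ConnectionRefusedError`.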
seeklink/daemon.py

Lines changed: 20 additions & 3 deletions
@@ -11,7 +11,7 @@
 length-prefixed JSON response before closing the connection.
 
 Request schema:
-{"cmd": "search" | "status" | "index", "args": {...}}
+{"cmd": "search" | "status" | "index" | "shutdown", "args": {...}}
 
 Response schema:
 {"ok": true, "result": ...} on success
@@ -27,7 +27,7 @@
 import socket
 import sys
 from pathlib import Path
-from typing import Any
+from typing import Any, Callable
 
 logger = logging.getLogger(__name__)
 
@@ -133,7 +133,16 @@ def _shutdown(signum: int, _frame: object) -> None:
 except OSError:
 break # socket closed during shutdown
 try:
-_handle_connection(conn, db, embedder, reranker, vault_root)
+_handle_connection(
+conn,
+db,
+embedder,
+reranker,
+vault_root,
+request_shutdown=lambda: shutdown_requested.__setitem__(
+"flag", True
+),
+)
 except Exception:
 logger.exception("Error handling connection")
 finally:
@@ -166,6 +175,7 @@ def _handle_connection(
 embedder: Any,
 reranker: Any,
 vault_root: Path,
+request_shutdown: Callable[[], None] | None = None,
 ) -> None:
 """Handle a single client connection: read request, execute, send response."""
 from seeklink.ingest import ingest_file, ingest_vault
@@ -183,6 +193,7 @@ def _handle_connection(
 
 cmd = req.get("cmd")
 args = req.get("args") or {}
+should_shutdown = False
 
 try:
 if cmd == "search":
@@ -257,11 +268,17 @@ def _handle_connection(
 stats = ingest_vault(db, vault_root, embedder)
 response = {"ok": True, "result": stats}
 
+elif cmd == "shutdown":
+response = {"ok": True, "result": {"status": "shutting_down"}}
+should_shutdown = True
+
 else:
 _send_error(conn, f"unknown command: {cmd}")
 return
 
 _send_framed(conn, json.dumps(response).encode("utf-8"))
+if should_shutdown and request_shutdown is not None:
+request_shutdown()
 
 except Exception as e:
 logger.exception("Command %r failed", cmd)

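The daemon changes add a `shutdown` command whose handler sends its response first and only then calls `request_shutdown`, which sets the flag the accept loop checks. The sketch below reproduces that ordering in a self-contained server and client. Assumptions: the module docstring only says responses are length-prefixed JSON, so the 4-byte big-endian `struct` header is a guess; `send_framed`, `recv_framed`, `handle`, `serve`, and `client_call` are illustrative names, not seeklink's functions; and only `shutdown` is implemented, with any other command answered by an `unknown command` error.

```python
import json
import socket
import struct
from pathlib import Path
from typing import Any, Callable

# Assumption: a 4-byte big-endian unsigned length prefix. The real frame
# header is not shown in this commit.
HEADER = struct.Struct(">I")


def send_framed(conn: socket.socket, payload: bytes) -> None:
    conn.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(conn: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until n arrive.
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf += chunk
    return bytes(buf)


def recv_framed(conn: socket.socket) -> bytes:
    (length,) = HEADER.unpack(recv_exact(conn, HEADER.size))
    return recv_exact(conn, length)


def handle(conn: socket.socket, request_shutdown: Callable[[], None]) -> None:
    req = json.loads(recv_framed(conn))
    should_shutdown = False
    if req.get("cmd") == "shutdown":
        response = {"ok": True, "result": {"status": "shutting_down"}}
        should_shutdown = True
    else:
        response = {"ok": False, "error": f"unknown command: {req.get('cmd')}"}
    # Reply first, then flip the flag, so the client always gets an answer.
    send_framed(conn, json.dumps(response).encode("utf-8"))
    if should_shutdown:
        request_shutdown()


def serve(sock: socket.socket) -> None:
    # A dict lets the lambda mutate shared state, mirroring the diff.
    state = {"flag": False}
    while not state["flag"]:
        conn, _ = sock.accept()
        with conn:
            handle(conn, lambda: state.__setitem__("flag", True))


def client_call(path: Path, cmd: str) -> dict[str, Any]:
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(str(path))
        send_framed(s, json.dumps({"cmd": cmd, "args": {}}).encode("utf-8"))
        return json.loads(recv_framed(s))
```

Replying before flipping the flag lets the client's `_shutdown_daemon()` receive an `"ok": true` response rather than a dropped connection. Because the flag is set while the current connection is still being handled, the accept loop exits on its next `while` check without needing another connection to wake it.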