Commit d695fe5

yasinBursali and claude committed
fix(host-agent): surface docker failures in _compose_restart_llama_server
_compose_restart_llama_server called subprocess.run four+ times for docker commands (compose stop/up, docker restart/stop/start) without inspecting returncode. Docker-layer failures (permission denied, missing compose file, daemon errors) were silently swallowed: _do_model_activate proceeded into the 5-minute health-check polling loop and only reported a generic "Health check failed — rolled back" with no indication of the real cause.

Route all docker calls through a nested _run helper that captures stderr, checks returncode, and raises RuntimeError with the failing command + stderr tail on non-zero. The caller at _do_model_activate already wraps the path in `except Exception` and will now surface the docker error immediately.

Native-host path only — Windows/WSL2 uses _recreate_llama_server, which has its own returncode handling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ac881c3 commit d695fe5

File tree

1 file changed (+20, -11 lines)

dream-server/bin/dream-host-agent.py

Lines changed: 20 additions & 11 deletions
@@ -1599,31 +1599,40 @@ def _write_lemonade_config(install_dir: Path, gguf_file: str):
 def _compose_restart_llama_server(env: dict):
     """Restart llama-server via docker compose (host-native path).
 
-    This is the primary restart strategy for Linux (systemd) and macOS
-    (launchd) where the agent runs natively on the host. It mirrors the
-    proven pattern from bootstrap-upgrade.sh lines 289-304.
+    Primary restart strategy for Linux (systemd) and macOS (launchd) where the
+    agent runs natively on the host. Mirrors bootstrap-upgrade.sh lines 289-304.
+
+    Raises RuntimeError on any docker-layer failure so _do_model_activate can
+    surface the error immediately instead of waiting for the health-check loop.
     """
     gpu_backend = env.get("GPU_BACKEND", "nvidia")
     compose_flags = []
     flags_file = INSTALL_DIR / ".compose-flags"
     if flags_file.exists():
         compose_flags = flags_file.read_text(encoding="utf-8").strip().split()
 
+    def _run(argv, timeout):
+        result = subprocess.run(
+            argv, cwd=str(INSTALL_DIR),
+            capture_output=True, text=True, timeout=timeout,
+        )
+        if result.returncode != 0:
+            raise RuntimeError(
+                f"{' '.join(argv[:3])} failed (exit {result.returncode}): "
+                f"{(result.stderr or '').strip()[:300]}"
+            )
+
     if gpu_backend == "amd":
         # Lemonade: restart preserves cached binary, reads models.ini on boot
         if compose_flags:
-            subprocess.run(["docker", "compose"] + compose_flags + ["restart", "llama-server"],
-                           cwd=str(INSTALL_DIR), capture_output=True, timeout=300)
+            _run(["docker", "compose"] + compose_flags + ["restart", "llama-server"], 300)
         else:
-            subprocess.run(["docker", "restart", "dream-llama-server"],
-                           capture_output=True, timeout=300)
+            _run(["docker", "restart", "dream-llama-server"], 300)
     else:
         # llama.cpp: recreate to pick up new GGUF_FILE from .env
         if compose_flags:
-            subprocess.run(["docker", "compose"] + compose_flags + ["stop", "llama-server"],
-                           cwd=str(INSTALL_DIR), capture_output=True, timeout=120)
-            subprocess.run(["docker", "compose"] + compose_flags + ["up", "-d", "llama-server"],
-                           cwd=str(INSTALL_DIR), capture_output=True, timeout=300)
+            _run(["docker", "compose"] + compose_flags + ["stop", "llama-server"], 120)
+            _run(["docker", "compose"] + compose_flags + ["up", "-d", "llama-server"], 300)
         else:
             # No compose flags — cannot use compose. Fall back to
             # inspect-and-recreate, which picks up GGUF_FILE from .env.

0 commit comments
