Bug Report: installer doesn't restart dream-host-agent.service after rewriting bin/dream-host-agent.py — leaves zombie reading deleted inode

Severity: Medium
Category: Installer / Lifecycle
Platform: Linux (systemd user services), Windows/WSL2 (same systemd path); macOS likely affected via launchd (not directly tested here)
Confidence: Confirmed on Linux/WSL2
Description
After bash install.sh --all --non-interactive completes successfully, the running dream-host-agent.service is still the OLD process from before the install: same PID, same start time, with /proc/<pid>/cwd → /home/rosenrot/dream-server (deleted) confirming it's reading the unlinked previous-install inode. The newly-rewritten code on disk is unused until the user manually runs systemctl --user restart dream-host-agent.service.
This is not caused by any specific PR in the current open stack — it appears to be pre-existing behavior on Light-Heart-Labs/DreamServer@c0600ca. But it MASKS the runtime testability of every PR that touches host-agent routes, which currently includes Light-Heart-Labs#893, Light-Heart-Labs#900, Light-Heart-Labs#905, Light-Heart-Labs#906, Light-Heart-Labs#907, Light-Heart-Labs#908. Filing it as a standalone issue because it cost me ~30 minutes of confused QA before I caught it, and it will cost the same to anyone else doing runtime validation of host-agent PRs.
Affected File(s)
dream-server/installers/phases/11-services.sh — final phase that puts the host-agent binary in place; missing a service-restart step.
dream-server/installers/phases/12-finalize.sh — alternative location for the restart hook.
~/.config/systemd/user/dream-host-agent.service — the systemd user unit (already in place from a previous install on test systems).
Root Cause
The installer copies the new dream-host-agent.py into ~/dream-server/bin/ during phase 11 (or wherever the service binary is laid down), but never invokes systemctl --user restart dream-host-agent.service. The service is registered as a long-lived daemon, so:
1. sudo rm -rf ~/dream-server unlinks the directory inode but doesn't kill the process — the kernel keeps the inode alive as long as a process holds an open reference (the running host-agent's CWD).
2. bash install.sh --all --non-interactive creates a NEW ~/dream-server directory (new inode) at the same path, populated with the new code.
3. The systemd-managed host-agent process is unchanged. Its CWD still points to the OLD (deleted) inode. Its open bin/dream-host-agent.py file descriptor still points to the OLD code.
4. New routes added by recent PRs are unreachable because the OLD process, running OLD code, doesn't know them.
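The inode mechanics above can be demonstrated in isolation with a throwaway temp directory and a background process standing in for the daemon (paths here are scratch dirs, not DreamServer's):

```shell
# Demonstrates: a process whose CWD is an unlinked directory keeps the old
# inode alive; recreating the path gives a NEW inode the process never sees.
workdir=$(mktemp -d)
mkdir "$workdir/app"
( cd "$workdir/app" && sleep 30 ) &   # stand-in for the long-lived daemon
pid=$!
sleep 0.2                             # let the subshell cd before we unlink
rm -rf "$workdir/app"                 # "uninstall": unlink the directory
mkdir "$workdir/app"                  # "reinstall": same path, new inode
cwd_after=$(readlink "/proc/$pid/cwd")
echo "$cwd_after"                     # on Linux, ends with "app (deleted)"
kill "$pid" 2>/dev/null
rm -rf "$workdir"
```

The daemon's only escape from the old inode is exec-ing a new process, which is exactly what a service restart does.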
Evidence
Immediately after a successful bash install.sh --all --non-interactive on a system with a previously-running host-agent:
$ systemctl --user status dream-host-agent.service
● dream-host-agent.service - DreamServer Host Agent
Loaded: loaded (/home/rosenrot/.config/systemd/user/dream-host-agent.service; enabled; preset: enabled)
Active: active (running) since Sat 2026-04-11 03:07:28 +03; 21h ago
Main PID: 75759 (python3)
Note: 21h uptime — that's the PID from BEFORE the install, not a new one.
The cwd link confirms it's holding a deleted inode. Any new host-agent route added by a recent PR is unreachable. Example with PR Light-Heart-Labs#893's cancel route:
Host-agent journal shows the OLD process returning 404 because it doesn't know the route:
"POST /v1/model/download/cancel HTTP/1.1" 404 -
After systemctl --user restart dream-host-agent.service:
$ ps -o pid,lstart,cmd -p $(pgrep -f dream-host-agent.py)
PID STARTED CMD
353285 Sun Apr 12 00:11 2026 /usr/bin/python3 /home/rosenrot/dream-server/bin/dream-host-agent.py
$ ls -la /proc/353285/cwd
lrwxrwxrwx ... /proc/353285/cwd -> /home/rosenrot/DreamServer # ← new, no "(deleted)"
$ curl -X POST -H "Authorization: Bearer $KEY" \
http://127.0.0.1:3002/api/models/download/cancel
{"status":"no_download"} # ← 200, works
$ journalctl --user -u dream-host-agent.service -n 5
"POST /v1/model/download/cancel HTTP/1.1" 200 - # ← new route reachable
Exact same dashboard-api call; the only difference is that the daemon was restarted and so picked up the new binary.
Platform Analysis
Linux (systemd user services): Confirmed affected. This is the primary install path for native Linux installs.
Windows/WSL2: Confirmed affected — WSL2 inherits the systemd user-service path.
macOS (launchd): Likely affected by the same class of bug. macOS uses launchctl bootstrap/bootout to manage the host-agent plist. PR Light-Heart-Labs/DreamServer#899 (fix(macos-installer): dynamic launchd PATH + extensions-lib source) modifies installers/macos/install-macos.sh's launchctl handling, but I could not directly verify the restart-after-rewrite behavior on Apple hardware. Worth checking — if the macOS installer also fails to restart the launchd unit after rewriting the binary, the same zombie pattern applies there.
Reproduction
1. Have dream-host-agent.service (systemd user unit) already running from a previous install: systemctl --user status dream-host-agent.service shows active (running).
2. sudo rm -rf ~/dream-server.
3. bash install.sh --all --non-interactive — install completes, reaching phase 13.
4. systemctl --user status dream-host-agent.service — the PID is the SAME as before the install; the start time is the same.
5. sudo ls -la /proc/$(pgrep -f dream-host-agent.py)/cwd shows ... -> /home/rosenrot/dream-server (deleted).
6. Any new host-agent route is unreachable. Verify with curl -X POST -H "Authorization: Bearer $DASHBOARD_API_KEY" http://127.0.0.1:3002/api/models/download/cancel. With the OLD daemon, the response is {"detail":"Not found"}. After systemctl --user restart dream-host-agent.service, the response is {"status":"no_download"} (200).
Impact
PRs Light-Heart-Labs#893 (cancel route), Light-Heart-Labs#905 (hook framework), Light-Heart-Labs#906 (compose-restart error surfacing), Light-Heart-Labs#907 (built-in extensions API), Light-Heart-Labs#908 (.env write via host agent), and Light-Heart-Labs#900 (per-part SHA256 verify) all appear broken at runtime until the host-agent is manually restarted, even though their code is correctly merged on disk and the file at ~/dream-server/bin/dream-host-agent.py is the new version.
For end users: every host-agent enhancement that lands in a release will be effectively dead code until they restart the service or reboot. For QA: every reviewer who tries to validate a host-agent PR at runtime will see the OLD behavior unless they know to restart the daemon explicitly.
Suggested Approach
Add to dream-server/installers/phases/11-services.sh (after the binary is in place) or dream-server/installers/phases/12-finalize.sh:
# Restart the host-agent so it loads the new binary, not the inode of a deleted previous install.
if systemctl --user is-enabled dream-host-agent.service >/dev/null 2>&1; then
    log "Restarting dream-host-agent.service to load the new binary..."
    systemctl --user restart dream-host-agent.service \
        || warn "host-agent restart failed (non-fatal)"
elif command -v launchctl >/dev/null 2>&1 && [ -f "$HOME/Library/LaunchAgents/com.dreamserver.host-agent.plist" ]; then
    log "Reloading dream-host-agent launchd unit..."
    launchctl bootout "gui/$(id -u)" "$HOME/Library/LaunchAgents/com.dreamserver.host-agent.plist" 2>/dev/null || true
    launchctl bootstrap "gui/$(id -u)" "$HOME/Library/LaunchAgents/com.dreamserver.host-agent.plist" \
        || warn "host-agent launchd reload failed (non-fatal)"
fi
Make sure the restart happens after the binary is on disk, so the new process picks up the new code.
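Beyond restarting, the installer could also verify that the restart actually took effect. A minimal post-install sanity check — a sketch only, using the unit name from this report and not tested against the real installer:

```shell
# Warn if the user unit's main process is still anchored to a deleted inode,
# i.e. it is running pre-install code despite the new files on disk.
unit=dream-host-agent.service
pid=$(systemctl --user show -p MainPID --value "$unit" 2>/dev/null || echo 0)
if [ "${pid:-0}" -gt 0 ] && readlink "/proc/$pid/cwd" 2>/dev/null | grep -q '(deleted)'; then
    echo "WARNING: $unit is still running pre-install code; restart it:" >&2
    echo "  systemctl --user restart $unit" >&2
fi
```

This degrades gracefully: on systems without the unit (or without systemd), MainPID resolves to 0 and the check is a no-op.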
Bonus observation
On host-agent restart the new process logs:
WARNING: Agent is listening on all interfaces. Set DREAM_AGENT_BIND=127.0.0.1 in .env to restrict.
i.e. the host-agent binds 0.0.0.0:7710 by default after a fresh install. The 0.0.0.0 bind is intentional (it was deliberately moved off 127.0.0.1 to fix Light-Heart-Labs/DreamServer#752 — host agent unreachable from Docker containers), but the warning suggests the installer isn't setting DREAM_AGENT_BIND=127.0.0.1 in .env as the safer default for environments where container reachability isn't required. Possibly worth filing as a separate "secure default" issue, or wiring the install flow so users get a sane default and an opt-out for container reachability.
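If the "secure default" route is taken, the installer could seed .env idempotently with an explicit opt-out. A hedged sketch — the ENV_FILE default and the WANT_CONTAINER_ACCESS flag are hypothetical, not existing installer options:

```shell
# Seed DREAM_AGENT_BIND once, preserving any value the user already set.
ENV_FILE="${ENV_FILE:-$(mktemp)}"                      # stand-in for ~/dream-server/.env
WANT_CONTAINER_ACCESS="${WANT_CONTAINER_ACCESS:-no}"   # hypothetical opt-out flag
if ! grep -q '^DREAM_AGENT_BIND=' "$ENV_FILE" 2>/dev/null; then
    if [ "$WANT_CONTAINER_ACCESS" = "yes" ]; then
        echo 'DREAM_AGENT_BIND=0.0.0.0' >> "$ENV_FILE"    # keep #752 behavior
    else
        echo 'DREAM_AGENT_BIND=127.0.0.1' >> "$ENV_FILE"  # safer default
    fi
fi
```

Because the grep guard skips files that already define the key, re-running the installer never clobbers a user's explicit choice.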
Cross-references
Light-Heart-Labs/DreamServer#752 (closed) — original reason the host-agent is on 0.0.0.0. Context for the bonus observation, not a regression.
Filed during full-stack integration test of open PR stack Light-Heart-Labs#893–909 on Light-Heart-Labs/DreamServer@c0600ca3. Environment: WSL2 / Ubuntu 24.04 / systemd user-mode services / NVIDIA RTX 3070 Laptop / Tier 1.