fix(models): platform-aware activation + download cancel#893

Merged
Lightheartdevs merged 4 commits intoLight-Heart-Labs:mainfrom
yasinBursali:fix/model-activate-bugs-v2
Apr 18, 2026

@yasinBursali
Contributor

@yasinBursali yasinBursali commented Apr 11, 2026

Merge order: Merge last — depends on #906, #905, #900, #908, and #902.

Summary

Addresses five bugs in the model management system and adds a download cancel feature. Builds on the model action infrastructure already in main.

What — Bugs fixed

| # | Severity | Platform | Bug |
|---|----------|----------|-----|
| 1 | Critical | macOS | Activation ran _compose_restart_llama_server, which silently succeeded (replicas: 0 in docker-compose.macos.yml) while the native llama-server process kept the old model loaded |
| 2 | High | Linux NVIDIA | LLAMA_SERVER_IMAGE written to .env even on the apple backend: a meaningless write that could corrupt the env on macOS |
| 3 | High | Linux IPv6 | Health check used localhost, which resolves to ::1 on IPv6-enabled hosts, causing a 5-minute false rollback even when the model loaded successfully |
| 4 | Medium | All | _compose_restart_llama_server read .compose-flags directly (file may not exist) and used docker compose restart for AMD; restart does not re-read .env |
| 5 | Medium | All | No path traversal protection in the activate handler, inconsistent with the delete handler, which already had it |

Plus: new download cancel feature — users can abort in-progress downloads.

How — Implementation

dream-server/bin/dream-host-agent.py

Bug 1 — macOS native restart:
Added if gpu_backend == "apple": branch in _do_model_activate before the _in_container / _compose_restart paths. Stops the existing native process via PID file (ps-verify to avoid PID reuse accidents → SIGTERM → 10s wait → SIGKILL), then re-launches via subprocess.Popen with Metal args matching scripts/bootstrap-upgrade.sh:431-438. Rollback path mirrors the forward path. Extracted shared _launch_native_llama_server() helper to avoid duplicating the ~20-line launch block in both directions.
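The stop sequence described above (ps-verify, SIGTERM, grace period, SIGKILL) could look roughly like the sketch below. It is an illustration under assumptions, not the code in dream-host-agent.py: the names `pid_matches` and `stop_native_process`, the `expected` substring match, and the 0.2 s polling interval are hypothetical.

```python
import os
import signal
import subprocess
import time
from pathlib import Path


def pid_matches(pid: int, expected: str) -> bool:
    """Return True if `ps` reports a command line for pid that contains expected."""
    try:
        result = subprocess.run(
            ["ps", "-p", str(pid), "-o", "command="],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return False  # no ps binary: identity cannot be verified, so refuse
    return result.returncode == 0 and expected in result.stdout


def stop_native_process(pid_file: Path, expected: str = "llama-server",
                        grace_seconds: float = 10.0) -> bool:
    """Stop the process recorded in pid_file; return True if it was signalled."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or contents are not a number
    if not pid_matches(pid, expected):
        return False  # stale PID, or the PID was reused by another program
    try:
        os.kill(pid, signal.SIGTERM)
        deadline = time.monotonic() + grace_seconds
        while time.monotonic() < deadline:
            if not pid_matches(pid, expected):
                return True  # exited within the grace period
            time.sleep(0.2)
        os.kill(pid, signal.SIGKILL)  # still running after the grace period
    except ProcessLookupError:
        pass  # process exited between the check and the signal
    return True
```

Checking the command line before signalling is what guards against PID reuse: if the recorded PID now belongs to an unrelated program, the function returns without sending anything.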

Bug 2 — apple guard on LLAMA_SERVER_IMAGE:
Reads gpu_backend from .env before the update block and skips LLAMA_SERVER_IMAGE when gpu_backend == "apple".
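A minimal sketch of the guard, assuming the .env update is assembled as a dict before being written. The function name `build_env_updates` and its parameters are hypothetical; only the GGUF_FILE and LLAMA_SERVER_IMAGE keys and the "apple" backend value come from this PR.

```python
from typing import Dict, Optional


def build_env_updates(gpu_backend: str, gguf_file: str,
                      llama_server_image: Optional[str]) -> Dict[str, str]:
    """Collect the .env keys to rewrite when activating a model."""
    updates = {"GGUF_FILE": gguf_file}
    # On the apple backend llama-server is a native binary, so a container
    # image reference has no meaning there and is left out of the update.
    if llama_server_image and gpu_backend != "apple":
        updates["LLAMA_SERVER_IMAGE"] = llama_server_image
    return updates
```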

Bug 3 — 127.0.0.1 vs localhost:
Changed llama_host = "localhost" to llama_host = "127.0.0.1" on the host-native health check path. Docker binds to 127.0.0.1 explicitly; on IPv6-enabled Linux hosts, localhost resolves to ::1 first.
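For illustration, a health poll pinned to the IPv4 loopback address might look like the sketch below. The function name `wait_for_health`, the `/health` path, and the polling interval are assumptions; the 300-second default mirrors the 5-minute window mentioned above.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(port: int, host: str = "127.0.0.1", path: str = "/health",
                    timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll http://host:port/path until it answers 200 or timeout elapses."""
    url = f"http://{host}:{port}{path}"
    # An empty ProxyHandler keeps loopback requests away from any HTTP proxy
    # configured through environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet, or the server is still loading the model
        time.sleep(interval)
    return False
```

Passing the literal address avoids name resolution entirely, so the result does not depend on whether the host orders ::1 before 127.0.0.1 for localhost.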

Bug 4 — _compose_restart_llama_server rework:

  • Replaced inline .compose-flags read with resolve_compose_flags(), which already falls back to running resolve-compose-stack.sh when the cache file is absent
  • Changed all Docker restart paths to stop + up -d (was: restart for AMD, docker start for no-compose-flags NVIDIA) — up -d re-reads .env; restart and start do not
  • Named volumes (lemonade-cache, lemonade-llama, lemonade-recipe) survive stop + up -d so there is no Lemonade binary re-cache penalty
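The stop + up -d sequence can be sketched as follows. This is a simplified illustration: `recreate_commands` and `recreate_service` are hypothetical names, the service name "llama-server" is an assumption, and `compose_flags` stands in for whatever resolve_compose_flags() returns.

```python
import subprocess
from typing import List, Optional


def recreate_commands(compose_flags: List[str],
                      service: str = "llama-server") -> List[List[str]]:
    """Build the two commands that replace a plain `docker compose restart`."""
    base = ["docker", "compose", *compose_flags]
    return [
        [*base, "stop", service],      # stop the running container
        [*base, "up", "-d", service],  # bring it back up; per this PR, up -d re-reads .env
    ]


def recreate_service(compose_flags: List[str], service: str = "llama-server",
                     cwd: Optional[str] = None) -> None:
    """Run both commands in order, raising CalledProcessError on failure."""
    for cmd in recreate_commands(compose_flags, service):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Neither command removes volumes, which is consistent with the note above that the named Lemonade volumes survive the sequence.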

Bug 5 — path traversal:
Added .resolve() + .is_relative_to() check on the GGUF target path in _do_model_activate, matching the existing delete handler pattern.
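A minimal sketch of the .resolve() + .is_relative_to() pattern, assuming Python 3.9 or newer (where Path.is_relative_to exists). The function name `resolve_model_path` and the `models_dir` parameter are hypothetical.

```python
from pathlib import Path


def resolve_model_path(models_dir: Path, filename: str) -> Path:
    """Return the absolute model path, rejecting names that escape models_dir."""
    root = models_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below compares real locations rather than the raw string.
    target = (root / filename).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"model path escapes models directory: {filename!r}")
    return target
```

An absolute filename such as /etc/passwd is rejected as well, because joining it to `root` discards `root` entirely.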

Cancel feature:

  • Added _model_download_proc: subprocess.Popen | None and _model_download_cancel: threading.Event globals
  • Switched subprocess.run() to subprocess.Popen() + proc.wait() in the download loop so the process can be killed from outside the thread
  • The _poll_progress thread checks _model_download_cancel and kills the active curl proc
  • _model_download_cancel.wait(5) replaces time.sleep(5) in retry delay so cancel is immediate
  • Added POST /v1/model/download/cancel host agent endpoint
  • Cancel handler captures a local reference to _model_download_proc to avoid TOCTOU race
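The pieces listed above fit together roughly as in this sketch. It is a simplified illustration, not the host agent code: the globals are renamed, the polling loop inside `run_download` stands in for the separate _poll_progress thread, and the return values other than {"status": "no_download"} are assumptions.

```python
import subprocess
import threading
from typing import List, Optional

_download_proc: Optional[subprocess.Popen] = None
_download_cancel = threading.Event()
_download_active = False


def run_download(cmd: List[str], retries: int = 3, retry_delay: float = 5.0) -> str:
    """Run cmd with retries; return "done", "cancelled", or "failed"."""
    global _download_proc, _download_active
    _download_cancel.clear()
    _download_active = True
    try:
        for _ in range(retries):
            if _download_cancel.is_set():
                return "cancelled"
            proc = subprocess.Popen(cmd)
            _download_proc = proc
            while True:
                try:
                    returncode = proc.wait(timeout=0.2)
                    break
                except subprocess.TimeoutExpired:
                    if _download_cancel.is_set():
                        proc.kill()  # no-op if the cancel handler already killed it
                        returncode = proc.wait()
                        break
            _download_proc = None
            if _download_cancel.is_set():
                return "cancelled"
            if returncode == 0:
                return "done"
            # Event.wait returns True the moment cancel is set, so the retry
            # delay is interruptible, unlike time.sleep.
            if _download_cancel.wait(retry_delay):
                return "cancelled"
        return "failed"
    finally:
        _download_proc = None
        _download_active = False


def cancel_download() -> dict:
    """Request cancellation of the active download, if there is one."""
    if not _download_active:
        return {"status": "no_download"}
    _download_cancel.set()
    proc = _download_proc  # local reference: the global may become None at any time
    if proc is not None and proc.poll() is None:
        proc.kill()
    return {"status": "cancelled"}
```

Reading the global once into `proc` means the None check and the kill() call act on the same object, even if the download thread clears the global in between.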

dream-server/extensions/services/dashboard-api/routers/models.py

  • Added POST /api/models/download/cancel proxying to host agent via existing _call_agent_model helper

dream-server/extensions/services/dashboard/src/hooks/useDownloadProgress.js

  • Added cancelDownload() async function exposed from the hook
  • Handles cancelled status (alongside existing failed/error)
  • Logs cancel errors with console.error instead of swallowing silently

dream-server/extensions/services/dashboard/src/pages/Models.jsx

  • Added red Cancel button inside DownloadProgressBar component, rendered only when helpers.cancelDownload is available

Testing

Automated (all pass):

  • py_compile — clean
  • ruff check — clean
  • Critique Guardian — APPROVED (all CG observations addressed)

Manual (Apple Silicon, local install):

  • Activated Qwen3.5-2B while Qwen3.5-9B was running → old PID killed, new process launched, health check passed in 1 attempt, .llama-server.pid updated
  • Attempted activate on non-downloaded model → 400, running process untouched
  • Cancel with no active download → {"status": "no_download"}

Platform impact

| Platform | Impact |
|----------|--------|
| macOS Apple Silicon | Activation now works; native process managed via PID file |
| Linux NVIDIA | 127.0.0.1 health check (IPv6 fix), stop + up -d re-reads .env |
| Linux AMD (Lemonade) | stop + up -d re-reads .env; named volumes preserve cached binary |
| Docker Desktop / WSL2 | _recreate_llama_server path unchanged |

🤖 Generated with Claude Code

This was referenced Apr 11, 2026
Lightheartdevs
Lightheartdevs previously approved these changes Apr 18, 2026
Collaborator

@Lightheartdevs Lightheartdevs left a comment

Five bugs + a feature in one PR is larger than I'd normally like, but every piece is tightly coupled to model activation/download so I'll let it stand. Each bug is legitimate:

Bug 1 (Critical, macOS): the silent-success activation bug is particularly nasty — _compose_restart_llama_server "succeeded" because replicas:0 in docker-compose.macos.yml made the compose call a no-op, while the native llama-server process kept the old model loaded. User sees "success," asks a question, gets the old model's response. Worst kind of bug.

Fix is correct: the if gpu_backend == "apple": branch before the container/compose paths stops the native process via PID file (with ps-verify to avoid PID reuse — good paranoia), launches fresh via subprocess.Popen with Metal args matching bootstrap-upgrade.sh:431-438. Shared _launch_native_llama_server() helper is the right extraction.

Bug 2: writing LLAMA_SERVER_IMAGE to .env on the apple backend is meaningless and could corrupt the env. Skip is correct.

Bug 3: localhost → 127.0.0.1 on the host-native health check. Consistent with #977 (dreamforge/perplexica healthcheck) and #975 (native llama-server binds). Good.

Bug 4: stop + up -d over restart — same pattern as #935. restart doesn't re-read .env; up -d does. The named-volume analysis (lemonade-cache, lemonade-llama, lemonade-recipe survive) correctly addresses the "does this nuke the cached binary" concern.

Bug 5: path traversal protection on activate matches the existing delete handler pattern — good consistency.

Cancel feature: the subprocess.Popen + threading.Event + TOCTOU-safe local-ref pattern is correct. _model_download_cancel.wait(5) replacing time.sleep(5) for fast cancel is a nice touch. Dashboard UI button is scoped correctly (only renders when helpers.cancelDownload is available).

Merge order (per author): last, after #906, #905, #900, #908, #902. That's a 5-deep dependency chain — please bundle these into a merge train or the leaf PRs will keep needing rebases. Ship.

@Lightheartdevs
Collaborator

Cross-PR coordination note from a deeper re-read.

The new _launch_native_llama_server helper binds to --host 0.0.0.0 — this preserves the existing behavior in main (both bootstrap-upgrade.sh and install-macos.sh currently bind to 0.0.0.0 on macOS native launches), so it's not a regression vs the current state.

However, it conflicts with the direction of draft PR #975, which changes all native launches (macOS installer, CLI, host agent, bootstrap-upgrade, Windows) from --host 0.0.0.0 to --host 127.0.0.1. If #975's security fix is split out and lands first (as I've recommended in #975's review), this new helper will need to be updated to match — otherwise merging this PR would silently re-introduce the 0.0.0.0 bind for the macOS activation path that #975 just removed.

Options:

  1. If this PR merges first: #975 (fix(security): bind llama-server and host agent to loopback) needs to add _launch_native_llama_server to its list of native-launch fixes (currently it only covers the 7 call sites that existed when #975 was branched).
  2. If #975 merges first: update this PR to use 127.0.0.1 in the new helper.

Either works; just needs coordination between you and yasin so one doesn't silently undo the other. Not blocking — still approving.

yasinBursali and others added 4 commits April 18, 2026 14:33
Addresses five bugs in the model management system and adds a download
cancel feature. Builds on the model action infrastructure from PR Light-Heart-Labs#886.

- **macOS native llama-server restart**: Adds a gpu_backend == "apple"
  branch in _do_model_activate that stops the existing native process via
  PID file (SIGTERM → 10s wait → SIGKILL, with ps-verify to avoid PID
  reuse accidents), then re-launches via subprocess.Popen with Metal args.
  Previously _compose_restart_llama_server ran docker commands that
  silently succeeded on macOS (replicas: 0 in docker-compose.macos.yml)
  while the native process kept running the old model.

- **LLAMA_SERVER_IMAGE apple guard**: Reads gpu_backend before updating
  .env and skips LLAMA_SERVER_IMAGE on apple. Previously the image was
  written unconditionally — meaningless on macOS where llama-server is
  a native binary, not a container.

- **Health check uses 127.0.0.1**: On IPv6-enabled Linux hosts, localhost
  resolves to ::1 first, but Docker binds to 127.0.0.1 only. Previously
  the health check timed out after 5 minutes and triggered a false rollback
  even when the model loaded successfully.

- **_compose_restart_llama_server uses resolve_compose_flags()**: Replaced
  the inline .compose-flags read with resolve_compose_flags(), which falls
  back to running resolve-compose-stack.sh dynamically when the cached file
  is absent. Also changed both AMD and NVIDIA paths to stop + up -d
  (was: restart for AMD, start for no-compose-flags NVIDIA) so that updated
  .env values are always picked up by the new container.

- **Path traversal protection in activate handler**: Added .resolve() +
  .is_relative_to() check on the GGUF target path, matching the existing
  delete handler pattern.

- Adds POST /v1/model/download/cancel host agent endpoint
- Uses threading.Event (_model_download_cancel) + _model_download_proc
  global. Download thread checks the cancel flag at the start of each
  part loop and in the _poll_progress thread (which kills the curl Popen).
  On cancel: kills curl, cleans up .part file, writes cancelled status.
- Changed subprocess.run() to subprocess.Popen() + proc.wait() so the
  curl process can be killed from the cancel handler or poll thread.
- Cancel handler captures a local reference to _model_download_proc to
  avoid TOCTOU race where the download thread nulls it mid-check.
- Dashboard-api proxies via POST /api/models/download/cancel.
- Frontend useDownloadProgress exposes cancelDownload(). Models.jsx
  renders a red Cancel button in the progress bar.
- Also handles 'cancelled' status in useDownloadProgress (was only
  'failed' and 'error').

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Three non-blocking improvements from Critique Guardian review:

- Extract _launch_native_llama_server() helper to remove ~40 lines of
  duplicate code between the forward and rollback paths in _do_model_activate.
  Both paths now call the same function, which reads GGUF_FILE / CTX_SIZE /
  LLAMA_REASONING from .env after the caller has written the update.

- Replace _time.sleep(5) in retry loop with _model_download_cancel.wait(5)
  so a cancel request is honored immediately during the retry delay instead
  of waiting up to 5 seconds.

- Hoist import time to the top of _do_model_activate instead of scattering
  inline imports inside conditional blocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the empty catch block in cancelDownload with console.error so
failed cancel requests are visible in devtools instead of disappearing
silently. Consistent with project error-handling rules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Lightheartdevs Lightheartdevs force-pushed the fix/model-activate-bugs-v2 branch from 6eec8af to 7dc775f Compare April 18, 2026 18:35
@Lightheartdevs Lightheartdevs merged commit caa8b47 into Light-Heart-Labs:main Apr 18, 2026
27 of 28 checks passed