fix(models): platform-aware activation + download cancel (#893)
Lightheartdevs left a comment:
Five bugs + a feature in one PR is larger than I'd normally like, but every piece is tightly coupled to model activation/download, so I'll let it stand. Each bug is legitimate:
Bug 1 (Critical, macOS): the silent-success activation bug is particularly nasty — _compose_restart_llama_server "succeeded" because replicas:0 in docker-compose.macos.yml made the compose call a no-op, while the native llama-server process kept the old model loaded. User sees "success," asks a question, gets the old model's response. Worst kind of bug.
Fix is correct: the if gpu_backend == "apple": branch before the container/compose paths stops the native process via PID file (with ps-verify to avoid PID reuse — good paranoia), launches fresh via subprocess.Popen with Metal args matching bootstrap-upgrade.sh:431-438. Shared _launch_native_llama_server() helper is the right extraction.
Bug 2: writing LLAMA_SERVER_IMAGE to .env on the apple backend is meaningless and could corrupt the env. Skip is correct.
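The guard can be illustrated with a minimal sketch. The `.env` helpers and the `GPU_BACKEND` / `GGUF_FILE` / `LLAMA_SERVER_IMAGE` key handling here are hypothetical simplifications (the real agent presumably preserves comments and ordering); only the skip-on-apple logic is the point.

```python
from pathlib import Path


def read_env(env_path: Path) -> dict:
    """Parse a simple KEY=VALUE .env file (simplified: comments and blank
    lines are skipped, quoting is not handled)."""
    env = {}
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def update_env_for_activation(env_path: Path, gguf_file: str, image: str) -> None:
    env = read_env(env_path)
    env["GGUF_FILE"] = gguf_file
    # On the apple backend llama-server is a native binary, not a container,
    # so writing LLAMA_SERVER_IMAGE would be a meaningless (and risky) write.
    if env.get("GPU_BACKEND") != "apple":
        env["LLAMA_SERVER_IMAGE"] = image
    env_path.write_text("".join(f"{k}={v}\n" for k, v in env.items()))
```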
Bug 3: localhost → 127.0.0.1 on the host-native health check. Consistent with #977 (dreamforge/perplexica healthcheck) and #975 (native llama-server binds). Good.
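A minimal health-check helper pinned to `127.0.0.1` might look like the sketch below. The `/health` path and the timeout are illustrative (llama.cpp's server does expose `/health`, but the agent's actual probe may differ).

```python
import urllib.error
import urllib.request


def llama_server_healthy(port: int, timeout: float = 2.0) -> bool:
    """Probe the llama-server health endpoint on the loopback interface.

    127.0.0.1 is used explicitly: on IPv6-enabled hosts 'localhost' can
    resolve to ::1 first, while Docker publishes ports on 127.0.0.1 only,
    so a 'localhost' probe can hang or fail even though the server is up.
    """
    url = f"http://127.0.0.1:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```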
Bug 4: stop + up -d over restart — same pattern as #935. restart doesn't re-read .env; up -d does. The named-volume analysis (lemonade-cache, lemonade-llama, lemonade-recipe survive) correctly addresses the "does this nuke the cached binary" concern.
Bug 5: path traversal protection on activate matches the existing delete handler pattern — good consistency.
Cancel feature: the subprocess.Popen + threading.Event + TOCTOU-safe local-ref pattern is correct. _model_download_cancel.wait(5) replacing time.sleep(5) for fast cancel is a nice touch. Dashboard UI button is scoped correctly (only renders when helpers.cancelDownload is available).
Merge order (per author): last, after #906, #905, #900, #908, #902. That's a 5-deep dependency chain — please bundle these into a merge train or the leaf PRs will keep needing rebases. Ship.
Cross-PR coordination note from a deeper re-read. The new native launch path conflicts with the direction of draft PR #975, which changes how all native launches (macOS installer, CLI, host agent, bootstrap-upgrade, Windows) are started. Either works; just needs coordination between you and yasin so one doesn't silently undo the other. Not blocking — still approving.
Addresses five bugs in the model management system and adds a download cancel feature. Builds on the model action infrastructure from PR Light-Heart-Labs#886.

- **macOS native llama-server restart**: Adds a `gpu_backend == "apple"` branch in `_do_model_activate` that stops the existing native process via PID file (SIGTERM → 10s wait → SIGKILL, with ps-verify to avoid PID reuse accidents), then re-launches via `subprocess.Popen` with Metal args. Previously `_compose_restart_llama_server` ran docker commands that silently succeeded on macOS (`replicas: 0` in `docker-compose.macos.yml`) while the native process kept running the old model.
- **LLAMA_SERVER_IMAGE apple guard**: Reads `gpu_backend` before updating `.env` and skips `LLAMA_SERVER_IMAGE` on apple. Previously the image was written unconditionally — meaningless on macOS where llama-server is a native binary, not a container.
- **Health check uses 127.0.0.1**: On IPv6-enabled Linux hosts, `localhost` resolves to `::1` first, but Docker binds to `127.0.0.1` only. Previously the health check timed out after 5 minutes and triggered a false rollback even when the model loaded successfully.
- **`_compose_restart_llama_server` uses `resolve_compose_flags()`**: Replaced the inline `.compose-flags` read with `resolve_compose_flags()`, which falls back to running `resolve-compose-stack.sh` dynamically when the cached file is absent. Also changed both AMD and NVIDIA paths to stop + `up -d` (was: `restart` for AMD, `start` for no-compose-flags NVIDIA) so that updated `.env` values are always picked up by the new container.
- **Path traversal protection in activate handler**: Added `.resolve()` + `.is_relative_to()` check on the GGUF target path, matching the existing delete handler pattern.

Cancel feature:

- Adds `POST /v1/model/download/cancel` host agent endpoint.
- Uses `threading.Event` (`_model_download_cancel`) + `_model_download_proc` global. The download thread checks the cancel flag at the start of each part loop and in the `_poll_progress` thread (which kills the curl `Popen`). On cancel: kills curl, cleans up the `.part` file, writes cancelled status.
- Changed `subprocess.run()` to `subprocess.Popen()` + `proc.wait()` so the curl process can be killed from the cancel handler or poll thread.
- Cancel handler captures a local reference to `_model_download_proc` to avoid a TOCTOU race where the download thread nulls it mid-check.
- Dashboard-api proxies via `POST /api/models/download/cancel`.
- Frontend `useDownloadProgress` exposes `cancelDownload()`. `Models.jsx` renders a red Cancel button in the progress bar.
- Also handles 'cancelled' status in `useDownloadProgress` (was only 'failed' and 'error').

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three non-blocking improvements from Critique Guardian review:

- Extract `_launch_native_llama_server()` helper to remove ~40 lines of duplicate code between the forward and rollback paths in `_do_model_activate`. Both paths now call the same function, which reads `GGUF_FILE` / `CTX_SIZE` / `LLAMA_REASONING` from `.env` after the caller has written the update.
- Replace `_time.sleep(5)` in the retry loop with `_model_download_cancel.wait(5)` so a cancel request is honored immediately during the retry delay instead of waiting up to 5 seconds.
- Hoist `import time` to the top of `_do_model_activate` instead of scattering inline imports inside conditional blocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
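The `_model_download_cancel.wait(5)` swap is worth a tiny demonstration: `Event.wait(timeout)` returns as soon as the event is set, so a cancel arriving during the retry delay is honored almost immediately, whereas `time.sleep(5)` always burns the full five seconds. An illustrative sketch:

```python
import threading
import time

cancel = threading.Event()


def request_cancel_soon(delay: float) -> None:
    time.sleep(delay)
    cancel.set()


# Simulate a cancel request arriving 0.2 s into a 5 s retry delay.
threading.Thread(target=request_cancel_soon, args=(0.2,), daemon=True).start()

start = time.monotonic()
interrupted = cancel.wait(5)   # returns True the moment cancel is set
elapsed = time.monotonic() - start
```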
Replace the empty catch block in `cancelDownload` with `console.error` so failed cancel requests are visible in devtools instead of disappearing silently. Consistent with project error-handling rules. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Addresses five bugs in the model management system and adds a download cancel feature. Builds on the model action infrastructure already in main.
What — Bugs fixed
- `_compose_restart_llama_server` silently succeeded (`replicas: 0` in `docker-compose.macos.yml`) while the native llama-server process kept the old model loaded
- `LLAMA_SERVER_IMAGE` written to `.env` even on the `apple` backend — meaningless write that could corrupt the env on macOS
- `localhost` resolved to `::1` on IPv6-enabled hosts — 5-minute false rollback even when the model loaded successfully
- `_compose_restart_llama_server` read `.compose-flags` directly (file may not exist) and used `docker compose restart` for AMD — restart does not re-read `.env`

Plus: new download cancel feature — users can abort in-progress downloads.
How — Implementation
dream-server/bin/dream-host-agent.py

Bug 1 — macOS native restart:
Added an `if gpu_backend == "apple":` branch in `_do_model_activate` before the `_in_container` / `_compose_restart` paths. Stops the existing native process via PID file (ps-verify to avoid PID reuse accidents → SIGTERM → 10s wait → SIGKILL), then re-launches via `subprocess.Popen` with Metal args matching `scripts/bootstrap-upgrade.sh:431-438`. The rollback path mirrors the forward path. Extracted a shared `_launch_native_llama_server()` helper to avoid duplicating the ~20-line launch block in both directions.

Bug 2 — apple guard on LLAMA_SERVER_IMAGE:
Reads `gpu_backend` from `.env` before the update block and skips `LLAMA_SERVER_IMAGE` when `gpu_backend == "apple"`.

Bug 3 — 127.0.0.1 vs localhost:
Changed `llama_host = "localhost"` to `llama_host = "127.0.0.1"` on the host-native health check path. Docker binds to `127.0.0.1` explicitly; on IPv6-enabled Linux hosts, `localhost` resolves to `::1` first.

Bug 4 — `_compose_restart_llama_server` rework:
- Replaced the inline `.compose-flags` read with `resolve_compose_flags()`, which already falls back to running `resolve-compose-stack.sh` when the cache file is absent
- Both AMD and NVIDIA paths now use stop + `up -d` (was: `restart` for AMD, `docker start` for no-compose-flags NVIDIA) — `up -d` re-reads `.env`; `restart` and `start` do not
- Named volumes (`lemonade-cache`, `lemonade-llama`, `lemonade-recipe`) survive stop + `up -d`, so there is no Lemonade binary re-cache penalty

Bug 5 — path traversal:
Added a `.resolve()` + `.is_relative_to()` check on the GGUF target path in `_do_model_activate`, matching the existing delete handler pattern.

Cancel feature:
- New `_model_download_proc: subprocess.Popen | None` and `_model_download_cancel: threading.Event` globals
- Changed `subprocess.run()` to `subprocess.Popen()` + `proc.wait()` in the download loop so the process can be killed from outside the thread
- `_poll_progress` thread checks `_model_download_cancel` and kills the active curl proc
- `_model_download_cancel.wait(5)` replaces `time.sleep(5)` in the retry delay so cancel is immediate
- New `POST /v1/model/download/cancel` host agent endpoint
- Cancel handler captures a local reference to `_model_download_proc` to avoid a TOCTOU race

dream-server/extensions/services/dashboard-api/routers/models.py
- New `POST /api/models/download/cancel` route proxying to the host agent via the existing `_call_agent_model` helper

dream-server/extensions/services/dashboard/src/hooks/useDownloadProgress.js
- `cancelDownload()` async function exposed from the hook
- Handles the `cancelled` status (alongside the existing `failed` / `error`)
- Failed cancel requests are logged via `console.error` instead of swallowed silently

dream-server/extensions/services/dashboard/src/pages/Models.jsx
- Cancel button in the `DownloadProgressBar` component, rendered only when `helpers.cancelDownload` is available

Testing
Automated (all pass):
- `py_compile` — clean
- `ruff check` — clean

Manual (Apple Silicon, local install):
- `.llama-server.pid` updated
- cancel with no active download returns `{"status": "no_download"}`

Platform impact
- `127.0.0.1` health check (IPv6 fix); stop + `up -d` re-reads `.env`
- stop + `up -d` re-reads `.env`; named volumes preserve the cached binary
- `_recreate_llama_server` path unchanged

🤖 Generated with Claude Code