fix(extensions-library): wait for healthy databases in paperless-ngx depends_on#537
Closed
yasinBursali wants to merge 44 commits intoLight-Heart-Labs:resources/devfrom
Closed
Conversation
…validation Five changes to eliminate the support pain we experienced with real users on Strix Halo: 1. Symlink `dream` to /usr/local/bin during install Users had no idea dream-cli existed at ~/dream-server/dream-cli. Now `dream status`, `dream restart perplexica` etc. work immediately. 2. Save compose flags at install time (.compose-flags) Users were manually chaining 5+ compose files to restart a single service. Now dream-cli reads saved flags — no compose knowledge needed. 3. Add `dream repair <service>` command Stops container, nukes volume, recreates, and re-seeds config. Includes Perplexica repair script that sets API key, base URL, model, and marks setupComplete via HTTP API. 4. Post-install validation in phase 13 - Re-runs Perplexica config seed if phase 12 failed silently - Warns AMD users if not in render/video groups (ComfyUI won't work) 5. Dashboard GPU detection — AMD-aware messages PreFlightChecks now uses backend-specific error from API instead of hardcoded "Install NVIDIA drivers." TroubleshootingAssistant includes AMD ROCm solutions alongside NVIDIA. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Invalidate .compose-flags cache in cmd_enable/cmd_disable so extension changes take effect immediately instead of using stale cached flags - Validate .compose-flags content on read (must start with '-f ') and remove corrupt/stale files to fall through to dynamic resolution - Add [y/N] confirmation prompt to dream repair before destroying service volumes (matches existing rollback/preset-restore pattern) - Replace || true with || warn in cmd_repair for visible error reporting - Tighten volume grep from substring match to anchored pattern to prevent matching unrelated services (e.g. dashboard matching dashboard-api) - Add set -euo pipefail to repair-perplexica.sh - Fix shell injection: use os.environ in Python instead of shell variable interpolation inside heredoc (single-quoted delimiter prevents expansion) - Use lib/python-cmd.sh for Python detection matching phase 12 pattern - Guard .compose-flags write in 11-services.sh against full-disk failure - Redirect stderr to LOG_FILE in 13-summary.sh Perplexica validation instead of /dev/null so failures are diagnosable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ainers On fresh Ubuntu installs with Strix Halo, /dev/kfd may not exist if the amdkfd kernel module isn't loaded. The installer detected the GPU via sysfs and configured GTT memory, but never verified the compute devices existed. Containers then fail with a cryptic Docker error: "error gathering device information while adding custom device" Fix: after group setup, attempt modprobe amdkfd if /dev/kfd is missing. Also verify /dev/dri and renderD128 exist. Clear warnings tell the user what to do instead of a silent Docker failure. Found during real Strix Halo user install on Ubuntu 24.04 Desktop. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker 29.3.0 fails with "error gathering device information while adding custom device /dev/dri: no such file or directory" on AMD GPUs, even when /dev/dri and /dev/kfd both exist. This blocks llama-server and comfyui from starting on Strix Halo. Confirmed: Docker 29.2.1 on the same kernel (6.17.0-19) and same hardware works perfectly. Docker 29.3.0 does not. Fix: after Docker install/detection, check version. If 29.3.x and AMD backend, automatically downgrade to 29.2.1 with clear messaging. Supports apt (Ubuntu/Debian) and dnf (Fedora/RHEL). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: v2.2 UX — dream CLI, repair command, AMD GPU fixes, Docker 29.3 pin
Bump version in manifest.json, constants.sh, README install URL, and get-dream-server.sh bootstrap comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thin wrapper around install.sh that forces --tier 0 (Qwen 3.5 2B, ~1.5GB download) and --non-interactive mode. All other installer behavior is identical. Pass additional flags as needed: ./test-install.sh # Minimal test install ./test-install.sh --all # All services, tiny model ./test-install.sh --dry-run # Preview without changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 600s (10 min) timeout was too aggressive for large image pulls like the ~10GB CUDA llama-server image on slower connections. Bumps to 3600s (60 min) to prevent false timeouts during legitimate downloads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the installer is run via `curl ... | bash`, stdin is the piped script content, so `read` gets EOF and `set -e` kills the process. All 16 interactive read commands now explicitly read from /dev/tty, which is the user's terminal regardless of how stdin is wired. Affected files: - installers/lib/ui.sh (install menu) - installers/lib/detection.sh (reboot prompt) - installers/phases/02-detection.sh (reboot prompt) - installers/phases/03-features.sh (feature toggles) - installers/phases/04-requirements.sh (ollama + continue prompts) - installers/phases/05-docker.sh (sudo docker prompt) - installers/macos/install-macos.sh (all interactive prompts) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Services exit immediately on success, so longer timeouts only affect failure cases. Previous limits (1-5 min) were too aggressive for slow hardware, large model loads (FLUX, whisper-large), and first-boot scenarios where models download on startup. All services now get 150 attempts with adaptive backoff (2s→8s cap), giving ~20 minutes before the installer gives up. Zero cost on fast machines — the check returns instantly once the service responds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bumps wget --timeout from 300s/600s to 3600s for GGUF and FLUX model downloads. This is the network stall timeout (no data received), not a total time cap. Prevents false failures on slow or intermittent connections without affecting fast downloads. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- STT model download (whisper-large ~1.5GB): 120s → 3600s - Offline embedding model download: 600s → 3600s - Background task wait (bootstrap model): 300s → 1200s (20 min) Previous limits assumed fast connections. These are all no-cost on fast hardware — they only prevent premature failures on slow links. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
curl is consistently faster than wget for HuggingFace downloads due to better HTTP/2 support and connection reuse. Also eliminates wget as an installer dependency — curl is already required everywhere. Flags: -fSL (fail on error, silent, follow redirects), -C - (resume partial downloads), --connect-timeout 30, --max-time 3600. Updates test-network-timeouts.sh assertions to match (wget -> curl, --timeout -> --max-time). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For Tier 1+ installs, the installer now downloads a tiny 2B bootstrap model (~1.5GB, ~1 min) first and starts services immediately. The full tier-appropriate model downloads in the background and auto hot-swaps via bootstrap-upgrade.sh when ready (~30s interruption). This eliminates 80%+ of install wait time. Users can start chatting within 2-3 minutes instead of waiting 10-30 min for large model downloads. New files: - installers/lib/bootstrap-model.sh: constants + bootstrap_needed() - scripts/bootstrap-upgrade.sh: background download + auto hot-swap Modified files: - installers/phases/11-services.sh: bootstrap flow before compose up - install-core.sh: --no-bootstrap flag - installers/phases/13-summary.sh: bootstrap status in summary Behavior: - Tier 0: no change (full model IS the bootstrap model) - Tier 1+: bootstrap → background download → auto-swap - --no-bootstrap: opt out, download full model in foreground - --offline/--cloud: bootstrap skipped automatically - Re-install with model on disk: bootstrap skipped Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update all version references for the v2.3.0 release: - constants.sh, 06-directories.sh fallback - get-dream-server.sh curl URL - Both READMEs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The .env file is generated in phase 06 with the full model's GGUF_FILE and LLM_MODEL values. When bootstrap mode is active, phase 11 swaps these variables for the download and models.ini, but docker compose reads from .env — so llama-server tried to load a model file that doesn't exist yet (the full model is still downloading in background). Now phase 11 patches GGUF_FILE, LLM_MODEL, and MAX_CONTEXT in .env to match the bootstrap model before running compose up. The background upgrade script (bootstrap-upgrade.sh) already updates .env back to the full model values when the download completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 6 sequential docker compose builds ran silently with output suppressed to the log file. The terminal went dead for several minutes, making it look like the installer had exited. Each build now runs in background with spin_task, showing: [1/6] Building dashboard [2/6] Building dashboard-api ... with ✓/⚠ status on completion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. check_service() used if/then around curl, which consumed the exit
code — $? was always 0 after the if block, so the timeout (124)
vs connection-refused (7) distinction never worked. Switched to
cmd && { success } pattern so $? reflects the actual curl exit.
2. Container build loop now shows [1/6] Building <service> spinner
instead of going silent for several minutes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ght-Heart-Labs#511) The nohup and bg_task_start calls had trailing spaces but no backslashes, so bash treated each line as a separate command. The nohup ran bootstrap-upgrade.sh with no arguments, then "$INSTALL_DIR" was executed as a command, crashing the installer under set -euo pipefail. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Default MODELS_DIR was $DREAM_DIR/models but the installer stores models in $DREAM_DIR/data/models. Script was completely non-functional as a standalone tool. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-Labs#512) Release b5570 no longer has the Vulkan Windows asset on GitHub — downloads return a 9-byte 404 body saved as a "zip", causing "Central Directory corrupt" on extraction. Updates to b8248 which matches the Linux Docker image tag and has a confirmed Vulkan binary. Also syncs DS_VERSION to 2.3.0. Fixes #209 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-Heart-Labs#513) ComfyUI and its FLUX models were installed unconditionally — even "Custom" installs had no way to skip image generation. This adds ENABLE_COMFYUI flag following the same pattern as ENABLE_VOICE, ENABLE_WORKFLOWS, ENABLE_RAG, and ENABLE_OPENCLAW. When disabled: - Skips 34GB FLUX model download - Skips ComfyUI Docker image pull + build - Skips ComfyUI health check - Sets ENABLE_IMAGE_GENERATION=false in .env so Open WebUI hides the image generation button entirely Also fixes "Core Only" menu option which previously didn't disable any optional services (all ENABLE flags defaulted to true). New CLI flag: --comfyui (included in --all) Custom install prompt: "Enable image generation (ComfyUI + FLUX, ~34GB)? [Y/n]" Fixes #196 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bs#514) pro.json was binding to 0.0.0.0 (all interfaces) with dangerouslyDisableDeviceAuth enabled, exposing the OpenClaw gateway to the network without authentication. Changes host from 0.0.0.0 to 127.0.0.1, matching the other two configs (openclaw.json, openclaw-strix-halo.json). With localhost-only binding, the disabled device auth is safe — only local processes (Docker containers via internal network) can reach the gateway. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…art-Labs#516) The optional ComfyUI feature (Light-Heart-Labs#513) added this env var to the .env generator but not to the schema, causing schema validation to fail during install. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ight-Heart-Labs#517) Docker Compose v5+ errors when an overlay references a service not defined in any included compose file. The tier0 overlay referenced qdrant, n8n, whisper, tts, openclaw, embeddings, etc. — all optional services that may not be in the stack. The old comment "Docker Compose ignores overrides for services not defined in base" was true for Compose v2 but is false for v5. Now only overrides the 4 base services: llama-server, dashboard, dashboard-api, open-webui. Optional services use their own compose files' default limits. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
) * fix: resolve .env port overrides for health checks SERVICE_PORTS reads from manifest defaults (e.g. 8080 for llama-server) but .env may override them (e.g. OLLAMA_PORT=11434 on Strix Halo). Health checks were hitting the wrong port, timing out for 20 minutes, then reporting failure even though the service was running fine. Phase 12 now reads port vars from .env after sr_load and updates SERVICE_PORTS via SERVICE_PORT_ENVS (indirect variable expansion). Also fixes bootstrap-upgrade.sh which runs via nohup and doesn't inherit env vars from the parent shell. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: also resolve .env port overrides in phase 13 summary Phase 13 uses SERVICE_PORTS for Perplexica auto-config URLs and the final "YOUR DREAM SERVER IS LIVE" display. Without resolving .env overrides, users see wrong URLs (e.g. localhost:3000 when WebUI is actually on a different port). Same resolution pattern as phase 12. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: centralize port resolution in sr_resolve_ports() Extracts the SERVICE_PORTS override logic into a shared function in service-registry.sh instead of inline code in each consumer. All 9 post-install scripts and both installer phases now call sr_resolve_ports() after loading .env, ensuring SERVICE_PORTS reflects actual port config everywhere (not just manifest defaults). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Light-Heart-Labs#519) * fix: dashboard health checks on Docker Desktop (Windows/WSL2) Root cause: Docker Desktop's embedded DNS takes ~4 seconds to return NXDOMAIN for non-running containers. With 19 services checked concurrently via asyncio.gather, the slow DNS lookups blocked running services from being checked in time, causing everything to show as "degraded" on the dashboard. Fix (three-part): 1. Fresh session per poll cycle — eliminates stale connection pool issues. The global aiohttp session accumulated dead connections from non-running services, poisoning subsequent polls. Now each cycle creates a fresh session with force_close=True and use_dns_cache=False, then closes it. 2. Not-deployed cache with TTL — services that fail DNS get cached for 15 seconds. Subsequent polls skip them entirely, so the slow 4-second DNS lookups only happen once per service. 3. Two-phase polling — Phase 1 returns cached not_deployed results instantly. Phase 2 checks remaining services with a semaphore (limit=4) to prevent DNS contention. Total timeout raised to 30s so the first poll (which has no cache) can complete even with slow DNS. Net effect: first poll takes ~4-5 seconds (DNS for non-deployed services), subsequent polls complete in <50ms. All running services show healthy with 1-5ms response times. No behavior change on native Linux Docker where DNS failures are instant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: background polling for dashboard health checks Replaces request-triggered health checks with a background polling loop. API endpoints return cached results instantly (<1ms) instead of running live checks on every request (8-16s on Docker Desktop). Architecture: - Background task polls get_all_services() every 10 seconds - Results stored in module-level cache - All endpoints read from cache, falling back to live check only on first request before the poll completes helpers.py changes (reverted from previous PR, minimal diff): - Restored original shared aiohttp session pattern - Increased total timeout from 5s to 30s (no user impact since it only runs in the background poll) - Added asyncio.TimeoutError handling in _check_host_service_health (bug fix: was raising unhandled NameError) - Added get_cached_services() / set_services_cache() for the background poll to write and endpoints to read main.py changes: - Added _poll_service_health() background task (started on app startup) - Added _get_services() async helper for cache-or-live fallback - Updated /services, /status, _build_api_status() to read from cache routers/features.py: - Updated /api/features to read cached services instead of live check Tested on: - Windows Docker Desktop (RTX 5090): 11 healthy, 0 degraded, <350ms - Linux native Docker (Strix Halo): 18/18 healthy (no regression) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tall (Light-Heart-Labs#520) * feat(windows): add bootstrap fast-start for instant chat during install Ports the Linux bootstrap pattern to the Windows installer. For Tier 1+ installs, downloads a tiny 2B model (~1.5GB, ~1 min) first so users can chat immediately. The full tier model downloads in the background via bootstrap-upgrade.sh (which already works on Windows via Git Bash) and auto-swaps when ready. Changes: - tier-map.ps1: Add bootstrap constants, Get-TierRank, Should-UseBootstrap - install-windows.ps1 phase 8: Bootstrap check, variable swap, .env patch, background upgrade launch via Start-Process Before: Tier 3 install blocked for 30+ min downloading 18GB model. After: Chat available in ~2 min. Full model downloads invisibly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(macos): add bootstrap fast-start for instant chat during install Same pattern as Windows PR — ports Linux bootstrap to macOS installer. Tier 1+ installs download the tiny 2B model first, then the full model downloads in the background via nohup + bootstrap-upgrade.sh. Uses macOS sed -i '' syntax for .env patching (BSD sed, not GNU). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Labs#521) Start-Process with -ArgumentList "-c", $bashArgs passes them as separate arguments. bash -c needs the command as a single string. Changed to single-quoted args inside the -c string so bash receives all 6 arguments correctly. Tested: bootstrap-upgrade.sh starts and receives install_dir, gguf_file, gguf_url, sha256, llm_model, max_context — begins downloading. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ng (Light-Heart-Labs#522) PowerShell Start-Process cannot reliably pass empty arguments (like SHA256 for NV_ULTRA/SH_LARGE tiers) through the Windows command line to bash. Empty strings get collapsed during command-line parsing, shifting all subsequent arguments. Fix: write a temp wrapper script (logs/bootstrap-run.sh) with the arguments embedded as bash double-quoted strings. Empty arguments become "" which bash preserves correctly. No command-line quoting involved — Start-Process just runs the wrapper script. Tested with empty SHA256: all 6 arguments arrive in the correct positions. Script starts downloading successfully. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update all version references across Linux, Windows, and macOS installers. Also syncs macOS DS_VERSION from 2.0.0-strix-halo. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y for Arch/Void/Alpine (Light-Heart-Labs#551) The installer unconditionally used get.docker.com for Docker installation, which only supports Debian/Ubuntu/Fedora/RHEL/SLES. This broke installation on Arch Linux and other distros using pacman, xbps, or apk. Similarly, the service registry's Python manifest parser requires PyYAML, which is pre-installed on Debian/Fedora but not on Arch/Void/Alpine. Docker install (fixes Light-Heart-Labs#546): - Add case dispatch on $PKG_MANAGER in 05-docker.sh - apt/dnf/zypper: unchanged get.docker.com path (zero regression) - pacman: pkg_install docker + systemctl enable - xbps: pkg_install docker + runit service link - apk: pkg_install docker + OpenRC enable/start - Unknown: get.docker.com fallback with improved error mentioning --skip-docker PyYAML dependency (fixes Light-Heart-Labs#545): - Add python3-pyyaml canonical name to pkg_resolve() for all 6 package managers - sr_load() now checks for `import yaml` before running the manifest parser - Auto-installs the distro-appropriate PyYAML package when packaging functions are available (installer context) with declare -f guards for safety in dream-cli context - Fix silent heredoc failure: wrap Python heredoc in `if !` to capture exit code instead of silently continuing with empty SERVICE_* arrays - Use _SR_FAILED flag for failure signaling (return 0 always) so dream-cli doesn't crash under set -e while installer can detect and retry Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ht-Heart-Labs#552) * fix(installer): Windows port 8080 conflict + ComfyUI tier-aware gating Two issues reported by beta tester on Windows 11 + WSL2 (7GB RAM, NVIDIA 940MX 2GB VRAM, Tier 0): 1. Port 8080 conflict: Windows env generator hardcoded OLLAMA_PORT=8080, but wslrelay occupies port 8080 on every WSL2 system. Changed default to 11434, matching the Linux default in .env.example. The Docker internal port stays 8080 — only the host-facing port changes. 2. ComfyUI crashes low-RAM systems: Full Stack silently enabled ComfyUI, which requests shm_size 8GB + memory limit 24GB. On a 7GB system, Docker can't allocate shared memory, causing a network bridge failure that kills the entire compose-up. Added tier-aware auto-disable for Tier 0 and Tier 1 in Full Stack mode, with a warning and re-prompt in Custom mode. Changes: - env-generator.ps1: OLLAMA_PORT 8080 → 11434 - ui.sh: Full Stack auto-disables ComfyUI on Tier 0/1 with user message - install-core.sh: add --no-comfyui / --comfyui flags + usage docs - 03-features.sh: Custom mode warns Tier 0/1 users and flips default to N Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(windows): add ComfyUI tier-aware gating to Windows installer PR Light-Heart-Labs#552 added ComfyUI tier gating for the Linux bash installer but the Windows PowerShell installer has its own parallel code paths that were untouched. ComfyUI was always included in the compose stack on Windows because the service skip switch had no "comfyui" case. This was the actual root cause of the beta tester's fatal crash on Windows 11 + WSL2 (7GB RAM, Tier 0) — ComfyUI's shm_size: 8g exceeded available memory, crashing Docker's network bridge creation. Changes: - install.ps1: add -Comfyui and -NoComfyui switch parameters - install-windows.ps1: add params, context vars, and "comfyui" case to the service skip switch (the critical fix) - 03-features.ps1: add $enableComfyui variable, Full Stack auto-disables on Tier 0/1 with user message, Custom mode adds ComfyUI prompt with tier warning, Core Only disables ComfyUI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: strip spurious UTF-8 BOMs from PowerShell files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(installer): add ComfyUI tier safety net for non-interactive mode The tier-aware ComfyUI gating only ran inside the interactive menu block. Non-interactive installs (--non-interactive / -NonInteractive) skipped the menu entirely, leaving ENABLE_COMFYUI=true on Tier 0/1 systems where ComfyUI's shm_size 8GB exceeds available RAM. Add a safety net after the interactive block on both Linux and Windows that unconditionally disables ComfyUI on Tier 0/1. In interactive mode this is a no-op (menu already handled it). In non-interactive mode this prevents the crash. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(installer): scope ComfyUI safety net to non-interactive mode only The safety net unconditionally overrode ENABLE_COMFYUI on Tier 0/1, which would silently undo an explicit user confirmation in Custom mode (user says Y to ComfyUI, Y to the tier warning, then safety net disables it anyway). Guard with ! $INTERACTIVE (Linux) / $nonInteractive (Windows) so it only fires in headless mode where the user was never prompted. Interactive mode already has its own tier checks in the menu. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(windows): remove incorrect nonInteractive guard from Custom mode warning The previous commit accidentally applied the $nonInteractive guard to the Custom mode tier warning prompt inside the interactive menu block. Since the menu block itself is gated by -not $nonInteractive, the condition was always false and the warning never fired. The guard should only be on the safety net after the menu block. The Custom mode warning is interactive by definition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…art-Labs#553) On NVIDIA systems, embeddings (TEI) and Open WebUI report unhealthy during first install because model download and startup exceed the original grace periods. Bump start_period and reduce excessive installer polling to eliminate false-positive health failures. - embeddings: start_period 60s→120s, retries 3→5 (first-run model download) - Open WebUI: start_period 30s→60s (depends on llama-server VRAM loading) - Phase 12: Open WebUI max_attempts 60→45 (90s budget vs ~70s needed) Tested on NVIDIA (192.168.0.143) and AMD (192.168.0.213). Config-only changes — fast systems unaffected (healthy on first /health 200). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Includes fixes since v2.3.2: - Light-Heart-Labs#551: Arch/Void/Alpine distro-native Docker install + PyYAML dependency - Light-Heart-Labs#552: Windows port 8080 conflict + ComfyUI tier gating - Light-Heart-Labs#553: Healthcheck timing tuning for first-run startups Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Mar 22, 2026
…rt check (Light-Heart-Labs#562) On systems where Docker cannot allocate NVIDIA GPU devices (exit 125 from nvidia-smi smoke test), the installer now falls back to the existing docker-compose.cpu.yml overlay instead of crashing at docker compose up with exit code 1. Changes: - Phase 05: track GPU passthrough result in $script:gpuPassthroughFailed - Phase 08: use docker-compose.cpu.yml when flag is set, skip extension NVIDIA overlays (whisper, comfyui) that also require GPU reservation - Phase 04: fix port conflict check from 8080 → 11434 to match OLLAMA_PORT default (eliminates false wslrelay warning on WSL2) Reported by tester on GTX 1050 Ti / Windows 11 / WSL2 where GPU passthrough consistently fails. Systems with working GPU passthrough are completely unaffected. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightheartdevs
previously approved these changes
Mar 22, 2026
Collaborator
Lightheartdevs
left a comment
There was a problem hiding this comment.
Reviewed: changes scoped to resources/dev/extensions-library only, no installer interaction. LGTM.
Collaborator
|
Approved but has merge conflicts from other PRs that just landed. Please rebase against main and we'll merge. Thanks for the solid work on these! 🙏 |
…ight-Heart-Labs#571) Docker Compose requires an image on every service definition even when replicas is 0. The Windows AMD overlay disables llama-server (runs natively via Vulkan) but the base compose stub has no image, causing compose validation to fail with "has neither an image nor a build context specified." Add hello-world:latest as a placeholder — the container never runs. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bs#572) * feat: upgrade tier models to Qwen 3.5 and GPT-OSS-20B Upgrade default models across all tiers and platforms: - Tier 1 (Entry) + Tier 2 (Prosumer): qwen3-8b → qwen3.5-9b (5.68GB) - Tier 3 (Pro): qwen3-14b → gpt-oss-20b (11.6GB) - Intel ARC: qwen3-8b → qwen3.5-9b - Intel ARC_LITE: qwen3-4b → qwen3.5-4b - macOS Tier 1: qwen3-4b → qwen3.5-9b (was overly conservative) Updated across all three platforms (Linux, Windows, macOS): - Tier map configs (resolve + tier_to_model functions) - Compose file GGUF defaults (base, cpu, arc, intel) - CLI fallback defaults (dream.ps1, dream-macos.sh) - Agent templates (5 templates + README) - Repair scripts and installer summary phase - Disk size estimation patterns - All test assertions All download URLs verified (HTTP 200). SHA256 hashes sourced from Hugging Face. No stale model references remain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update model references in README, SECURITY, and docs to match tier upgrade Follow-up to tier model upgrade — update documentation references: - README.md: Apple Silicon tier table (qwen3-4b/8b → qwen3.5-9b, gpt-oss-20b) - SECURITY.md: model verification examples (Qwen3-8B → Qwen3.5-9B) - KNOWN-GOOD-VERSIONS.md: macOS known-good model (Qwen3-4B → Qwen3.5-9B) - llama-server README: GGUF_FILE default (Qwen3-8B → Qwen3.5-9B) - .claude/commands/tdd.md: example tier config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: swap Tier 3 model from GPT-OSS-20B to Qwen3.5-27B GPT-OSS-20B uses special tokens (<|start|>, <|channel|>, <|constrain|>) for structured output that are incompatible with llama.cpp's JSON grammar mode. This causes Perplexica (which uses generateObject) to fail with HTTP 500 on every query. Pure chat inference worked fine but structured output / tool calling was broken. Qwen3.5-27B (16.7GB Q4_K_M) is the same model family as Tier 1-2 (Qwen 3.5), proven compatible with llama.cpp structured output, and fits in 20-39GB VRAM tier. Updated across all platforms, tests, agent templates, docs, and disk estimation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: correct GGUF_FILE case in Tier 3 test assertion Qwen3.5-27B-Q4_K_M.gguf not qwen3.5-27b-Q4_K_M.gguf — sed lowercased it during the bulk replace. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Heart-Labs#574) Follow-up to Light-Heart-Labs#573 — docs still referenced old Qwen3 8B/4B/14B models. Updated to match current tier map: - T1/T2/ARC: Qwen3.5 9B - T3: Qwen3.5 27B - ARC_LITE: Qwen3.5 4B Files: root README, FAQ, INTEL-ARC-GUIDE, MACOS-QUICKSTART, SUPPORT-MATRIX Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…depends_on Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2512ed6 to
5b62a43
Compare
The base branch was changed.
Contributor
Author
|
Closing: the depends_on healthy condition already exists on the resources/dev branch (sidecars renamed to paperless-postgres/paperless-redis with condition: service_healthy). This PR is redundant. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Change
depends_onfrom array form to map form withcondition: service_healthyfor both postgres and redis.Why
Array-form
depends_ononly waits for containers to start, not for them to be healthy. Paperless-ngx can crash-loop on slow hardware when postgres hasn't finished initializing.How
Both postgres and redis already have proper healthchecks defined in the same compose file.
Scope
All changes within
resources/dev/extensions-library/services/paperless-ngx/compose.yaml.Merging Order
Merge after PR #526 (paperless secret key) — same file, different lines, no textual conflict.
Testing
Review
Critique Guardian verdict: APPROVED — correct pattern, healthchecks verified, low regression risk.
Originally reported in yasinBursali#88