2 changes: 1 addition & 1 deletion README.md
@@ -185,7 +185,7 @@ The installer detects your GPU and picks the optimal model automatically. No man
| Unified RAM | Model | Example Hardware |
|-------------|-------|-----------------|
| < 16 GB | Qwen3.5 2B (Q4_K_M) | M1/M2 base (8GB) |
| 16–24 GB | Qwen3.5 4B (Q4_K_M) | M4 Mac Mini (16GB) |
| 16–24 GB | Qwen3.5 9B (Q4_K_M) | M4 Mac Mini (16GB) |
| 32 GB | Qwen3.5 9B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
| 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |
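
The mapping in the table can be sketched as a small shell function. This is a simplified illustration of the selection logic, not the installer's actual code:

```bash
# Simplified sketch of the RAM -> model selection implied by the table.
# Thresholds and model names follow the table; the real installer may differ.
pick_model() {
  local ram_gb=$1
  if   [ "$ram_gb" -lt 16 ]; then echo "qwen3.5-2b"      # M1/M2 base
  elif [ "$ram_gb" -lt 48 ]; then echo "qwen3.5-9b"      # 16-32 GB rows
  else                             echo "qwen3-30b-a3b"  # 48 GB+ (MoE)
  fi
}

pick_model 16   # prints: qwen3.5-9b
```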
3 changes: 3 additions & 0 deletions dream-server/.env.example
@@ -261,6 +261,9 @@ LANGFUSE_INIT_USER_PASSWORD= # auto-generated during install
# llama-server memory limit (Docker)
# LLAMA_SERVER_MEMORY_LIMIT=64G

# llama-server CPU core limit (macOS/CPU-only mode — static default 8.0)
# Tune this to control how many CPU cores llama-server may use.
# LLAMA_CPU_LIMIT=8.0
#=== DreamForge (Local Agentic Coding) ===
# DREAMFORGE_IMAGE=ghcr.io/light-heart-labs/dreamforge:v0.1.0
# DREAMFORGE_PORT=3010
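In Docker terms, the new `LLAMA_CPU_LIMIT` setting would typically feed the `cpus` resource limit alongside the existing memory limit. A compose fragment along these lines (the service name and fallback defaults are assumptions, not the shipped file) shows the idea:

```yaml
# Hypothetical compose fragment; service name and defaults are assumptions.
services:
  llama-server:
    deploy:
      resources:
        limits:
          cpus: "${LLAMA_CPU_LIMIT:-8.0}"                # core limit from .env
          memory: "${LLAMA_SERVER_MEMORY_LIMIT:-64G}"    # existing memory limit
```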
20 changes: 10 additions & 10 deletions dream-server/QUICKSTART.md
@@ -33,10 +33,10 @@ The installer will:
- SH_LARGE (90GB+): qwen3-coder-next (80B MoE), 128K context
- SH_COMPACT (64-89GB): qwen3-30b-a3b (30B MoE), 128K context
- **NVIDIA (discrete GPU)**:
- Tier 1 (Entry): <12GB VRAM → qwen2.5-7b-instruct (GGUF Q4_K_M), 16K context
- Tier 2 (Prosumer): 12-20GB VRAM → qwen2.5-14b-instruct (GGUF Q4_K_M), 16K context
- Tier 3 (Pro): 20-40GB VRAM → qwen2.5-32b-instruct (GGUF Q4_K_M), 32K context
- Tier 4 (Enterprise): 40GB+ VRAM → qwen2.5-72b-instruct (GGUF Q4_K_M), 32K context
- Tier 1 (Entry): <12GB VRAM → qwen3.5-9b (GGUF Q4_K_M), 16K context
- Tier 2 (Prosumer): 12-20GB VRAM → qwen3.5-9b (GGUF Q4_K_M), 32K context
- Tier 3 (Pro): 20-40GB VRAM → qwen3-30b-a3b (GGUF Q4_K_M), 32K context
- Tier 4 (Enterprise): 40GB+ VRAM → qwen3-30b-a3b (GGUF Q4_K_M), 128K context
2. Check Docker and GPU toolkit (NVIDIA Container Toolkit or ROCm devices)
3. Ask which optional components to enable (voice, workflows, RAG)
4. Generate secure passwords and configuration
@@ -100,7 +100,7 @@ Visit: **http://localhost:3000**
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-32b-instruct",
"model": "qwen3-30b-a3b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
@@ -132,10 +132,10 @@ The installer auto-detects your GPU and selects the optimal configuration:

| Tier | VRAM | Model | Example GPUs |
|------|------|-------|--------------|
| 1 (Entry) | <12GB | Qwen2.5-7B | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | Qwen2.5-14B (GGUF Q4_K_M) | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | Qwen2.5-32B (GGUF Q4_K_M) | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | Qwen2.5-72B (GGUF Q4_K_M) | A100, H100 |
| 1 (Entry) | <12GB | qwen3.5-9b (GGUF Q4_K_M) | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | qwen3.5-9b (GGUF Q4_K_M) | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | qwen3-30b-a3b (GGUF Q4_K_M) | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | qwen3-30b-a3b (GGUF Q4_K_M) | A100, H100 |
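
The tier boundaries can be sketched as a shell helper (a hypothetical illustration, not `install.sh` itself). The `nvidia-smi` query in the comment is the standard way to read total VRAM:

```bash
# Sketch of the VRAM -> tier mapping per the table; not the shipped installer code.
# Detected VRAM in MiB could come from:
#   vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
pick_tier() {
  local vram_gb=$1
  if   [ "$vram_gb" -lt 12 ]; then echo "1 qwen3.5-9b 16384"
  elif [ "$vram_gb" -lt 20 ]; then echo "2 qwen3.5-9b 32768"
  elif [ "$vram_gb" -lt 40 ]; then echo "3 qwen3-30b-a3b 32768"
  else                             echo "4 qwen3-30b-a3b 131072"
  fi
}

pick_tier 24   # prints: 3 qwen3-30b-a3b 32768
```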

To check what tier you'd get without installing:

@@ -156,7 +156,7 @@ CTX_SIZE=4096 # or even 2048

Or switch to a smaller model:
```
LLM_MODEL=qwen2.5-7b-instruct
LLM_MODEL=qwen3.5-9b
```
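
A non-interactive way to make that edit, demonstrated here on a scratch copy of `.env` (point `sed` at your real file to apply it for real; GNU sed syntax shown, on macOS use `sed -i ''`):

```bash
# Demonstration on a scratch copy; run the sed line against your actual
# .env in the dream-server directory, then `dream restart`.
cd "$(mktemp -d)"
printf 'LLM_MODEL=qwen3-30b-a3b\nCTX_SIZE=4096\n' > .env

sed -i 's/^LLM_MODEL=.*/LLM_MODEL=qwen3.5-9b/' .env
grep '^LLM_MODEL=' .env   # prints: LLM_MODEL=qwen3.5-9b
```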

### AMD: llama-server crash loop
14 changes: 7 additions & 7 deletions dream-server/README.md
@@ -130,10 +130,10 @@ Both tiers use `qwen2.5:7b` as a bootstrap model for instant startup. The full m
| Tier | VRAM | Model | Quant | Context | Example GPUs |
|------|------|-------|-------|---------|--------------|
| NV_ULTRA | 90GB+ | qwen3-coder-next | GGUF Q4_K_M | 128K | Multi-GPU A100/H100 |
| 1 (Entry) | <12GB | qwen2.5-7b-instruct | GGUF Q4_K_M | 16K | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | qwen2.5-14b-instruct | GGUF Q4_K_M | 16K | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | qwen2.5-32b-instruct | GGUF Q4_K_M | 32K | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | qwen2.5-72b-instruct | GGUF Q4_K_M | 32K | A100, H100, multi-GPU |
| 1 (Entry) | <12GB | qwen3.5-9b | GGUF Q4_K_M | 16K | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | qwen3.5-9b | GGUF Q4_K_M | 32K | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | qwen3-30b-a3b | GGUF Q4_K_M | 32K | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | qwen3-30b-a3b | GGUF Q4_K_M | 128K | A100, H100, multi-GPU |

### Apple Silicon (Unified Memory, Metal)

@@ -142,7 +142,7 @@ Both tiers use `qwen2.5:7b` as a bootstrap model for instant startup. The full m
| 1 (Entry) | 8–24GB | qwen3.5-9b | GGUF Q4_K_M | 16K | M1/M2 base, M4 Mac Mini (16GB) |
| 2 (Prosumer) | 32GB | qwen3.5-9b | GGUF Q4_K_M | 32K | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 3 (Pro) | 48GB | qwen3-30b-a3b | GGUF Q4_K_M | 32K | M4 Pro (48GB), M2 Max (48GB) |
| 4 (Enterprise) | 64GB+ | qwen3-30b-a3b (30B MoE) | GGUF Q4_K_M | 131K | M2 Ultra Mac Studio, M4 Max (64GB+) |
| 4 (Enterprise) | 64GB+ | qwen3-30b-a3b (30B MoE) | GGUF Q4_K_M | 128K | M2 Ultra Mac Studio, M4 Max (64GB+) |

Override with: `./install.sh --tier 3`

@@ -188,7 +188,7 @@ See [docs/HARDWARE-GUIDE.md](docs/HARDWARE-GUIDE.md) for buying recommendations.
┌─────────────────────▼───────────────────────────┐
│ llama-server (CUDA) │
│ (localhost:8080/v1/...) │
qwen2.5-32b-instruct
qwen3-30b-a3b
└─────────────────────────────────────────────────┘
│ │
┌────────▼────────┐ ┌───────▼────────┐
@@ -244,7 +244,7 @@ The installer generates `.env` automatically. Key settings:

```bash
# NVIDIA
LLM_MODEL=qwen2.5-32b-instruct # Model (auto-set by installer)
LLM_MODEL=qwen3-30b-a3b # Model (auto-set by installer)
CTX_SIZE=32768 # Context window

# AMD Strix Halo
24 changes: 24 additions & 0 deletions dream-server/SECURITY.md
@@ -77,6 +77,30 @@ sudo ufw allow from 192.168.0.0/24 to any port 3001 # Dashboard
sudo ufw allow from 192.168.0.0/24 to any port 8080 # LLM API
```

### Host Agent Network Binding

The host agent (`bin/dream-host-agent.py`) has its own bind address, separate from the Docker services above. It is controlled by `DREAM_AGENT_BIND` in `.env`:

| Platform | Default | Behavior |
|----------|---------|----------|
| macOS / Windows | `127.0.0.1` | Docker Desktop routes container traffic through loopback, so binding to loopback is sufficient |
| Linux | auto-detected | Detects the Docker bridge gateway IP (e.g. `172.17.0.1`) so containers can reach the agent; LAN devices cannot. Falls back to `127.0.0.1` if detection fails. |

To override the default, set `DREAM_AGENT_BIND` in `.env`:

```bash
# Restrict to loopback only (e.g. no-Docker Linux or extra hardening)
DREAM_AGENT_BIND=127.0.0.1

# Bind to Docker bridge only (explicit Linux default)
DREAM_AGENT_BIND=172.17.0.1

# Bind to all interfaces — exposes the host agent API on LAN (not recommended)
DREAM_AGENT_BIND=0.0.0.0
```

> **Note:** If you bind to `0.0.0.0`, ensure `DREAM_AGENT_KEY` is set in `.env` — it protects the extension management endpoints with Bearer token authentication.
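
The Linux auto-detection described above can be sketched as follows. This is illustrative only; the agent's actual detection code may differ:

```bash
# Sketch of the Linux default: prefer the Docker bridge gateway,
# fall back to loopback if detection fails. Illustrative, not the
# agent's actual implementation.
default_bind_address() {
  local gw
  gw=$(docker network inspect bridge \
        --format '{{(index .IPAM.Config 0).Gateway}}' 2>/dev/null)
  if [ -n "$gw" ]; then
    echo "$gw"          # e.g. 172.17.0.1
  else
    echo "127.0.0.1"    # no Docker bridge found: stay on loopback
  fi
}

default_bind_address
```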

### Exposing to Internet (Not Recommended)

If you must expose publicly, use a reverse proxy with TLS:
96 changes: 94 additions & 2 deletions dream-server/docs/FAQ.md
@@ -195,10 +195,102 @@ Options:
### How do I get updates?

```bash
./dream-cli update
dream update
```

That's it. Updates are optional — you control when to apply them.
Updates are optional — you control when to apply them.

**Preview changes without applying:**
```bash
dream update --dry-run
```

**Skip version-compatibility confirmation:**
```bash
dream update --force
```

`dream update` automatically creates a pre-update snapshot before pulling new images, then verifies that all services are healthy. If something goes wrong, run:

```bash
dream rollback
```

This restores configuration from the pre-update snapshot and restarts services.

---

### How do I back up and restore my data?

**Create a backup** (saves user data and config to `.backups/`):
```bash
dream backup
```

**Create a compressed backup:**
```bash
dream backup -c
```

**List existing backups:**
```bash
dream backup -l
```

**Verify a backup's integrity:**
```bash
dream backup verify <backup_id>
```

**Restore from a backup** (interactive — lets you choose from available backups):
```bash
dream restore
```

**Restore a specific backup by ID:**
```bash
dream restore <backup_id>
```

**Rollback after a failed update** (restores the pre-update snapshot):
```bash
dream rollback
```

`dream update` always creates a pre-update snapshot, so `dream rollback` is available immediately after any update attempt.
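
Conceptually, backup verification compares recorded checksums against the files on disk. The sketch below uses plain `sha256sum` manifests; that format is hypothetical, not Dream Server's actual backup layout:

```bash
# Conceptual sketch of integrity verification: record checksums when
# the backup is made, compare them later. Manifest format is hypothetical.
cd "$(mktemp -d)"
mkdir -p backup && echo "user data" > backup/data.txt

sha256sum backup/data.txt > backup.sha256   # record at backup time
sha256sum -c backup.sha256                  # prints: backup/data.txt: OK
```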

---

### What are service templates?

Templates are curated presets that enable a group of extensions suited to a specific use case — for example, a creative-studio setup (image generation + voice) or a research workflow (RAG + web search + agents).

**List available templates:**
```bash
dream template list
```

**Preview what a template will change before applying:**
```bash
dream template preview <template-id>
```

**Apply a template (enables the template's services):**
```bash
dream template apply <template-id>
```

Applying a template only enables services — it doesn't disable anything you've already set up.

---

### Can I chat while models are downloading?

Yes. During install, a small bootstrap model (~1.5GB, Qwen 3.5 2B) downloads first so you can start chatting within a couple of minutes. The full tier-appropriate model downloads in the background.

When the full model finishes, the system swaps it in automatically — you don't need to do anything. `dream status` shows the current bootstrap state if a swap is still in progress.

---

### Where do I get help?

1 change: 1 addition & 0 deletions dream-server/docs/HOST-AGENT-API.md
@@ -12,6 +12,7 @@ The Dashboard API runs inside a Docker container and cannot directly run `docker
|----------|-----------|
| Linux | systemd user service (`scripts/systemd/dream-host-agent.service`) |
| macOS | Started by the installer (`installers/macos/install-macos.sh`) |
| Windows | Started by the installer (`installers/windows/phases/07-devtools.ps1`, managed via `dream.ps1`) |

The agent is started during installation (phase 07 on Linux). By default it binds to `127.0.0.1` on macOS/Windows and to the Docker bridge gateway on Linux (see `SECURITY.md`), so it is not reachable from the LAN.

35 changes: 25 additions & 10 deletions dream-server/docs/MODE-SWITCH.md
@@ -27,7 +27,7 @@ dream restart

## How It Works

One env var (`LLM_API_URL`) controls where all services send LLM requests. Three modes set this automatically:
One env var (`LLM_API_URL`) controls where all services send LLM requests. Three modes are user-selectable via `dream mode`; a fourth (`lemonade`) is auto-configured by the installer on AMD hardware — see [Lemonade Mode](#lemonade-mode-amd--auto-configured) below.

| Mode | `LLM_API_URL` | `DREAM_MODE` | LiteLLM config |
|------|---------------|--------------|-----------------|
@@ -88,13 +88,28 @@ Local llama-server as primary, cloud APIs as fallback via LiteLLM.
dream mode hybrid
```

### Lemonade Mode (AMD — auto-configured)

**Not user-switchable.** The installer sets this mode automatically on AMD hardware; `dream mode` does not accept `lemonade` as an argument.

All LLM traffic routes through the LiteLLM proxy, which delegates to the Lemonade SDK (`lemonade-server`). The dashboard API uses a distinct `/api/v1` URL prefix in this mode (instead of `/v1`).

| Aspect | Details |
|--------|---------|
| **LLM** | Lemonade SDK via LiteLLM proxy |
| **Cost** | $0 (local inference) |
| **Requires** | AMD GPU (auto-detected at install time) |
| **Set by** | Installer (Phase 06), not `dream mode` |

For AMD Strix Halo performance tuning (GRUB, kernel module, sysctl settings), see [`config/system-tuning/README.md`](../config/system-tuning/README.md).
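
A LiteLLM routing entry for this mode might look like the fragment below. The model name, container host, and port are assumptions rather than the shipped configuration; only the `/api/v1` prefix comes from the description above:

```yaml
# Hypothetical LiteLLM routing entry for Lemonade mode.
# Model name, host, and port are assumptions, not the shipped config.
model_list:
  - model_name: local-llm
    litellm_params:
      model: openai/lemonade                          # OpenAI-compatible passthrough
      api_base: http://lemonade-server:8000/api/v1    # note the /api/v1 prefix
```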

---

## .env Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DREAM_MODE` | `local` | Active mode: `local`, `cloud`, or `hybrid` |
| `DREAM_MODE` | `local` | Active mode: `local`, `cloud`, or `hybrid`; `lemonade` is auto-set on AMD (not user-switchable) |
| `LLM_API_URL` | `http://llama-server:8080` | Where services send LLM requests |
| `ANTHROPIC_API_KEY` | *(empty)* | Anthropic API key (cloud/hybrid) |
| `OPENAI_API_KEY` | *(empty)* | OpenAI API key (cloud/hybrid) |
@@ -177,14 +192,14 @@ User -> Open WebUI -> LiteLLM -> llama-server (local) -> Response

## Mode Comparison

| Feature | Local | Cloud | Hybrid |
|---------|-------|-------|--------|
| Internet required | No | Yes | Yes (for fallback) |
| API keys required | No | Yes | Recommended |
| GPU required | Yes | No | Yes |
| Response quality | Good | Best | Best of both |
| Cost | $0 | $$$ | $0 or $$$ |
| Privacy | 100% local | Data to cloud | Local unless fallback |
| Feature | Local | Cloud | Hybrid | Lemonade (AMD) |
|---------|-------|-------|--------|----------------|
| Internet required | No | Yes | Yes (for fallback) | No |
| API keys required | No | Yes | Recommended | No |
| GPU required | Yes | No | Yes | Yes (AMD) |
| Response quality | Good | Best | Best of both | Good |
| Cost | $0 | $$$ | $0 or $$$ | $0 |
| Privacy | 100% local | Data to cloud | Local unless fallback | 100% local |

---

77 changes: 56 additions & 21 deletions dream-server/docs/POST-INSTALL-CHECKLIST.md
@@ -1,21 +1,56 @@
# Dream Server Post-Install Checklist

## llama-server
- [ ] Verify llama-server is running
- [ ] Check llama-server logs for any errors
- [ ] Test basic functionality of llama-server

## Whisper
- [ ] Verify Whisper is installed
- [ ] Check Whisper logs for any errors
- [ ] Test Whisper transcription with sample audio

## TTS
- [ ] Verify TTS is installed
- [ ] Check TTS logs for any errors
- [ ] Test TTS with sample text

## OpenClaw
- [ ] Verify OpenClaw is running
- [ ] Check OpenClaw logs for any errors
- [ ] Test basic functionality of OpenClaw
# Dream Server — Post-Install Checklist

Run these checks after installation to confirm everything is working.

---

## 1. Overall health

```bash
dream status
```

Shows container status, service health checks, and GPU metrics in one view. All enabled services should report **healthy**. If any show as not responding, check the logs (step 6 below).

## 2. LLM response test

```bash
dream chat "Hello, are you working?"
```

You should receive a text response within a few seconds. If you see an error, check `dream logs llm`.

## 3. Web interface

Open your browser and navigate to the address shown at the end of installation (default: `http://localhost:3000`). The Open WebUI chat interface should load and let you send a message.

## 4. GPU verification

**NVIDIA** — GPU utilization, VRAM, and temperature appear automatically in `dream status`.

**AMD:**
```bash
rocm-smi
```

**Apple Silicon** — GPU is used automatically; no separate check needed.

## 5. Check enabled services

```bash
dream list
```

Core services (llama-server, open-webui, dashboard) should be shown as running. Optional services selected during install should also appear.

## 6. Diagnose a failing service

```bash
dream logs <service> # e.g. dream logs llm
```

Replace `<service>` with the name from `dream list`. Common aliases: `llm` for llama-server, `stt` for Whisper, `tts` for Kokoro.

---

If a service still fails its health check after you've reviewed its logs, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
1 change: 1 addition & 0 deletions dream-server/docs/SUPPORT-MATRIX.md
@@ -78,3 +78,4 @@ Last updated: 2026-03-17
## See also

- [LINUX-PORTABILITY.md](LINUX-PORTABILITY.md) — Linux installer edge cases, `.env` validation, extension manifests.
- [config/system-tuning/README.md](../config/system-tuning/README.md) — Performance tuning for AMD Strix Halo (GRUB, modprobe, sysctl, CPU governor settings).