2 changes: 1 addition & 1 deletion README.md
@@ -185,7 +185,7 @@ The installer detects your GPU and picks the optimal model automatically. No man
| Unified RAM | Model | Example Hardware |
|-------------|-------|-----------------|
| < 16 GB | Qwen3.5 2B (Q4_K_M) | M1/M2 base (8GB) |
| 16–24 GB | Qwen3.5 4B (Q4_K_M) | M4 Mac Mini (16GB) |
| 16–24 GB | Qwen3.5 9B (Q4_K_M) | M4 Mac Mini (16GB) |
| 32 GB | Qwen3.5 9B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
| 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |
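
The mapping in the table can be sketched as a small shell function. This is a simplified illustration of the selection logic, not the installer's actual code:

```bash
# Simplified sketch of the RAM -> model selection implied by the table.
# Thresholds and model names follow the table; the real installer may differ.
pick_model() {
  local ram_gb=$1
  if   [ "$ram_gb" -lt 16 ]; then echo "qwen3.5-2b"      # M1/M2 base
  elif [ "$ram_gb" -lt 48 ]; then echo "qwen3.5-9b"      # 16-32 GB rows
  else                             echo "qwen3-30b-a3b"  # 48 GB+ (MoE)
  fi
}

pick_model 16   # prints: qwen3.5-9b
```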
3 changes: 3 additions & 0 deletions dream-server/.env.example
@@ -261,6 +261,9 @@ LANGFUSE_INIT_USER_PASSWORD= # auto-generated during install
# llama-server memory limit (Docker)
# LLAMA_SERVER_MEMORY_LIMIT=64G

# llama-server CPU core limit (macOS/CPU-only mode — static default 8.0)
# Tune this to control how many CPU cores llama-server may use.
# LLAMA_CPU_LIMIT=8.0
#=== DreamForge (Local Agentic Coding) ===
# DREAMFORGE_IMAGE=ghcr.io/light-heart-labs/dreamforge:v0.1.0
# DREAMFORGE_PORT=3010
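In Docker terms, the new `LLAMA_CPU_LIMIT` setting would typically feed the `cpus` resource limit alongside the existing memory limit. A compose fragment along these lines (the service name and fallback defaults are assumptions, not the shipped file) shows the idea:

```yaml
# Hypothetical compose fragment; service name and defaults are assumptions.
services:
  llama-server:
    deploy:
      resources:
        limits:
          cpus: "${LLAMA_CPU_LIMIT:-8.0}"                # core limit from .env
          memory: "${LLAMA_SERVER_MEMORY_LIMIT:-64G}"    # existing memory limit
```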
20 changes: 10 additions & 10 deletions dream-server/QUICKSTART.md
@@ -33,10 +33,10 @@ The installer will:
- SH_LARGE (90GB+): qwen3-coder-next (80B MoE), 128K context
- SH_COMPACT (64-89GB): qwen3-30b-a3b (30B MoE), 128K context
- **NVIDIA (discrete GPU)**:
- Tier 1 (Entry): <12GB VRAM → qwen2.5-7b-instruct (GGUF Q4_K_M), 16K context
- Tier 2 (Prosumer): 12-20GB VRAM → qwen2.5-14b-instruct (GGUF Q4_K_M), 16K context
- Tier 3 (Pro): 20-40GB VRAM → qwen2.5-32b-instruct (GGUF Q4_K_M), 32K context
- Tier 4 (Enterprise): 40GB+ VRAM → qwen2.5-72b-instruct (GGUF Q4_K_M), 32K context
- Tier 1 (Entry): <12GB VRAM → qwen3.5-9b (GGUF Q4_K_M), 16K context
- Tier 2 (Prosumer): 12-20GB VRAM → qwen3.5-9b (GGUF Q4_K_M), 32K context
- Tier 3 (Pro): 20-40GB VRAM → qwen3-30b-a3b (GGUF Q4_K_M), 32K context
- Tier 4 (Enterprise): 40GB+ VRAM → qwen3-30b-a3b (GGUF Q4_K_M), 128K context
2. Check Docker and GPU toolkit (NVIDIA Container Toolkit or ROCm devices)
3. Ask which optional components to enable (voice, workflows, RAG)
4. Generate secure passwords and configuration
@@ -100,7 +100,7 @@ Visit: **http://localhost:3000**
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-32b-instruct",
"model": "qwen3-30b-a3b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
@@ -132,10 +132,10 @@ The installer auto-detects your GPU and selects the optimal configuration:

| Tier | VRAM | Model | Example GPUs |
|------|------|-------|--------------|
| 1 (Entry) | <12GB | Qwen2.5-7B | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | Qwen2.5-14B (GGUF Q4_K_M) | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | Qwen2.5-32B (GGUF Q4_K_M) | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | Qwen2.5-72B (GGUF Q4_K_M) | A100, H100 |
| 1 (Entry) | <12GB | qwen3.5-9b (GGUF Q4_K_M) | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | qwen3.5-9b (GGUF Q4_K_M) | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | qwen3-30b-a3b (GGUF Q4_K_M) | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | qwen3-30b-a3b (GGUF Q4_K_M) | A100, H100 |
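
The tier boundaries can be sketched as a shell helper (a hypothetical illustration, not `install.sh` itself). The `nvidia-smi` query in the comment is the standard way to read total VRAM:

```bash
# Sketch of the VRAM -> tier mapping per the table; not the shipped installer code.
# Detected VRAM in MiB could come from:
#   vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
pick_tier() {
  local vram_gb=$1
  if   [ "$vram_gb" -lt 12 ]; then echo "1 qwen3.5-9b 16384"
  elif [ "$vram_gb" -lt 20 ]; then echo "2 qwen3.5-9b 32768"
  elif [ "$vram_gb" -lt 40 ]; then echo "3 qwen3-30b-a3b 32768"
  else                             echo "4 qwen3-30b-a3b 131072"
  fi
}

pick_tier 24   # prints: 3 qwen3-30b-a3b 32768
```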

To check what tier you'd get without installing:

@@ -156,7 +156,7 @@ CTX_SIZE=4096 # or even 2048

Or switch to a smaller model:
```
LLM_MODEL=qwen2.5-7b-instruct
LLM_MODEL=qwen3.5-9b
```
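
A non-interactive way to make that edit, demonstrated here on a scratch copy of `.env` (point `sed` at your real file to apply it for real; GNU sed syntax shown, on macOS use `sed -i ''`):

```bash
# Demonstration on a scratch copy; run the sed line against your actual
# .env in the dream-server directory, then `dream restart`.
cd "$(mktemp -d)"
printf 'LLM_MODEL=qwen3-30b-a3b\nCTX_SIZE=4096\n' > .env

sed -i 's/^LLM_MODEL=.*/LLM_MODEL=qwen3.5-9b/' .env
grep '^LLM_MODEL=' .env   # prints: LLM_MODEL=qwen3.5-9b
```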

### AMD: llama-server crash loop
14 changes: 7 additions & 7 deletions dream-server/README.md
@@ -130,10 +130,10 @@ Both tiers use `qwen2.5:7b` as a bootstrap model for instant startup. The full m
| Tier | VRAM | Model | Quant | Context | Example GPUs |
|------|------|-------|-------|---------|--------------|
| NV_ULTRA | 90GB+ | qwen3-coder-next | GGUF Q4_K_M | 128K | Multi-GPU A100/H100 |
| 1 (Entry) | <12GB | qwen2.5-7b-instruct | GGUF Q4_K_M | 16K | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | qwen2.5-14b-instruct | GGUF Q4_K_M | 16K | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | qwen2.5-32b-instruct | GGUF Q4_K_M | 32K | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | qwen2.5-72b-instruct | GGUF Q4_K_M | 32K | A100, H100, multi-GPU |
| 1 (Entry) | <12GB | qwen3.5-9b | GGUF Q4_K_M | 16K | RTX 3080, RTX 4070 |
| 2 (Prosumer) | 12-20GB | qwen3.5-9b | GGUF Q4_K_M | 32K | RTX 3090, RTX 4080 |
| 3 (Pro) | 20-40GB | qwen3-30b-a3b | GGUF Q4_K_M | 32K | RTX 4090, A6000 |
| 4 (Enterprise) | 40GB+ | qwen3-30b-a3b | GGUF Q4_K_M | 128K | A100, H100, multi-GPU |

### Apple Silicon (Unified Memory, Metal)

@@ -142,7 +142,7 @@ Both tiers use `qwen2.5:7b` as a bootstrap model for instant startup. The full m
| 1 (Entry) | 8–24GB | qwen3.5-9b | GGUF Q4_K_M | 16K | M1/M2 base, M4 Mac Mini (16GB) |
| 2 (Prosumer) | 32GB | qwen3.5-9b | GGUF Q4_K_M | 32K | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 3 (Pro) | 48GB | qwen3-30b-a3b | GGUF Q4_K_M | 32K | M4 Pro (48GB), M2 Max (48GB) |
| 4 (Enterprise) | 64GB+ | qwen3-30b-a3b (30B MoE) | GGUF Q4_K_M | 131K | M2 Ultra Mac Studio, M4 Max (64GB+) |
| 4 (Enterprise) | 64GB+ | qwen3-30b-a3b (30B MoE) | GGUF Q4_K_M | 128K | M2 Ultra Mac Studio, M4 Max (64GB+) |

Override with: `./install.sh --tier 3`

@@ -188,7 +188,7 @@ See [docs/HARDWARE-GUIDE.md](docs/HARDWARE-GUIDE.md) for buying recommendations.
┌─────────────────────▼───────────────────────────┐
│ llama-server (CUDA) │
│ (localhost:8080/v1/...) │
qwen2.5-32b-instruct
qwen3-30b-a3b
└─────────────────────────────────────────────────┘
│ │
┌────────▼────────┐ ┌───────▼────────┐
@@ -244,7 +244,7 @@ The installer generates `.env` automatically. Key settings:

```bash
# NVIDIA
LLM_MODEL=qwen2.5-32b-instruct # Model (auto-set by installer)
LLM_MODEL=qwen3-30b-a3b # Model (auto-set by installer)
CTX_SIZE=32768 # Context window

# AMD Strix Halo
24 changes: 24 additions & 0 deletions dream-server/SECURITY.md
@@ -77,6 +77,30 @@ sudo ufw allow from 192.168.0.0/24 to any port 3001 # Dashboard
sudo ufw allow from 192.168.0.0/24 to any port 8080 # LLM API
```

### Host Agent Network Binding

The host agent (`bin/dream-host-agent.py`) has its own bind address, separate from the Docker services above. It is controlled by `DREAM_AGENT_BIND` in `.env`:

| Platform | Default | Behavior |
|----------|---------|----------|
| macOS / Windows | `127.0.0.1` | Docker Desktop routes container traffic through loopback, so binding to loopback is sufficient |
| Linux | auto-detected | Detects the Docker bridge gateway IP (e.g. `172.17.0.1`) so containers can reach the agent; LAN devices cannot. Falls back to `127.0.0.1` if detection fails. |

To override the default, set `DREAM_AGENT_BIND` in `.env`:

```bash
# Restrict to loopback only (e.g. no-Docker Linux or extra hardening)
DREAM_AGENT_BIND=127.0.0.1

# Bind to Docker bridge only (explicit Linux default)
DREAM_AGENT_BIND=172.17.0.1

# Bind to all interfaces — exposes the host agent API on LAN (not recommended)
DREAM_AGENT_BIND=0.0.0.0
```

> **Note:** If you bind to `0.0.0.0`, ensure `DREAM_AGENT_KEY` is set in `.env` — it protects the extension management endpoints with Bearer token authentication.
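
The Linux auto-detection described above can be sketched as follows. This is illustrative only; the agent's actual detection code may differ:

```bash
# Sketch of the Linux default: prefer the Docker bridge gateway,
# fall back to loopback if detection fails. Illustrative, not the
# agent's actual implementation.
default_bind_address() {
  local gw
  gw=$(docker network inspect bridge \
        --format '{{(index .IPAM.Config 0).Gateway}}' 2>/dev/null)
  if [ -n "$gw" ]; then
    echo "$gw"          # e.g. 172.17.0.1
  else
    echo "127.0.0.1"    # no Docker bridge found: stay on loopback
  fi
}

default_bind_address
```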

### Exposing to Internet (Not Recommended)

If you must expose publicly, use a reverse proxy with TLS:
96 changes: 94 additions & 2 deletions dream-server/docs/FAQ.md
@@ -195,10 +195,102 @@ Options:
### How do I get updates?

```bash
./dream-cli update
dream update
```

That's it. Updates are optional — you control when to apply them.
Updates are optional — you control when to apply them.

**Preview changes without applying:**
```bash
dream update --dry-run
```

**Skip version-compatibility confirmation:**
```bash
dream update --force
```

`dream update` automatically creates a pre-update snapshot before pulling new images, then verifies that all services are healthy. If something goes wrong, run:

```bash
dream rollback
```

This restores configuration from the pre-update snapshot and restarts services.

---

### How do I back up and restore my data?

**Create a backup** (saves user data and config to `.backups/`):
```bash
dream backup
```

**Create a compressed backup:**
```bash
dream backup -c
```

**List existing backups:**
```bash
dream backup -l
```

**Verify a backup's integrity:**
```bash
dream backup verify <backup_id>
```

**Restore from a backup** (interactive — lets you choose from available backups):
```bash
dream restore
```

**Restore a specific backup by ID:**
```bash
dream restore <backup_id>
```

**Rollback after a failed update** (restores the pre-update snapshot):
```bash
dream rollback
```

`dream update` always creates a pre-update snapshot, so `dream rollback` is available immediately after any update attempt.
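
Conceptually, backup verification compares recorded checksums against the files on disk. The sketch below uses plain `sha256sum` manifests; that format is hypothetical, not Dream Server's actual backup layout:

```bash
# Conceptual sketch of integrity verification: record checksums when
# the backup is made, compare them later. Manifest format is hypothetical.
cd "$(mktemp -d)"
mkdir -p backup && echo "user data" > backup/data.txt

sha256sum backup/data.txt > backup.sha256   # record at backup time
sha256sum -c backup.sha256                  # prints: backup/data.txt: OK
```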

---

### What are service templates?

Templates are curated presets that enable a group of extensions suited to a specific use case — for example, a creative-studio setup (image generation + voice) or a research workflow (RAG + web search + agents).

**List available templates:**
```bash
dream template list
```

**Preview what a template will change before applying:**
```bash
dream template preview <template-id>
```

**Apply a template (enables the template's services):**
```bash
dream template apply <template-id>
```

Applying a template only enables services — it doesn't disable anything you've already set up.

---

### Can I chat while models are downloading?

Yes. During install, a small bootstrap model (~1.5GB, Qwen 3.5 2B) downloads first so you can start chatting within a couple of minutes. The full tier-appropriate model downloads in the background.

When the full model finishes, the system swaps it in automatically — you don't need to do anything. `dream status` shows the current bootstrap state if a swap is still in progress.

---

### Where do I get help?

1 change: 1 addition & 0 deletions dream-server/docs/HOST-AGENT-API.md
@@ -12,6 +12,7 @@ The Dashboard API runs inside a Docker container and cannot directly run `docker
|----------|-----------|
| Linux | systemd user service (`scripts/systemd/dream-host-agent.service`) |
| macOS | Started by the installer (`installers/macos/install-macos.sh`) |
| Windows | Started by the installer (`installers/windows/phases/07-devtools.ps1`, managed via `dream.ps1`) |

The agent is started during installation (phase 07 on Linux). By default it binds to `127.0.0.1` on macOS/Windows and to the Docker bridge gateway on Linux (see `SECURITY.md`), so it is not reachable from the LAN.

35 changes: 25 additions & 10 deletions dream-server/docs/MODE-SWITCH.md
@@ -27,7 +27,7 @@ dream restart

## How It Works

One env var (`LLM_API_URL`) controls where all services send LLM requests. Three modes set this automatically:
One env var (`LLM_API_URL`) controls where all services send LLM requests. Three modes are user-selectable via `dream mode`; a fourth (`lemonade`) is auto-configured by the installer on AMD hardware — see [Lemonade Mode](#lemonade-mode-amd--auto-configured) below.

| Mode | `LLM_API_URL` | `DREAM_MODE` | LiteLLM config |
|------|---------------|--------------|-----------------|
@@ -88,13 +88,28 @@ Local llama-server as primary, cloud APIs as fallback via LiteLLM.
dream mode hybrid
```

### Lemonade Mode (AMD — auto-configured)

**Not user-switchable.** The installer sets this mode automatically on AMD hardware; `dream mode` does not accept `lemonade` as an argument.

All LLM traffic routes through the LiteLLM proxy, which delegates to the Lemonade SDK (`lemonade-server`). The dashboard API uses a distinct `/api/v1` URL prefix in this mode (instead of `/v1`).

| Aspect | Details |
|--------|---------|
| **LLM** | Lemonade SDK via LiteLLM proxy |
| **Cost** | $0 (local inference) |
| **Requires** | AMD GPU (auto-detected at install time) |
| **Set by** | Installer (Phase 06), not `dream mode` |

For AMD Strix Halo performance tuning (GRUB, kernel module, sysctl settings), see [`config/system-tuning/README.md`](../config/system-tuning/README.md).
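
A LiteLLM routing entry for this mode might look like the fragment below. The model name, container host, and port are assumptions rather than the shipped configuration; only the `/api/v1` prefix comes from the description above:

```yaml
# Hypothetical LiteLLM routing entry for Lemonade mode.
# Model name, host, and port are assumptions, not the shipped config.
model_list:
  - model_name: local-llm
    litellm_params:
      model: openai/lemonade                          # OpenAI-compatible passthrough
      api_base: http://lemonade-server:8000/api/v1    # note the /api/v1 prefix
```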

---

## .env Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `DREAM_MODE` | `local` | Active mode: `local`, `cloud`, or `hybrid` |
| `DREAM_MODE` | `local` | Active mode: `local`, `cloud`, or `hybrid`; `lemonade` is auto-set on AMD (not user-switchable) |
| `LLM_API_URL` | `http://llama-server:8080` | Where services send LLM requests |
| `ANTHROPIC_API_KEY` | *(empty)* | Anthropic API key (cloud/hybrid) |
| `OPENAI_API_KEY` | *(empty)* | OpenAI API key (cloud/hybrid) |
@@ -177,14 +192,14 @@ User -> Open WebUI -> LiteLLM -> llama-server (local) -> Response

## Mode Comparison

| Feature | Local | Cloud | Hybrid |
|---------|-------|-------|--------|
| Internet required | No | Yes | Yes (for fallback) |
| API keys required | No | Yes | Recommended |
| GPU required | Yes | No | Yes |
| Response quality | Good | Best | Best of both |
| Cost | $0 | $$$ | $0 or $$$ |
| Privacy | 100% local | Data to cloud | Local unless fallback |
| Feature | Local | Cloud | Hybrid | Lemonade (AMD) |
|---------|-------|-------|--------|----------------|
| Internet required | No | Yes | Yes (for fallback) | No |
| API keys required | No | Yes | Recommended | No |
| GPU required | Yes | No | Yes | Yes (AMD) |
| Response quality | Good | Best | Best of both | Good |
| Cost | $0 | $$$ | $0 or $$$ | $0 |
| Privacy | 100% local | Data to cloud | Local unless fallback | 100% local |

---

77 changes: 56 additions & 21 deletions dream-server/docs/POST-INSTALL-CHECKLIST.md
@@ -1,21 +1,56 @@
# Dream Server Post-Install Checklist

## llama-server
- [ ] Verify llama-server is running
- [ ] Check llama-server logs for any errors
- [ ] Test basic functionality of llama-server

## Whisper
- [ ] Verify Whisper is installed
- [ ] Check Whisper logs for any errors
- [ ] Test Whisper transcription with sample audio

## TTS
- [ ] Verify TTS is installed
- [ ] Check TTS logs for any errors
- [ ] Test TTS with sample text

## OpenClaw
- [ ] Verify OpenClaw is running
- [ ] Check OpenClaw logs for any errors
- [ ] Test basic functionality of OpenClaw
# Dream Server — Post-Install Checklist

Run these checks after installation to confirm everything is working.

---

## 1. Overall health

```bash
dream status
```

Shows container status, service health checks, and GPU metrics in one view. All enabled services should report **healthy**. If any show as not responding, check the logs (step 6 below).

## 2. LLM response test

```bash
dream chat "Hello, are you working?"
```

You should receive a text response within a few seconds. If you see an error, check `dream logs llm`.

## 3. Web interface

Open your browser and navigate to the address shown at the end of installation (default: `http://localhost:3000`). The Open WebUI chat interface should load and let you send a message.

## 4. GPU verification

**NVIDIA** — GPU utilization, VRAM, and temperature appear automatically in `dream status`.

**AMD:**
```bash
rocm-smi
```

**Apple Silicon** — GPU is used automatically; no separate check needed.

## 5. Check enabled services

```bash
dream list
```

Core services (llama-server, open-webui, dashboard) should be shown as running. Optional services selected during install should also appear.

## 6. Diagnose a failing service

```bash
dream logs <service> # e.g. dream logs llm
```

Replace `<service>` with the name from `dream list`. Common aliases: `llm` for llama-server, `stt` for Whisper, `tts` for Kokoro.

---

If a service still fails its health check after you've reviewed its logs, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md).
1 change: 1 addition & 0 deletions dream-server/docs/SUPPORT-MATRIX.md
@@ -78,3 +78,4 @@ Last updated: 2026-03-17
## See also

- [LINUX-PORTABILITY.md](LINUX-PORTABILITY.md) — Linux installer edge cases, `.env` validation, extension manifests.
- [config/system-tuning/README.md](../config/system-tuning/README.md) — Performance tuning for AMD Strix Halo (GRUB, modprobe, sysctl, CPU governor settings).