-# Lighthouse AI
+# Dream Server
-**Local AI infrastructure. Your hardware. Your data. Your rules.**
+**Your turnkey local AI stack. Buy hardware. Run installer. AI running.**
[](LICENSE)
-[](https://github.com/Light-Heart-Labs/Lighthouse-AI/stargazers)
-[](https://github.com/Light-Heart-Labs/Lighthouse-AI/releases)
-[](https://github.com/Light-Heart-Labs/Lighthouse-AI/actions)
+[](https://github.com/Light-Heart-Labs/DreamServer/stargazers)
+[](https://github.com/Light-Heart-Labs/DreamServer/releases)
+[](https://docs.docker.com/get-docker/)
---
-## Dream Server: One Command, Full AI Stack
+## 5-Minute Quickstart
-One installer gets you from bare metal to a fully running local AI stack: LLM inference, chat UI, voice agents, workflow automation, RAG, and privacy tools. No manual config. No dependency hell. No six months of piecing it together. Run one command, answer a few questions, everything works.
+```bash
+# One-line install (Linux/WSL)
+curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/dream-server/get-dream-server.sh | bash
+```
+
+Or manually:
```bash
-curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/Lighthouse-AI/main/dream-server/get-dream-server.sh | bash
+git clone https://github.com/Light-Heart-Labs/DreamServer.git
+cd DreamServer/dream-server
+./install.sh
```
-
-
-
- The installer detects your hardware, picks the optimal model, and asks how deep you want to go.
-
+The installer auto-detects your GPU, picks the right model, generates secure passwords, and starts everything. Open **http://localhost:3000** and start chatting.
----
+### Instant Start (Bootstrap Mode)
-## Dashboard
+By default, Dream Server uses **bootstrap mode** for instant gratification:
-Everything running, at a glance. GPU metrics, service health, one-click access to Chat, Voice, Workflows, Agents, and Documents.
+1. Starts immediately with a tiny 1.5B model (downloads in <1 minute)
+2. You can start chatting within **2 minutes** of running the installer
+3. The full model downloads in the background
+4. When ready, hot-swap to the full model with zero downtime
-
-
-
+No more staring at download bars. Start playing immediately.
----
+### Windows
-## Architecture
-
-```mermaid
-graph TB
- subgraph User[" You "]
- Browser(["Browser"])
- Mic(["Microphone"])
- API(["API Client"])
- end
-
- subgraph DreamServer["Dream Server (Docker Compose)"]
- subgraph Core["Core"]
- VLLM["vLLM · :8000 LLM Inference"]
- WebUI["Open WebUI · :3000 Chat Interface"]
- Dashboard["Dashboard · :3001 GPU Metrics"]
- end
-
- subgraph Voice["Voice"]
- Whisper["Whisper · :9000 Speech → Text"]
- Kokoro["Kokoro · :8880 Text → Speech"]
- LiveKit["LiveKit · :7880 WebRTC"]
- VoiceAgent["Voice Agent"]
- end
-
- subgraph RAGp["RAG"]
- Qdrant["Qdrant · :6333 Vector DB"]
- Embeddings["Embeddings · :8090"]
- end
-
- subgraph Workflows["Workflows"]
- N8N["n8n · :5678 400+ Integrations"]
- end
-
- subgraph Agents["Agents"]
- OpenClaw["OpenClaw · :7860 Multi-Agent"]
- ToolProxy["Tool Proxy vLLM Bridge"]
- end
-
- subgraph Privacy["Privacy"]
- Shield["Privacy Shield · :8085 PII Redaction"]
- end
- end
-
- Browser --> WebUI
- Browser --> Dashboard
- Browser --> N8N
- Mic --> LiveKit
- API --> VLLM
-
- WebUI --> VLLM
- VoiceAgent --> Whisper
- VoiceAgent --> Kokoro
- VoiceAgent --> VLLM
- LiveKit --> VoiceAgent
- OpenClaw --> ToolProxy
- ToolProxy --> VLLM
- Shield -.->|PII scrubbed| VLLM
-
- style Core fill:#e8f0fe,stroke:#1a73e8,color:#1a1a1a
- style Voice fill:#fce8e6,stroke:#d93025,color:#1a1a1a
- style RAGp fill:#e6f4ea,stroke:#1e8e3e,color:#1a1a1a
- style Workflows fill:#fef7e0,stroke:#f9ab00,color:#1a1a1a
- style Agents fill:#f3e8fd,stroke:#9334e6,color:#1a1a1a
- style Privacy fill:#e8eaed,stroke:#5f6368,color:#1a1a1a
+```powershell
+# Download and run
+Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/install.ps1" -OutFile install.ps1
+.\install.ps1
```
-The installer auto-detects your GPU and activates the right profiles. Core services start immediately; voice, RAG, workflows, and agents activate based on your hardware and preferences.
+The Windows installer checks prerequisites (WSL2, Docker, NVIDIA drivers), then delegates to the Linux install path.
---
-## Who Is This For?
-
-**Hobbyists**: Want local ChatGPT without subscriptions? Install Dream Server, open `localhost:3000`, start chatting. Voice mode, document Q&A, and workflow automation are one toggle away.
-
-**Developers**: Building AI agents? Dream Server gives you a local OpenAI-compatible API (vLLM), multi-agent coordination (OpenClaw), and a workflow engine (n8n), all on your GPU. No API keys, no rate limits, no cost per token.
+## What You Get
-**Teams**: Need private AI infrastructure? Everything runs on your hardware. The Privacy Shield scrubs PII before anything leaves your network. Deploy once, use from any device on your LAN.
+One installer. Full AI stack. Zero config.
+
+| Component | Purpose | Port |
+|-----------|---------|------|
+| **llama-server** | LLM inference engine with continuous batching | 8080 |
+| **Open WebUI** | Beautiful chat interface with history & web search | 3000 |
+| **Dashboard** | Real-time GPU metrics, service health, model management | 3001 |
+| **LiteLLM** | Multi-model API gateway | 4000 |
+| **OpenClaw** | Autonomous AI agent framework | 7860 |
+| **SearXNG** | Self-hosted web search | 8888 |
+| **Perplexica** | Deep research engine | 3004 |
+| **n8n** | Workflow automation (400+ integrations) | 5678 |
+| **Qdrant** | Vector database for RAG | 6333 |
+| **Whisper** | Speech-to-text | 9000 |
+| **Kokoro** | Text-to-speech | 8880 |
+| **ComfyUI** | Image generation | 8188 |
+| **Privacy Shield** | PII scrubbing proxy | 8085 |
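A quick way to confirm the stack came up is to probe each port from the table. The sketch below is illustrative only and is not part of Dream Server: `DEFAULT_PORTS`, `port_is_open`, and `check_services` are hypothetical names, and the port numbers are copied from the table above, so an install that overrides ports in `.env` would need different values.

```python
import socket

# Default ports copied from the component table (assumed unchanged in .env).
DEFAULT_PORTS = {
    "llama-server": 8080,
    "Open WebUI": 3000,
    "Dashboard": 3001,
    "LiteLLM": 4000,
    "OpenClaw": 7860,
    "SearXNG": 8888,
    "Perplexica": 3004,
    "n8n": 5678,
    "Qdrant": 6333,
    "Whisper": 9000,
    "Kokoro": 8880,
    "ComfyUI": 8188,
    "Privacy Shield": 8085,
}


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost", probe=port_is_open) -> dict[str, bool]:
    """Map each component name to whether its default port accepts connections."""
    return {name: probe(host, port) for name, port in DEFAULT_PORTS.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:15} {'up' if up else 'down'}")
```

An open TCP port only shows that something is listening; it does not prove the service behind it is healthy. The `probe` parameter exists so the check can be swapped for an HTTP health request or stubbed out in tests.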
---
-## What You Get
+## Hardware Support
-| Component | What It Does |
-|-----------|-------------|
-| **vLLM** | GPU-accelerated LLM inference with continuous batching; auto-selects 7B to 72B models for your hardware |
-| **Open WebUI** | Full-featured chat interface with conversation history, model switching, web search |
-| **Dashboard** | Real-time GPU metrics (VRAM, temp, utilization), service health, model management |
-| **Whisper** | Speech-to-text: local, fast, private |
-| **Kokoro** | Text-to-speech: natural-sounding voices, no cloud |
-| **LiveKit** | Real-time WebRTC voice conversations: talk to your AI like a phone call |
-| **n8n** | Visual workflow automation with 400+ integrations (GitHub, Slack, email, webhooks) |
-| **Qdrant** | Vector database for document Q&A (RAG) |
-| **OpenClaw** | Multi-agent AI framework: agents coordinating autonomously on your GPU |
-| **Privacy Shield** | PII redaction proxy: scrubs personal data before any external API call |
-
-### Hardware Tiers (Auto-Detected)
+The installer **automatically detects your GPU** and selects the optimal configuration:
+
+### NVIDIA GPUs
| Tier | VRAM | Model | Example GPUs |
|------|------|-------|--------------|
-| Entry | <12GB | Qwen2.5-7B | RTX 3080, RTX 4070 |
-| Prosumer | 12-20GB | Qwen2.5-14B-AWQ | RTX 3090, RTX 4080 |
-| Pro | 20-40GB | Qwen2.5-32B-AWQ | RTX 4090, A6000 |
-| Enterprise | 40GB+ | Qwen2.5-72B-AWQ | A100, H100, multi-GPU |
-
-**Bootstrap mode:** Chat in 2 minutes. A tiny model starts instantly while the full model downloads in the background. Hot-swap with zero downtime when ready.
+| Tier 1 | 8-11GB | qwen2.5-7b-instruct (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
+| Tier 2 | 12-15GB | qwen2.5-14b-instruct (Q4_K_M) | RTX 3080 12GB, RTX 4070 Ti |
+| Tier 3 | 16-23GB | qwen2.5-32b-instruct (Q4_K_M) | RTX 4090, RTX 3090, A5000 |
+| Tier 4 | 24GB+ | qwen2.5-72b-instruct (Q4_K_M) | 2x RTX 4090, A100 |
-### How It Compares
+### AMD APUs (Strix Halo)
-| | Dream Server | Ollama + Open WebUI | LocalAI |
-|---|:---:|:---:|:---:|
-| Full-stack install (LLM + voice + workflows + RAG + privacy) | **One command** | Manual assembly | Manual assembly |
-| Hardware auto-detection + model selection | **Yes** | No | No |
-| Voice agents (STT + TTS + WebRTC) | **Built in** | No | Partial |
-| Inference engine | **vLLM** (continuous batching) | llama.cpp | llama.cpp |
-| Workflow automation | **n8n (400+ integrations)** | No | No |
-| PII redaction | **Built in** | No | No |
-| Multi-agent framework | **OpenClaw** | No | No |
+| Tier | Unified Memory | Model | Hardware |
+|------|---------------|-------|----------|
+| SH_LARGE | 90GB+ | qwen3-coder-next (80B MoE) | Ryzen AI MAX+ 395 (96GB) |
+| SH_COMPACT | 64-89GB | qwen3-30b-a3b (30B MoE) | Ryzen AI MAX+ 395 (64GB) |
-Ollama is great for running models locally. Dream Server is a complete AI platform: inference, voice, workflows, RAG, agents, privacy, and monitoring in one installer.
+All models are auto-selected based on available VRAM (or unified memory on Strix Halo). No manual configuration is needed.
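The tier tables amount to a threshold lookup, which can be sketched as follows. This is not the installer's actual detection code: `select_tier` and the two tier lists are hypothetical, the thresholds are the lower bounds from the tables above, and the `nvidia`/`amd` backend names follow `GPU_BACKEND` in `.env.example`. The tables do not say what happens below the smallest tier, so the sketch returns `None` there.

```python
from typing import Optional

# (minimum memory in GB, tier, model), highest tier first, from the NVIDIA table.
NVIDIA_TIERS = [
    (24, "Tier 4", "qwen2.5-72b-instruct (Q4_K_M)"),
    (16, "Tier 3", "qwen2.5-32b-instruct (Q4_K_M)"),
    (12, "Tier 2", "qwen2.5-14b-instruct (Q4_K_M)"),
    (8, "Tier 1", "qwen2.5-7b-instruct (Q4_K_M)"),
]

# Same layout for the AMD Strix Halo table (unified memory rather than VRAM).
STRIX_HALO_TIERS = [
    (90, "SH_LARGE", "qwen3-coder-next (80B MoE)"),
    (64, "SH_COMPACT", "qwen3-30b-a3b (30B MoE)"),
]


def select_tier(memory_gb: float, backend: str = "nvidia") -> Optional[tuple[str, str]]:
    """Return (tier, model) for the given memory size, or None if below every tier."""
    if backend == "nvidia":
        tiers = NVIDIA_TIERS
    elif backend == "amd":
        tiers = STRIX_HALO_TIERS
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    for minimum, tier, model in tiers:
        if memory_gb >= minimum:
            return tier, model
    return None


if __name__ == "__main__":
    print(select_tier(24))
    print(select_tier(96, backend="amd"))
```

Walking the list from the largest tier down means the first match is always the biggest model that fits, so the upper bounds in the tables never need to be encoded.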
---
-## Operations Toolkit
+## Documentation
+
+| | |
+|---|---|
+| [**Quickstart**](dream-server/QUICKSTART.md) | Step-by-step install guide with troubleshooting |
+| [**FAQ**](dream-server/FAQ.md) | Common questions, hardware advice, configuration |
+| [**Changelog**](dream-server/CHANGELOG.md) | Version history and release notes |
+| [**Contributing**](dream-server/CONTRIBUTING.md) | How to contribute to Dream Server |
+| [**Architecture**](dream-server/docs/INSTALLER-ARCHITECTURE.md) | Modular installer design deep dive |
+| [**Extensions**](dream-server/docs/EXTENSIONS.md) | How to add custom services |
+
+---
-Standalone tools for running persistent AI agents in production. Each works independently; grab what you need.
+## Repository Structure
-| Tool | Purpose |
-|------|---------|
-| [**Guardian**](guardian/) | Self-healing process watchdog: monitors services, auto-restores from backup, runs as root so agents can't kill it |
-| [**Memory Shepherd**](memory-shepherd/) | Periodic memory reset to prevent identity drift in long-running agents |
-| [**Token Spy**](token-spy/) | API cost monitoring with real-time dashboard and auto-kill for runaway sessions |
-| [**vLLM Tool Proxy**](dream-server/vllm-tool-proxy/) | Makes local vLLM tool calling work with OpenClaw: SSE re-wrapping, extraction, loop protection |
-| [**LLM Cold Storage**](scripts/llm-cold-storage.sh) | Archives idle HuggingFace models to free disk, keeps them resolvable via symlink |
+```
+DreamServer/
+├── dream-server/           # v2.0.0 - Production-ready local AI stack
+│   ├── install.sh          # Linux/WSL installer
+│   ├── docker-compose.*.yml
+│   ├── installers/         # Modular installer (13 phases)
+│   ├── extensions/         # Drop-in service integrations
+│   └── docs/               # 30+ documentation files
+│
+├── install.sh              # Root installer (delegates to dream-server/)
+├── install.ps1             # Windows installer
+│
+└── archive/                # Legacy projects (reference only)
+    ├── guardian/           # Process watchdog
+    ├── memory-shepherd/    # Agent memory lifecycle
+    ├── token-spy/          # API cost monitoring
+    └── docs/               # Historical documentation
+```
-These tools were born from the [OpenClaw Collective](COLLECTIVE.md): 3 AI agents running autonomously on local GPUs, producing 3,464 commits in 8 days. Dream Server packages the infrastructure they built into something anyone can use.
+**Shipping:** `dream-server/` is the v2.0.0 release.
+**Archive:** Legacy tools from the [OpenClaw Collective](archive/COLLECTIVE.md) development period.
---
-## Documentation
+## What's New in v2.0.0
-| | |
-|---|---|
-| [**Quickstart**](dream-server/QUICKSTART.md) | Step-by-step install guide with troubleshooting |
-| [**FAQ**](dream-server/FAQ.md) | Common questions, hardware advice, configuration |
-| [**Hardware Guide**](dream-server/docs/HARDWARE-GUIDE.md) | GPU recommendations with real prices |
-| [**Cookbook**](docs/cookbook/) | Recipes: voice agents, RAG pipelines, code assistant, privacy proxy |
-| [**Architecture**](docs/ARCHITECTURE.md) | Deep dive into the system design |
-| [**Contributing**](CONTRIBUTING.md) | How to contribute to Lighthouse AI |
+- **Modular installer**: 2591-line monolith → 6 libraries + 13 phases
+- **Zero-config service discovery**: Extensions auto-register via manifests
+- **AMD Strix Halo support**: ROCm 6.3 with unified memory models
+- **Bootstrap mode**: Chat in 2 minutes, upgrade later
+- **Comprehensive testing**: `make gate` runs lint + test + smoke + simulate
+- **30+ docs**: Installation, troubleshooting, Windows guides, extensions
-Windows: [`install.ps1`](dream-server/README.md#windows) handles WSL2 + Docker + NVIDIA drivers automatically.
+See [`dream-server/CHANGELOG.md`](dream-server/CHANGELOG.md) for full release notes.
---
## License
-Apache 2.0; see [LICENSE](LICENSE). Use it, modify it, ship it.
+Apache 2.0. Use it, modify it, ship it. See [LICENSE](LICENSE).
+
+---
-Built by [Lightheart Labs](https://github.com/Light-Heart-Labs) and the [OpenClaw Collective](COLLECTIVE.md).
+*Built by [The Collective](https://github.com/Light-Heart-Labs/DreamServer): Android-17, Todd, and friends*
diff --git a/COLLECTIVE.md b/archive/COLLECTIVE.md
similarity index 100%
rename from COLLECTIVE.md
rename to archive/COLLECTIVE.md
diff --git a/RELEASE-v1.0.0.md b/archive/RELEASE-v1.0.0.md
similarity index 100%
rename from RELEASE-v1.0.0.md
rename to archive/RELEASE-v1.0.0.md
diff --git a/compose/.env.example b/archive/compose/.env.example
similarity index 100%
rename from compose/.env.example
rename to archive/compose/.env.example
diff --git a/compose/docker-compose.nano.yml b/archive/compose/docker-compose.nano.yml
similarity index 100%
rename from compose/docker-compose.nano.yml
rename to archive/compose/docker-compose.nano.yml
diff --git a/compose/docker-compose.pro.yml b/archive/compose/docker-compose.pro.yml
similarity index 100%
rename from compose/docker-compose.pro.yml
rename to archive/compose/docker-compose.pro.yml
diff --git a/config.yaml b/archive/config.yaml
similarity index 100%
rename from config.yaml
rename to archive/config.yaml
diff --git a/configs/models.json b/archive/configs/models.json
similarity index 100%
rename from configs/models.json
rename to archive/configs/models.json
diff --git a/configs/openclaw-gateway.service b/archive/configs/openclaw-gateway.service
similarity index 100%
rename from configs/openclaw-gateway.service
rename to archive/configs/openclaw-gateway.service
diff --git a/configs/openclaw.json b/archive/configs/openclaw.json
similarity index 100%
rename from configs/openclaw.json
rename to archive/configs/openclaw.json
diff --git a/docs/ARCHITECTURE.md b/archive/docs/ARCHITECTURE.md
similarity index 100%
rename from docs/ARCHITECTURE.md
rename to archive/docs/ARCHITECTURE.md
diff --git a/docs/DESIGN-DECISIONS.md b/archive/docs/DESIGN-DECISIONS.md
similarity index 100%
rename from docs/DESIGN-DECISIONS.md
rename to archive/docs/DESIGN-DECISIONS.md
diff --git a/docs/GUARDIAN.md b/archive/docs/GUARDIAN.md
similarity index 100%
rename from docs/GUARDIAN.md
rename to archive/docs/GUARDIAN.md
diff --git a/docs/MULTI-AGENT-PATTERNS.md b/archive/docs/MULTI-AGENT-PATTERNS.md
similarity index 100%
rename from docs/MULTI-AGENT-PATTERNS.md
rename to archive/docs/MULTI-AGENT-PATTERNS.md
diff --git a/docs/OPERATIONAL-LESSONS.md b/archive/docs/OPERATIONAL-LESSONS.md
similarity index 100%
rename from docs/OPERATIONAL-LESSONS.md
rename to archive/docs/OPERATIONAL-LESSONS.md
diff --git a/docs/PATTERNS.md b/archive/docs/PATTERNS.md
similarity index 100%
rename from docs/PATTERNS.md
rename to archive/docs/PATTERNS.md
diff --git a/docs/PHILOSOPHY.md b/archive/docs/PHILOSOPHY.md
similarity index 100%
rename from docs/PHILOSOPHY.md
rename to archive/docs/PHILOSOPHY.md
diff --git a/docs/SETUP.md b/archive/docs/SETUP.md
similarity index 100%
rename from docs/SETUP.md
rename to archive/docs/SETUP.md
diff --git a/docs/TOKEN-MONITOR-PRODUCT-SCOPE.md b/archive/docs/TOKEN-MONITOR-PRODUCT-SCOPE.md
similarity index 100%
rename from docs/TOKEN-MONITOR-PRODUCT-SCOPE.md
rename to archive/docs/TOKEN-MONITOR-PRODUCT-SCOPE.md
diff --git a/docs/TOKEN-SPY.md b/archive/docs/TOKEN-SPY.md
similarity index 100%
rename from docs/TOKEN-SPY.md
rename to archive/docs/TOKEN-SPY.md
diff --git a/docs/cookbook/01-voice-agent-setup.md b/archive/docs/cookbook/01-voice-agent-setup.md
similarity index 100%
rename from docs/cookbook/01-voice-agent-setup.md
rename to archive/docs/cookbook/01-voice-agent-setup.md
diff --git a/docs/cookbook/02-document-qa-setup.md b/archive/docs/cookbook/02-document-qa-setup.md
similarity index 100%
rename from docs/cookbook/02-document-qa-setup.md
rename to archive/docs/cookbook/02-document-qa-setup.md
diff --git a/docs/cookbook/03-code-assistant-setup.md b/archive/docs/cookbook/03-code-assistant-setup.md
similarity index 100%
rename from docs/cookbook/03-code-assistant-setup.md
rename to archive/docs/cookbook/03-code-assistant-setup.md
diff --git a/docs/cookbook/04-privacy-proxy-setup.md b/archive/docs/cookbook/04-privacy-proxy-setup.md
similarity index 100%
rename from docs/cookbook/04-privacy-proxy-setup.md
rename to archive/docs/cookbook/04-privacy-proxy-setup.md
diff --git a/docs/cookbook/05-multi-gpu-cluster.md b/archive/docs/cookbook/05-multi-gpu-cluster.md
similarity index 100%
rename from docs/cookbook/05-multi-gpu-cluster.md
rename to archive/docs/cookbook/05-multi-gpu-cluster.md
diff --git a/docs/cookbook/06-swarm-patterns.md b/archive/docs/cookbook/06-swarm-patterns.md
similarity index 100%
rename from docs/cookbook/06-swarm-patterns.md
rename to archive/docs/cookbook/06-swarm-patterns.md
diff --git a/docs/cookbook/08-n8n-local-llm.md b/archive/docs/cookbook/08-n8n-local-llm.md
similarity index 100%
rename from docs/cookbook/08-n8n-local-llm.md
rename to archive/docs/cookbook/08-n8n-local-llm.md
diff --git a/docs/cookbook/README.md b/archive/docs/cookbook/README.md
similarity index 100%
rename from docs/cookbook/README.md
rename to archive/docs/cookbook/README.md
diff --git a/docs/cookbook/agent-template-code.md b/archive/docs/cookbook/agent-template-code.md
similarity index 100%
rename from docs/cookbook/agent-template-code.md
rename to archive/docs/cookbook/agent-template-code.md
diff --git a/docs/images/dream-server-dashboard.png b/archive/docs/images/dream-server-dashboard.png
similarity index 100%
rename from docs/images/dream-server-dashboard.png
rename to archive/docs/images/dream-server-dashboard.png
diff --git a/docs/images/dream-server-install.png b/archive/docs/images/dream-server-install.png
similarity index 100%
rename from docs/images/dream-server-install.png
rename to archive/docs/images/dream-server-install.png
diff --git a/docs/research/GPU-TTS-BENCHMARK.md b/archive/docs/research/GPU-TTS-BENCHMARK.md
similarity index 100%
rename from docs/research/GPU-TTS-BENCHMARK.md
rename to archive/docs/research/GPU-TTS-BENCHMARK.md
diff --git a/docs/research/HARDWARE-GUIDE.md b/archive/docs/research/HARDWARE-GUIDE.md
similarity index 100%
rename from docs/research/HARDWARE-GUIDE.md
rename to archive/docs/research/HARDWARE-GUIDE.md
diff --git a/docs/research/OSS-MODEL-LANDSCAPE-2026-02.md b/archive/docs/research/OSS-MODEL-LANDSCAPE-2026-02.md
similarity index 100%
rename from docs/research/OSS-MODEL-LANDSCAPE-2026-02.md
rename to archive/docs/research/OSS-MODEL-LANDSCAPE-2026-02.md
diff --git a/docs/research/README.md b/archive/docs/research/README.md
similarity index 100%
rename from docs/research/README.md
rename to archive/docs/research/README.md
diff --git a/guardian/README.md b/archive/guardian/README.md
similarity index 100%
rename from guardian/README.md
rename to archive/guardian/README.md
diff --git a/guardian/docs/HEALTH-CHECKS.md b/archive/guardian/docs/HEALTH-CHECKS.md
similarity index 100%
rename from guardian/docs/HEALTH-CHECKS.md
rename to archive/guardian/docs/HEALTH-CHECKS.md
diff --git a/guardian/guardian.conf.example b/archive/guardian/guardian.conf.example
similarity index 100%
rename from guardian/guardian.conf.example
rename to archive/guardian/guardian.conf.example
diff --git a/guardian/guardian.service b/archive/guardian/guardian.service
similarity index 100%
rename from guardian/guardian.service
rename to archive/guardian/guardian.service
diff --git a/guardian/guardian.sh b/archive/guardian/guardian.sh
similarity index 100%
rename from guardian/guardian.sh
rename to archive/guardian/guardian.sh
diff --git a/guardian/install.sh b/archive/guardian/install.sh
similarity index 100%
rename from guardian/install.sh
rename to archive/guardian/install.sh
diff --git a/guardian/uninstall.sh b/archive/guardian/uninstall.sh
similarity index 100%
rename from guardian/uninstall.sh
rename to archive/guardian/uninstall.sh
diff --git a/memory-shepherd/README.md b/archive/memory-shepherd/README.md
similarity index 100%
rename from memory-shepherd/README.md
rename to archive/memory-shepherd/README.md
diff --git a/memory-shepherd/baselines/example-agent-MEMORY.md b/archive/memory-shepherd/baselines/example-agent-MEMORY.md
similarity index 100%
rename from memory-shepherd/baselines/example-agent-MEMORY.md
rename to archive/memory-shepherd/baselines/example-agent-MEMORY.md
diff --git a/memory-shepherd/docs/WRITING-BASELINES.md b/archive/memory-shepherd/docs/WRITING-BASELINES.md
similarity index 100%
rename from memory-shepherd/docs/WRITING-BASELINES.md
rename to archive/memory-shepherd/docs/WRITING-BASELINES.md
diff --git a/memory-shepherd/install.sh b/archive/memory-shepherd/install.sh
similarity index 100%
rename from memory-shepherd/install.sh
rename to archive/memory-shepherd/install.sh
diff --git a/memory-shepherd/memory-shepherd.conf.example b/archive/memory-shepherd/memory-shepherd.conf.example
similarity index 100%
rename from memory-shepherd/memory-shepherd.conf.example
rename to archive/memory-shepherd/memory-shepherd.conf.example
diff --git a/memory-shepherd/memory-shepherd.sh b/archive/memory-shepherd/memory-shepherd.sh
similarity index 100%
rename from memory-shepherd/memory-shepherd.sh
rename to archive/memory-shepherd/memory-shepherd.sh
diff --git a/memory-shepherd/uninstall.sh b/archive/memory-shepherd/uninstall.sh
similarity index 100%
rename from memory-shepherd/uninstall.sh
rename to archive/memory-shepherd/uninstall.sh
diff --git a/scripts/llm-cold-storage.sh b/archive/scripts/llm-cold-storage.sh
similarity index 100%
rename from scripts/llm-cold-storage.sh
rename to archive/scripts/llm-cold-storage.sh
diff --git a/scripts/session-cleanup.sh b/archive/scripts/session-cleanup.sh
similarity index 100%
rename from scripts/session-cleanup.sh
rename to archive/scripts/session-cleanup.sh
diff --git a/scripts/start-proxy.sh b/archive/scripts/start-proxy.sh
similarity index 100%
rename from scripts/start-proxy.sh
rename to archive/scripts/start-proxy.sh
diff --git a/scripts/start-vllm.sh b/archive/scripts/start-vllm.sh
similarity index 100%
rename from scripts/start-vllm.sh
rename to archive/scripts/start-vllm.sh
diff --git a/scripts/vllm-tool-proxy.py b/archive/scripts/vllm-tool-proxy.py
similarity index 100%
rename from scripts/vllm-tool-proxy.py
rename to archive/scripts/vllm-tool-proxy.py
diff --git a/systemd/llm-cold-storage.service b/archive/systemd/llm-cold-storage.service
similarity index 100%
rename from systemd/llm-cold-storage.service
rename to archive/systemd/llm-cold-storage.service
diff --git a/systemd/llm-cold-storage.timer b/archive/systemd/llm-cold-storage.timer
similarity index 100%
rename from systemd/llm-cold-storage.timer
rename to archive/systemd/llm-cold-storage.timer
diff --git a/systemd/openclaw-session-cleanup.service b/archive/systemd/openclaw-session-cleanup.service
similarity index 100%
rename from systemd/openclaw-session-cleanup.service
rename to archive/systemd/openclaw-session-cleanup.service
diff --git a/systemd/openclaw-session-cleanup.timer b/archive/systemd/openclaw-session-cleanup.timer
similarity index 100%
rename from systemd/openclaw-session-cleanup.timer
rename to archive/systemd/openclaw-session-cleanup.timer
diff --git a/systemd/token-spy@.service b/archive/systemd/token-spy@.service
similarity index 100%
rename from systemd/token-spy@.service
rename to archive/systemd/token-spy@.service
diff --git a/systemd/vllm-tool-proxy.service b/archive/systemd/vllm-tool-proxy.service
similarity index 100%
rename from systemd/vllm-tool-proxy.service
rename to archive/systemd/vllm-tool-proxy.service
diff --git a/token-spy/.env.example b/archive/token-spy/.env.example
similarity index 100%
rename from token-spy/.env.example
rename to archive/token-spy/.env.example
diff --git a/token-spy/README.md b/archive/token-spy/README.md
similarity index 100%
rename from token-spy/README.md
rename to archive/token-spy/README.md
diff --git a/token-spy/TOKEN-SPY-GUIDE.md b/archive/token-spy/TOKEN-SPY-GUIDE.md
similarity index 100%
rename from token-spy/TOKEN-SPY-GUIDE.md
rename to archive/token-spy/TOKEN-SPY-GUIDE.md
diff --git a/token-spy/db.py b/archive/token-spy/db.py
similarity index 100%
rename from token-spy/db.py
rename to archive/token-spy/db.py
diff --git a/token-spy/db_postgres.py b/archive/token-spy/db_postgres.py
similarity index 100%
rename from token-spy/db_postgres.py
rename to archive/token-spy/db_postgres.py
diff --git a/token-spy/main.py b/archive/token-spy/main.py
similarity index 100%
rename from token-spy/main.py
rename to archive/token-spy/main.py
diff --git a/token-spy/providers/__init__.py b/archive/token-spy/providers/__init__.py
similarity index 100%
rename from token-spy/providers/__init__.py
rename to archive/token-spy/providers/__init__.py
diff --git a/token-spy/providers/anthropic.py b/archive/token-spy/providers/anthropic.py
similarity index 100%
rename from token-spy/providers/anthropic.py
rename to archive/token-spy/providers/anthropic.py
diff --git a/token-spy/providers/base.py b/archive/token-spy/providers/base.py
similarity index 100%
rename from token-spy/providers/base.py
rename to archive/token-spy/providers/base.py
diff --git a/token-spy/providers/openai.py b/archive/token-spy/providers/openai.py
similarity index 100%
rename from token-spy/providers/openai.py
rename to archive/token-spy/providers/openai.py
diff --git a/token-spy/providers/registry.py b/archive/token-spy/providers/registry.py
similarity index 100%
rename from token-spy/providers/registry.py
rename to archive/token-spy/providers/registry.py
diff --git a/token-spy/requirements.txt b/archive/token-spy/requirements.txt
similarity index 100%
rename from token-spy/requirements.txt
rename to archive/token-spy/requirements.txt
diff --git a/token-spy/session-manager.sh b/archive/token-spy/session-manager.sh
similarity index 100%
rename from token-spy/session-manager.sh
rename to archive/token-spy/session-manager.sh
diff --git a/token-spy/start.sh b/archive/token-spy/start.sh
similarity index 100%
rename from token-spy/start.sh
rename to archive/token-spy/start.sh
diff --git a/workspace/IDENTITY.md b/archive/workspace/IDENTITY.md
similarity index 100%
rename from workspace/IDENTITY.md
rename to archive/workspace/IDENTITY.md
diff --git a/workspace/MEMORY.md b/archive/workspace/MEMORY.md
similarity index 100%
rename from workspace/MEMORY.md
rename to archive/workspace/MEMORY.md
diff --git a/workspace/SOUL.md b/archive/workspace/SOUL.md
similarity index 100%
rename from workspace/SOUL.md
rename to archive/workspace/SOUL.md
diff --git a/workspace/TOOLS.md b/archive/workspace/TOOLS.md
similarity index 100%
rename from workspace/TOOLS.md
rename to archive/workspace/TOOLS.md
diff --git a/dream-server/.env.example b/dream-server/.env.example
new file mode 100644
index 000000000..d1db9a334
--- /dev/null
+++ b/dream-server/.env.example
@@ -0,0 +1,137 @@
+# Dream Server Configuration
+# Copy this file to .env and edit values before starting:
+# cp .env.example .env
+#
+# The installer (install-core.sh) generates .env automatically with
+# secure random secrets. This file documents all available variables.
+
+# ───────────────────────────────────────────────────────────────────
+# REQUIRED: these must be set or docker compose will refuse to start
+# ───────────────────────────────────────────────────────────────────
+
+# Session signing for Open WebUI (generate: openssl rand -hex 32)
+WEBUI_SECRET=CHANGEME
+
+# n8n workflow automation credentials
+N8N_USER=admin
+N8N_PASS=CHANGEME
+
+# LiteLLM API gateway key (generate: echo "sk-dream-$(openssl rand -hex 16)")
+LITELLM_KEY=CHANGEME
+
+# OpenClaw agent framework token (generate: openssl rand -hex 24)
+OPENCLAW_TOKEN=CHANGEME
+
+# ───────────────────────────────────────────────────────────────────
+# LLM Backend Mode
+# ───────────────────────────────────────────────────────────────────
+
+# local = llama-server (default, requires GPU or CPU inference)
+# cloud = LiteLLM -> cloud APIs (no local GPU needed)
+# hybrid = local primary, cloud fallback
+DREAM_MODE=local
+LLM_API_URL=http://llama-server:8080
+
+# ───────────────────────────────────────────────────────────────────
+# Cloud API Keys (only needed for cloud/hybrid modes)
+# ───────────────────────────────────────────────────────────────────
+
+ANTHROPIC_API_KEY=
+OPENAI_API_KEY=
+TOGETHER_API_KEY=
+
+# ───────────────────────────────────────────────────────────────────
+# LLM Settings (llama-server)
+# ───────────────────────────────────────────────────────────────────
+
+# Model GGUF filename (must exist in data/models/)
+GGUF_FILE=Qwen3-8B-Q4_K_M.gguf
+
+# Context window size (tokens)
+CTX_SIZE=16384
+
+# GPU backend: nvidia or amd
+GPU_BACKEND=nvidia
+
+# Model name (used by OpenClaw and dashboard)
+LLM_MODEL=qwen3-8b
+
+# ───────────────────────────────────────────────────────────────────
+# Ports: all overridable, defaults shown
+# ───────────────────────────────────────────────────────────────────
+
+# OLLAMA_PORT=11434 # llama-server API (external → internal 8080)
+# WEBUI_PORT=3000 # Open WebUI (external → internal 8080)
+# SEARXNG_PORT=8888 # SearXNG metasearch (external → internal 8080)
+# PERPLEXICA_PORT=3004 # Perplexica deep research (external → internal 3000)
+# WHISPER_PORT=9000 # Whisper STT (external → internal 8000)
+# TTS_PORT=8880 # Kokoro TTS (external → internal 8880)
+# N8N_PORT=5678 # n8n workflows (external → internal 5678)
+# QDRANT_PORT=6333 # Qdrant vector DB (external → internal 6333)
+# QDRANT_GRPC_PORT=6334 # Qdrant gRPC (external → internal 6334)
+# EMBEDDINGS_PORT=8090 # Text embeddings (external → internal 80)
+# LITELLM_PORT=4000 # LiteLLM gateway (external → internal 4000)
+# OPENCLAW_PORT=7860 # OpenClaw agent (external → internal 18789)
+# SHIELD_PORT=8085 # Privacy Shield (external → internal 8085)
+# DASHBOARD_API_PORT=3002 # Dashboard API (external → internal 3002)
+# DASHBOARD_PORT=3001 # Dashboard UI (external → internal 3001)
+# COMFYUI_PORT=8188 # ComfyUI image gen (external → internal 8188)
+
+# ───────────────────────────────────────────────────────────────────
+# Optional Security
+# ───────────────────────────────────────────────────────────────────
+
+# Dashboard API key (generate: openssl rand -hex 32)
+# DASHBOARD_API_KEY=
+
+# Open WebUI authentication (true/false)
+# WEBUI_AUTH=true
+
+# ───────────────────────────────────────────────────────────────────
+# Optional: Voice, Web UI, n8n
+# ───────────────────────────────────────────────────────────────────
+
+# Whisper model (tiny, base, small, medium, large-v3-turbo)
+# WHISPER_MODEL=base
+
+# System timezone (used by Open WebUI and n8n)
+# TIMEZONE=UTC
+
+# n8n settings
+# N8N_AUTH=true # Enable n8n basic auth
+# N8N_HOST=localhost # n8n hostname
+# N8N_WEBHOOK_URL=http://localhost:5678 # n8n webhook URL (for external access)
+
+# Embedding model for RAG
+# EMBEDDING_MODEL=BAAI/bge-base-en-v1.5
+
+# ───────────────────────────────────────────────────────────────────
+# AMD-specific (only needed with GPU_BACKEND=amd)
+# ───────────────────────────────────────────────────────────────────
+
+# VIDEO_GID=44 # `getent group video | cut -d: -f3`
+# RENDER_GID=992 # `getent group render | cut -d: -f3`
+
+# ───────────────────────────────────────────────────────────────────
+# Advanced
+# ───────────────────────────────────────────────────────────────────
+
+# Container user/group IDs
+# UID=1000
+# GID=1000
+
+# Privacy Shield settings
+# PII_CACHE_ENABLED=true
+# PII_CACHE_SIZE=1000
+# PII_CACHE_TTL=300
+# LOG_LEVEL=info
+
+# OpenClaw bootstrap model (small model for instant startup)
+# BOOTSTRAP_MODEL=qwen3:8b-q4_K_M
+
+# Dashboard API internal URLs (usually Docker-internal, not user-facing)
+# KOKORO_URL=http://tts:8880
+# N8N_URL=http://n8n:5678
+
+# llama-server memory limit (Docker)
+# LLAMA_SERVER_MEMORY_LIMIT=64G
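Reviewer note: every external port in `.env.example` can be overridden, so two services can end up bound to the same host port. The following is a minimal sketch of a pre-flight check, not part of this PR. The helper name `find_duplicate_ports` is hypothetical, and it assumes ports are set on active (uncommented) `NAME_PORT=number` lines.

```bash
# Hypothetical helper (not in the repository): print every port number that is
# assigned to more than one active *_PORT variable in the given .env file.
find_duplicate_ports() {
  local env_file="$1"
  # 1. keep only active NAME_PORT=number lines (commented lines are skipped)
  # 2. strip trailing inline comments
  # 3. keep the value, sort numerically, and print values that repeat
  grep -E '^[A-Z0-9_]+_PORT=[0-9]+' "$env_file" \
    | sed -E 's/[[:space:]]*#.*$//' \
    | cut -d= -f2 \
    | sort -n \
    | uniq -d
}
```

Running `find_duplicate_ports .env` prints nothing when all active ports are distinct, so it can be used as a quick check before `docker compose up -d`.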
diff --git a/dream-server/.env.schema.json b/dream-server/.env.schema.json
new file mode 100644
index 000000000..199f71229
--- /dev/null
+++ b/dream-server/.env.schema.json
@@ -0,0 +1,313 @@
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "title": "Dream Server Environment Configuration",
+ "description": "Schema for Dream Server .env file validation",
+ "type": "object",
+ "required": [
+ "WEBUI_SECRET",
+ "N8N_USER",
+ "N8N_PASS",
+ "LITELLM_KEY",
+ "OPENCLAW_TOKEN"
+ ],
+ "properties": {
+ "DREAM_MODE": {
+ "type": "string",
+ "description": "LLM backend mode: local, cloud, or hybrid",
+ "enum": ["local", "cloud", "hybrid"],
+ "default": "local"
+ },
+ "LLM_API_URL": {
+ "type": "string",
+ "description": "URL where all services send LLM requests",
+ "default": "http://llama-server:8080"
+ },
+ "ANTHROPIC_API_KEY": {
+ "type": "string",
+ "description": "Anthropic API key (cloud/hybrid modes)"
+ },
+ "OPENAI_API_KEY": {
+ "type": "string",
+ "description": "OpenAI API key (cloud/hybrid modes)"
+ },
+ "TOGETHER_API_KEY": {
+ "type": "string",
+ "description": "Together AI API key (optional)"
+ },
+ "WEBUI_SECRET": {
+ "type": "string",
+ "description": "Session signing secret for Open WebUI",
+ "secret": true
+ },
+ "N8N_USER": {
+ "type": "string",
+ "description": "n8n admin username"
+ },
+ "N8N_PASS": {
+ "type": "string",
+ "description": "n8n admin password",
+ "secret": true
+ },
+ "LITELLM_KEY": {
+ "type": "string",
+ "description": "LiteLLM API gateway master key",
+ "secret": true
+ },
+ "OPENCLAW_TOKEN": {
+ "type": "string",
+ "description": "OpenClaw agent framework token",
+ "secret": true
+ },
+ "GGUF_FILE": {
+ "type": "string",
+ "description": "Model GGUF filename in data/models/"
+ },
+ "CTX_SIZE": {
+ "type": "integer",
+ "description": "Context window size in tokens",
+ "default": 16384
+ },
+ "MAX_CONTEXT": {
+ "type": "integer",
+ "description": "Context window (installer variable, maps to CTX_SIZE)"
+ },
+ "GPU_BACKEND": {
+ "type": "string",
+ "description": "GPU backend: nvidia, amd, apple, or cpu",
+ "default": "nvidia"
+ },
+ "LLM_MODEL": {
+ "type": "string",
+ "description": "Model name used by OpenClaw and dashboard"
+ },
+ "TIER": {
+ "type": "string",
+ "description": "Hardware tier (1, 2, 3, 4, CLOUD, SH_COMPACT, SH_LARGE, NV_ULTRA)"
+ },
+ "OLLAMA_PORT": {
+ "type": "integer",
+ "description": "llama-server external port",
+ "default": 11434
+ },
+ "WEBUI_PORT": {
+ "type": "integer",
+ "description": "Open WebUI external port",
+ "default": 3000
+ },
+ "SEARXNG_PORT": {
+ "type": "integer",
+ "description": "SearXNG external port",
+ "default": 8888
+ },
+ "PERPLEXICA_PORT": {
+ "type": "integer",
+ "description": "Perplexica external port",
+ "default": 3004
+ },
+ "WHISPER_PORT": {
+ "type": "integer",
+ "description": "Whisper STT external port",
+ "default": 9000
+ },
+ "TTS_PORT": {
+ "type": "integer",
+ "description": "Kokoro TTS external port",
+ "default": 8880
+ },
+ "N8N_PORT": {
+ "type": "integer",
+ "description": "n8n external port",
+ "default": 5678
+ },
+ "QDRANT_PORT": {
+ "type": "integer",
+ "description": "Qdrant vector DB external port",
+ "default": 6333
+ },
+ "QDRANT_GRPC_PORT": {
+ "type": "integer",
+ "description": "Qdrant gRPC external port",
+ "default": 6334
+ },
+ "EMBEDDINGS_PORT": {
+ "type": "integer",
+ "description": "Text embeddings external port",
+ "default": 8090
+ },
+ "LITELLM_PORT": {
+ "type": "integer",
+ "description": "LiteLLM gateway external port",
+ "default": 4000
+ },
+ "OPENCLAW_PORT": {
+ "type": "integer",
+ "description": "OpenClaw agent external port",
+ "default": 7860
+ },
+ "SHIELD_PORT": {
+ "type": "integer",
+ "description": "Privacy Shield external port",
+ "default": 8085
+ },
+ "DASHBOARD_API_PORT": {
+ "type": "integer",
+ "description": "Dashboard API external port",
+ "default": 3002
+ },
+ "DASHBOARD_PORT": {
+ "type": "integer",
+ "description": "Dashboard UI external port",
+ "default": 3001
+ },
+ "COMFYUI_PORT": {
+ "type": "integer",
+ "description": "ComfyUI external port",
+ "default": 8188
+ },
+ "TOKEN_SPY_PORT": {
+ "type": "integer",
+ "description": "Token Spy external port",
+ "default": 3003
+ },
+ "LLAMA_SERVER_PORT": {
+ "type": "integer",
+ "description": "llama-server internal port",
+ "default": 8080
+ },
+ "DASHBOARD_API_KEY": {
+ "type": "string",
+ "description": "Dashboard API authentication key",
+ "secret": true
+ },
+ "OPENCODE_SERVER_PASSWORD": {
+ "type": "string",
+ "description": "OpenCode web UI authentication password",
+ "secret": true
+ },
+ "OPENCODE_PORT": {
+ "type": "integer",
+ "description": "OpenCode web UI external port",
+ "default": 3003
+ },
+ "WEBUI_AUTH": {
+ "type": "boolean",
+ "description": "Enable Open WebUI authentication",
+ "default": true
+ },
+ "WHISPER_MODEL": {
+ "type": "string",
+ "description": "Whisper STT model size",
+ "default": "base"
+ },
+ "TIMEZONE": {
+ "type": "string",
+ "description": "System timezone",
+ "default": "UTC"
+ },
+ "N8N_AUTH": {
+ "type": "boolean",
+ "description": "Enable n8n basic auth",
+ "default": true
+ },
+ "N8N_HOST": {
+ "type": "string",
+ "description": "n8n hostname",
+ "default": "localhost"
+ },
+ "N8N_WEBHOOK_URL": {
+ "type": "string",
+ "description": "n8n webhook URL for external access"
+ },
+ "EMBEDDING_MODEL": {
+ "type": "string",
+ "description": "Embedding model for RAG",
+ "default": "BAAI/bge-base-en-v1.5"
+ },
+ "VIDEO_GID": {
+ "type": "integer",
+ "description": "Video group ID (AMD only)"
+ },
+ "RENDER_GID": {
+ "type": "integer",
+ "description": "Render group ID (AMD only)"
+ },
+ "HSA_OVERRIDE_GFX_VERSION": {
+ "type": "string",
+ "description": "AMD ROCm GFX version override"
+ },
+ "ROCBLAS_USE_HIPBLASLT": {
+ "type": "integer",
+ "description": "AMD ROCm BLAS setting"
+ },
+ "UID": {
+ "type": "integer",
+ "description": "Container user ID",
+ "default": 1000
+ },
+ "GID": {
+ "type": "integer",
+ "description": "Container group ID",
+ "default": 1000
+ },
+ "PII_CACHE_ENABLED": {
+ "type": "boolean",
+ "description": "Privacy Shield PII cache",
+ "default": true
+ },
+ "PII_CACHE_SIZE": {
+ "type": "integer",
+ "description": "Privacy Shield PII cache size",
+ "default": 1000
+ },
+ "PII_CACHE_TTL": {
+ "type": "integer",
+ "description": "Privacy Shield PII cache TTL (seconds)",
+ "default": 300
+ },
+ "LOG_LEVEL": {
+ "type": "string",
+ "description": "Logging level",
+ "default": "info"
+ },
+ "BOOTSTRAP_MODEL": {
+ "type": "string",
+ "description": "OpenClaw bootstrap model (small, fast startup)"
+ },
+ "KOKORO_URL": {
+ "type": "string",
+ "description": "Kokoro TTS internal URL",
+ "default": "http://tts:8880"
+ },
+ "N8N_URL": {
+ "type": "string",
+ "description": "n8n internal URL",
+ "default": "http://n8n:5678"
+ },
+ "LLAMA_SERVER_MEMORY_LIMIT": {
+ "type": "string",
+ "description": "Docker memory limit for llama-server",
+ "default": "64G"
+ },
+ "LIVEKIT_API_KEY": {
+ "type": "string",
+ "description": "LiveKit API key"
+ },
+ "LIVEKIT_API_SECRET": {
+ "type": "string",
+ "description": "LiveKit API secret",
+ "secret": true
+ },
+ "ENABLE_WEB_SEARCH": {
+ "type": "boolean",
+ "description": "Enable web search in Open WebUI"
+ },
+ "WEB_SEARCH_ENGINE": {
+ "type": "string",
+ "description": "Web search engine backend"
+ },
+ "TTS_VOICE": {
+ "type": "string",
+ "description": "Text-to-speech voice"
+ }
+ }
+}
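Reviewer note: the schema marks five keys as required. As a rough illustration of what that validation amounts to, the sketch below checks a `.env` file for those keys using only `grep`. The function name `check_required_env` is hypothetical, the key list is copied by hand from the `required` array above, and a real validator would presumably read the list from `.env.schema.json` instead.

```bash
# Required keys, copied by hand from the "required" array in .env.schema.json.
REQUIRED_KEYS="WEBUI_SECRET N8N_USER N8N_PASS LITELLM_KEY OPENCLAW_TOKEN"

# Hypothetical helper: print each required key that is missing or empty in the
# given .env file. Returns 1 if at least one key is missing, 0 otherwise.
check_required_env() {
  local env_file="$1" key missing=0
  for key in $REQUIRED_KEYS; do
    # A key counts as set only if an active line assigns it a non-empty value.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

Commented-out assignments such as `# N8N_USER=admin` are treated as missing, which matches how Docker Compose reads the file.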
diff --git a/dream-server/.github/ISSUE_TEMPLATE/bug_report.md b/dream-server/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 000000000..88a41d6fa
--- /dev/null
+++ b/dream-server/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,34 @@
+---
+name: Bug Report
+about: Something isn't working as expected
+labels: bug
+---
+
+**Hardware**
+- GPU: (e.g., RTX 4090 24GB, Strix Halo 96GB, none)
+- RAM:
+- OS: (e.g., Ubuntu 24.04, Windows 11 + WSL2, macOS 15)
+- Tier: (e.g., 2, SH_LARGE)
+
+**What happened?**
+A clear description of the bug.
+
+**What did you expect?**
+What should have happened instead.
+
+**Steps to reproduce**
+1.
+2.
+3.
+
+**Logs**
+```
+Paste relevant output from:
+ docker compose logs | tail -50
+ cat /tmp/dream-server-install.log | tail -50
+```
+
+**Installer version**
+```
+grep VERSION installers/lib/constants.sh
+```
diff --git a/dream-server/.github/ISSUE_TEMPLATE/feature_request.md b/dream-server/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 000000000..e5d5b0b94
--- /dev/null
+++ b/dream-server/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,21 @@
+---
+name: Feature Request
+about: Suggest an improvement or new capability
+labels: enhancement
+---
+
+**What problem does this solve?**
+A clear description of the use case.
+
+**Proposed solution**
+How you'd like it to work.
+
+**Alternatives considered**
+Other approaches you've thought about.
+
+**Which area does this affect?**
+- [ ] Installer (tiers, phases, detection)
+- [ ] Docker services (compose, health checks)
+- [ ] Dashboard (UI, API, plugins)
+- [ ] Documentation
+- [ ] Other: ___
diff --git a/dream-server/.github/pull_request_template.md b/dream-server/.github/pull_request_template.md
new file mode 100644
index 000000000..c02d0bd28
--- /dev/null
+++ b/dream-server/.github/pull_request_template.md
@@ -0,0 +1,19 @@
+## Summary
+
+What does this PR do? (1-3 sentences)
+
+## Changes
+
+-
+
+## Testing
+
+- [ ] `bash -n` passes on all changed `.sh` files
+- [ ] `bash tests/test-tier-map.sh` passes (if tier/model changes)
+- [ ] `bash tests/integration-test.sh` passes
+- [ ] Relevant smoke tests pass (`tests/smoke/`)
+- [ ] Dashboard builds (if frontend changed): `cd dashboard && npm run build`
+
+## Related Issues
+
+Closes #
diff --git a/dream-server/.github/workflows/dashboard.yml b/dream-server/.github/workflows/dashboard.yml
new file mode 100644
index 000000000..71b80d9b5
--- /dev/null
+++ b/dream-server/.github/workflows/dashboard.yml
@@ -0,0 +1,47 @@
+name: Dashboard
+
+on:
+ pull_request:
+ push:
+ branches:
+ - main
+ - master
+
+jobs:
+ frontend:
+ runs-on: ubuntu-latest
+ defaults:
+ run:
+ working-directory: dashboard
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Setup Node
+ uses: actions/setup-node@v4
+ with:
+ node-version: "20"
+
+ - name: Install Dependencies
+ run: npm install
+
+ - name: Lint
+ run: npm run lint
+
+ - name: Build
+ run: npm run build
+
+ api:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Setup Python
+ uses: actions/setup-python@v5
+ with:
+ python-version: "3.11"
+
+ - name: API Syntax Check
+ run: python -m py_compile dashboard-api/main.py dashboard-api/agent_monitor.py
+
diff --git a/dream-server/.github/workflows/lint-powershell.yml b/dream-server/.github/workflows/lint-powershell.yml
new file mode 100644
index 000000000..ed063ad25
--- /dev/null
+++ b/dream-server/.github/workflows/lint-powershell.yml
@@ -0,0 +1,40 @@
+name: Lint PowerShell
+
+on:
+ pull_request:
+ push:
+ branches:
+ - main
+ - master
+
+jobs:
+ powershell-lint:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Install PSScriptAnalyzer
+ shell: pwsh
+ run: |
+ Set-PSRepository PSGallery -InstallationPolicy Trusted
+ Install-Module PSScriptAnalyzer -Force -Scope CurrentUser
+
+ - name: Run PowerShell Script Analyzer
+ shell: pwsh
+ run: |
+ $scripts = Get-ChildItem -Path installers -Filter *.ps1 -Recurse
+ if (-not $scripts) {
+ Write-Host "No PowerShell scripts found."
+ exit 0
+ }
+ $failed = $false
+ foreach ($script in $scripts) {
+ Write-Host "Analyzing $($script.FullName)"
+ $results = Invoke-ScriptAnalyzer -Path $script.FullName -Settings ./PSScriptAnalyzerSettings.psd1 -Severity Error,Warning
+ if ($results) {
+ $results | Format-Table RuleName, Severity, Message, ScriptName, Line -AutoSize
+ $failed = $true
+ }
+ }
+ if ($failed) { exit 1 }
diff --git a/dream-server/.github/workflows/lint-shell.yml b/dream-server/.github/workflows/lint-shell.yml
new file mode 100644
index 000000000..41153fc44
--- /dev/null
+++ b/dream-server/.github/workflows/lint-shell.yml
@@ -0,0 +1,39 @@
+name: Lint Shell
+
+on:
+ pull_request:
+ push:
+ branches:
+ - main
+ - master
+
+jobs:
+ shell-syntax:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Bash Syntax Check
+ run: |
+ set -euo pipefail
+ mapfile -t files < <(git ls-files '*.sh')
+ if [ "${#files[@]}" -eq 0 ]; then
+ echo "No shell scripts found"
+ exit 0
+ fi
+ for f in "${files[@]}"; do
+ bash -n "$f"
+ done
+
+ - name: ShellCheck
+ run: |
+ set -euo pipefail
+ sudo apt-get -qq install -y shellcheck
+ mapfile -t files < <(git ls-files '*.sh')
+ if [ "${#files[@]}" -eq 0 ]; then
+ echo "No shell scripts found"
+ exit 0
+ fi
+ shellcheck -x -S warning "${files[@]}"
+
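Reviewer note: the Bash Syntax Check step calls `bash -n` once per file, which matters because `bash -n a.sh b.sh` parses only `a.sh` and treats the rest as arguments. A minimal sketch of the same loop for local use follows. The function name `syntax_check` is an assumption for illustration and does not exist in the repository.

```bash
# Hypothetical local equivalent of the CI step: parse each file with `bash -n`
# (no execution), print the path of every file that fails, return 1 on failure.
syntax_check() {
  local failed=0 f
  for f in "$@"; do
    # bash -n reads the script and reports syntax errors without running it.
    if ! bash -n "$f" 2>/dev/null; then
      echo "syntax error: $f"
      failed=1
    fi
  done
  return "$failed"
}
```

A possible invocation is `syntax_check $(git ls-files '*.sh')`, which relies on word splitting and therefore assumes no tracked script has whitespace in its path.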
diff --git a/dream-server/.github/workflows/matrix-smoke.yml b/dream-server/.github/workflows/matrix-smoke.yml
new file mode 100644
index 000000000..8e444b420
--- /dev/null
+++ b/dream-server/.github/workflows/matrix-smoke.yml
@@ -0,0 +1,34 @@
+name: Matrix Smoke
+
+on:
+ pull_request:
+ push:
+ branches:
+ - main
+ - master
+
+jobs:
+ linux-smoke:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: AMD Path Smoke
+ run: bash tests/smoke/linux-amd.sh
+
+ - name: NVIDIA Path Smoke
+ run: bash tests/smoke/linux-nvidia.sh
+
+ - name: WSL Logic Smoke
+ run: bash tests/smoke/wsl-logic.sh
+
+ macos-smoke:
+ runs-on: macos-latest
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: macOS Dispatch Smoke
+ run: bash tests/smoke/macos-dispatch.sh
+
diff --git a/dream-server/.github/workflows/test-linux.yml b/dream-server/.github/workflows/test-linux.yml
new file mode 100644
index 000000000..b3bcb176f
--- /dev/null
+++ b/dream-server/.github/workflows/test-linux.yml
@@ -0,0 +1,55 @@
+name: Test Linux
+
+on:
+ pull_request:
+ push:
+ branches:
+ - main
+ - master
+
+jobs:
+ integration-smoke:
+ runs-on: ubuntu-latest
+ steps:
+ - name: Checkout
+ uses: actions/checkout@v4
+
+ - name: Integration Smoke
+ run: bash tests/integration-test.sh
+
+ - name: Phase C P1 Static Checks
+ run: bash tests/test-phase-c-p1.sh
+
+ - name: Manifest Compatibility Checks
+ run: |
+ bash scripts/check-compatibility.sh
+ bash scripts/check-release-claims.sh
+
+ - name: Tier Map Unit Tests
+ run: bash tests/test-tier-map.sh
+
+ - name: Installer Contract Checks
+ run: |
+ bash tests/contracts/test-installer-contracts.sh
+ bash tests/contracts/test-preflight-fixtures.sh
+
+ - name: Installer Simulation Harness
+ run: |
+ bash scripts/simulate-installers.sh
+ test -f artifacts/installer-sim/summary.json
+ test -f artifacts/installer-sim/SUMMARY.md
+ python3 scripts/validate-sim-summary.py artifacts/installer-sim/summary.json
+
+ - name: Upload Installer Simulation Artifacts
+ uses: actions/upload-artifact@v4
+ with:
+ name: installer-sim
+ path: |
+ artifacts/installer-sim/summary.json
+ artifacts/installer-sim/SUMMARY.md
+ artifacts/installer-sim/linux-dryrun.log
+ artifacts/installer-sim/macos-installer.log
+ artifacts/installer-sim/windows-preflight-sim.json
+ artifacts/installer-sim/macos-preflight.json
+ artifacts/installer-sim/macos-doctor.json
+ artifacts/installer-sim/doctor.json
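Reviewer note: the workflow asserts only two of the eight uploaded files with `test -f`. A sketch of a fuller check is shown below, assuming a hypothetical helper `missing_artifacts` and a file list copied by hand from the upload step above.

```bash
# File names copied by hand from the "Upload Installer Simulation Artifacts" step.
EXPECTED_ARTIFACTS="summary.json SUMMARY.md linux-dryrun.log macos-installer.log windows-preflight-sim.json macos-preflight.json macos-doctor.json doctor.json"

# Hypothetical helper: print the name of every expected artifact that is absent
# from the given directory. Returns 1 if any file is missing, 0 otherwise.
missing_artifacts() {
  local dir="$1" name missing=0
  for name in $EXPECTED_ARTIFACTS; do
    if [ ! -f "$dir/$name" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

After `bash scripts/simulate-installers.sh`, calling `missing_artifacts artifacts/installer-sim` would list anything the harness failed to produce.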
diff --git a/dream-server/.gitignore b/dream-server/.gitignore
index 072bdb1cd..c0a18762b 100644
--- a/dream-server/.gitignore
+++ b/dream-server/.gitignore
@@ -1,10 +1,25 @@
# Runtime / secrets
.env
.env.*
+!.env.example
+!.env.schema.json
+.current-mode
+.profiles
+.target-model
+.target-quantization
# Install-time data directories
data/
models/
+artifacts/
+logs/
+
+# User presets (dream preset save/load)
+presets/
+
+# Python cache
+**/__pycache__/
+*.pyc
# OpenClaw workspace (runtime state)
config/openclaw/workspace/
diff --git a/dream-server/.shellcheckrc b/dream-server/.shellcheckrc
new file mode 100644
index 000000000..67f46f812
--- /dev/null
+++ b/dream-server/.shellcheckrc
@@ -0,0 +1,10 @@
+# ShellCheck configuration for Dream Server
+# https://www.shellcheck.net/wiki/
+
+# Allow sourcing files that can't be resolved statically
+# (libs are sourced by install-core.sh at runtime)
+disable=SC1090
+disable=SC1091
+
+# Allow using $'...' in older bash (we target bash 4+)
+disable=SC3003
diff --git a/dream-server/CHANGELOG.md b/dream-server/CHANGELOG.md
new file mode 100644
index 000000000..b591ee550
--- /dev/null
+++ b/dream-server/CHANGELOG.md
@@ -0,0 +1,46 @@
+# Changelog
+
+All notable changes to Dream Server will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
+
+## [Unreleased]
+
+## [2.0.0] - 2026-03-03
+
+### Added
+- Documentation index (`docs/README.md`) for navigating 30+ doc files
+- `.env.example` with all required and optional variables documented
+- `docker-compose.override.yml` auto-include for custom service extensions
+- Real shell function tests for `resolve_tier_config()` (replaces tautological Python tests)
+- Dry-run reporting for phases 06, 07, 09, 10, 12
+- `Makefile` with `lint`, `test`, `smoke`, `gate` targets
+- ShellCheck integration in CI
+- `CHANGELOG.md`, `CODE_OF_CONDUCT.md`, issue/PR templates
+
+### Changed
+- Modular installer: 2591-line monolith split into 6 libraries + 13 phases
+- All services now core in `docker-compose.base.yml` (profiles removed)
+- Models switched from AWQ to GGUF Q4_K_M quantization
+
+### Fixed
+- Tier error message now auto-updates when new tiers are added
+- Phase 12 (health) no longer crashes in dry-run mode
+- n8n timezone default changed from `America/New_York` to `UTC`
+- Stale variable names in INTEGRATION-GUIDE.md
+- Embeddings port in INTEGRATION-GUIDE.md (9103 → 8090)
+- Purged all stale `--profile` references across codebase (12+ files)
+- Purged all stale `docker-compose.yml` references in docs
+- AWQ references in QUICKSTART.md updated to GGUF Q4_K_M
+- `make lint` no longer silently swallows errors
+- Makefile now uses `find` to discover all .sh files instead of hardcoded globs
+
+### Removed
+- Token Spy (service, docs, installer refs, systemd units, dashboard-api integration)
+- `docker-compose.strix-halo.yml` (deprecated, merged into base + amd overlay)
+- Tautological Python test suite (`test_installer.py`)
+- `asyncpg` dependency from dashboard-api (was only used by Token Spy)
+
+## [0.3.0-dev] - 2025-05-01
+
+Initial development release with modular installer architecture.
diff --git a/dream-server/CODE_OF_CONDUCT.md b/dream-server/CODE_OF_CONDUCT.md
new file mode 100644
index 000000000..0f4c07035
--- /dev/null
+++ b/dream-server/CODE_OF_CONDUCT.md
@@ -0,0 +1,40 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, caste, color, religion, or sexual
+identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior:
+
+* The use of sexualized language or imagery, and sexual attention or advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a professional setting
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the project team at **conduct@lightheartlabs.com**.
+
+All complaints will be reviewed and investigated promptly and fairly. The project
+team is obligated to maintain confidentiality with regard to the reporter.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant](https://www.contributor-covenant.org/), version 2.1.
diff --git a/dream-server/CONTRIBUTING.md b/dream-server/CONTRIBUTING.md
index 235a58d42..ab55f969c 100644
--- a/dream-server/CONTRIBUTING.md
+++ b/dream-server/CONTRIBUTING.md
@@ -1,63 +1,84 @@
# Contributing to Dream Server
-Thanks for wanting to help! Here's how to get involved.
+Thanks for building with us.
-## Reporting Issues
+## Fast Path
-Found a bug? Please open an issue with:
-- Your hardware (GPU, RAM, OS)
-- What you expected to happen
-- What actually happened
-- Logs if relevant (`docker compose logs`)
+If you want to add or extend services, start here:
+- [docs/EXTENSIONS.md](docs/EXTENSIONS.md): extending services (Docker containers, dashboards)
+- [docs/INSTALLER-ARCHITECTURE.md](docs/INSTALLER-ARCHITECTURE.md): modding the installer itself
-## Pull Requests
+The extensions guide includes a practical "add a service in 30 minutes" path with templates and checks.
-1. Fork the repo
-2. Create a feature branch (`git checkout -b feature/cool-thing`)
-3. Make your changes
-4. Test on your hardware
-5. Submit PR with clear description
+## Reporting Issues
-## What We're Looking For
+Open an issue with:
+- hardware details (GPU, RAM, OS)
+- expected behavior
+- actual behavior
+- relevant logs (`docker compose logs`)
-**High Value:**
-- New workflow templates (n8n JSON exports)
-- Hardware-specific optimizations
-- Better error messages
-- Documentation improvements
+## Pull Requests
-**Good First Issues:**
-- Fix typos in docs
-- Add more troubleshooting cases
-- Improve comments in install.sh
+1. Fork and create a branch (`git checkout -b feature/my-change`)
+2. Keep PR scope focused (one milestone-sized change)
+3. Run validation locally
+4. Submit PR with clear description, impact, and test evidence
-**Harder But Appreciated:**
-- Multi-GPU support improvements
-- New model presets
-- Alternative TTS/STT engines
+## Contributor Validation Checklist
-## Testing Your Changes
+The fastest way to validate everything:
+```bash
+make gate # lint + test + smoke + simulate
+```
+Or run individual steps:
```bash
-# Fresh install test
-rm -rf ~/dream-server
-./install.sh --dry-run # Check what would happen
-./install.sh # Actually install
+make lint # Shell syntax + Python compile checks
+make test # Tier map unit tests + installer contracts
+make smoke # Platform smoke tests
+```
-# Run the status check
-./status.sh
+Full manual checklist:
+```bash
+# Shell/API checks
+for f in install.sh install-core.sh installers/lib/*.sh installers/phases/*.sh scripts/*.sh tests/*.sh; do bash -n "$f" || echo "FAILED: $f"; done
+python3 -m py_compile dashboard-api/main.py dashboard-api/agent_monitor.py
+
+# Unit tests
+bash tests/test-tier-map.sh
+
+# Integration/smoke checks
+bash tests/integration-test.sh
+bash tests/smoke/linux-amd.sh
+bash tests/smoke/linux-nvidia.sh
+bash tests/smoke/wsl-logic.sh
+bash tests/smoke/macos-dispatch.sh
+```
+
+If your change touches dashboard frontend and Node is available:
+```bash
+cd dashboard
+npm install
+npm run lint
+npm run build
```
-## Code Style
+## High-Value Contributions
-- Bash: Use ShellCheck. We're not religious about style, just be consistent.
-- YAML: 2-space indent, no tabs.
-- Markdown: Keep it readable. No 80-char wrapping.
+- extension manifests and service integrations
+- dashboard plugin/registry improvements
+- installer mods: new tiers, themes, phases (see [docs/INSTALLER-ARCHITECTURE.md](docs/INSTALLER-ARCHITECTURE.md))
+- installer portability and platform support
+- workflow catalog quality and docs
+- CI coverage and deterministic tests
-## Questions?
+## Style
-Open an issue or find us in Discord.
+- Bash: predictable, defensive, and syntax-clean
+- YAML/JSON: stable keys, minimal noise, no tabs
+- Docs: concrete commands and compatibility notes
----
+## Questions
-*Your contributions help bring local AI to everyone.*
+Open an issue and include enough context to reproduce the problem quickly.
diff --git a/dream-server/EDGE-QUICKSTART.md b/dream-server/EDGE-QUICKSTART.md
index 8f27566dc..dd5b09a39 100644
--- a/dream-server/EDGE-QUICKSTART.md
+++ b/dream-server/EDGE-QUICKSTART.md
@@ -1,5 +1,14 @@
# Dream Server - Edge Quickstart
+> **Status: Planned - Not Yet Available.**
+>
+> This guide describes a future edge deployment mode. The referenced `docker-compose.edge.yml` does not exist yet. **Do not follow these instructions**; they will not work.
+>
+> For machines without a GPU, use `--cloud` mode instead:
+> ```bash
+> ./install-core.sh --cloud
+> ```
+
*For Raspberry Pi 5, Mac Mini, or any 8GB+ system without a dedicated GPU.*
---
@@ -26,8 +35,8 @@
```bash
# 1. Clone and enter
-git clone https://github.com/Light-Heart-Labs/Lighthouse-AI.git
-cd Lighthouse-AI/dream-server
+git clone https://github.com/Light-Heart-Labs/DreamServer.git
+cd DreamServer/dream-server
# 2. Start core services
docker compose -f docker-compose.edge.yml up -d
@@ -174,9 +183,9 @@ docker compose -f docker-compose.edge.yml up -d
## Next Steps
-- Configure voice assistant: See `docs/VOICE-SETUP.md`
- Add OpenClaw agent: See `docs/OPENCLAW-INTEGRATION.md`
- Create automations: Use n8n at http://localhost:5678
+- Full documentation index: See `docs/README.md`
---
diff --git a/dream-server/FAQ.md b/dream-server/FAQ.md
index f309fd2cc..fa66c1fbd 100644
--- a/dream-server/FAQ.md
+++ b/dream-server/FAQ.md
@@ -10,7 +10,7 @@ Frequently asked questions about installing, running, and troubleshooting Dream
### What is Dream Server?
Dream Server is a turnkey local AI stack that runs entirely on your own hardware. It includes:
-- LLM inference via vLLM (Qwen2.5-32B-Instruct-AWQ)
+- LLM inference via llama-server (qwen2.5-32b-instruct)
- Web dashboard for chat and model management
- Voice capabilities (STT via Whisper, TTS via Kokoro)
- Workflow automation via n8n
@@ -115,9 +115,9 @@ sudo systemctl restart docker
### "CUDA out of memory" errors
Your GPU doesn't have enough VRAM. Options:
-1. Use a smaller model (Qwen2.5-7B instead of 32B)
-2. Enable quantization (AWQ format uses ~60% less VRAM)
-3. Reduce `max_model_len` in docker-compose.yml
+1. Use a smaller model (qwen2.5-7b-instruct instead of 32b)
+2. All models use GGUF Q4_K_M quantization by default
+3. Reduce `CTX_SIZE` in `.env` (try 4096)
4. Run on CPU only (slower but works)
### Windows: WSL2 installation fails
@@ -138,7 +138,7 @@ docker compose ps
**Check logs:**
```bash
docker compose logs dashboard-api
-docker compose logs vllm
+docker compose logs llama-server
```
**Common fixes:**
@@ -250,7 +250,7 @@ docker compose logs -f
**Specific service:**
```bash
-docker compose logs -f vllm
+docker compose logs -f llama-server
docker compose logs -f dashboard-api
docker compose logs -f voice-agent
```
@@ -268,7 +268,7 @@ docker compose up -d
Or restart specific services:
```bash
-docker compose restart vllm
+docker compose restart llama-server
```
### "Connection refused" to API
@@ -287,7 +287,7 @@ Models need ~20GB per model. Free up space if needed.
**Check model download:**
```bash
-ls -la models/
+ls -la data/models/
```
If empty or incomplete, re-download:
@@ -356,9 +356,9 @@ docker compose down -v
## Advanced
### How do I add a custom model?
-1. Download model to `models/` directory
-2. Edit `docker-compose.yml` - change `LLM_MODEL` environment variable
-3. Restart: `docker compose up -d vllm`
+1. Download model to `data/models/` directory
+2. Edit `.env` - change `LLM_MODEL` and `GGUF_FILE` variables
+3. Restart: `docker compose up -d llama-server`
Supported formats: AWQ, GPTQ, EXL2, GGUF (via llama.cpp adapter)
@@ -373,11 +373,8 @@ caddy reverse-proxy --from your-domain.com --to localhost:3000
For local development, browsers accept self-signed certs at `https://localhost`.
### Can I run on multiple GPUs?
-Yes! Edit `docker-compose.yml`:
+Yes! Edit `docker-compose.nvidia.yml` to expose multiple GPUs:
```yaml
-environment:
- - TENSOR_PARALLEL_SIZE=2 # Use 2 GPUs
- - GPU_MEMORY_UTILIZATION=0.95
deploy:
resources:
reservations:
@@ -388,9 +385,9 @@ deploy:
```
### How do I backup my data?
-**Configs and workflows:**
+**Configs and data:**
```bash
-tar -czf dream-server-backup.tar.gz .env workflows/ n8n-data/
+tar -czf dream-server-backup.tar.gz .env data/
```
**Models (large):**
@@ -445,7 +442,7 @@ curl http://localhost:3001/api/metrics
| 3000 | Open WebUI (chat interface) |
| 3001 | Dashboard |
| 3002 | Dashboard API |
-| 8000 | vLLM API |
+| 8080 | llama-server API |
| 8085 | Privacy Shield |
| 5678 | n8n workflow editor |
| 7880 | LiveKit voice server |
@@ -468,11 +465,11 @@ Then restart: `docker compose up -d`
### Documentation
- Main README: `dream-server/README.md`
-- Architecture: `docs/ARCHITECTURE.md`
+- Installer Architecture: `docs/INSTALLER-ARCHITECTURE.md`
- Security: `SECURITY.md`
### Community
-- GitHub Issues: https://github.com/Light-Heart-Labs/Lighthouse-AI/issues
+- GitHub Issues: https://github.com/Light-Heart-Labs/DreamServer/issues
- Discord: #general channel
### Debug info for bug reports
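Reviewer note: several FAQ answers in this diff now tell users to edit `.env` (for example, lowering `CTX_SIZE` to 4096 or changing `LLM_MODEL` and `GGUF_FILE`). A minimal sketch of doing that from a script is shown below. The helper name `set_env_var` is hypothetical and not part of Dream Server, and it assumes simple `KEY=value` lines with no quoting.

```bash
# Hypothetical helper: set KEY=value in a .env file, replacing any active
# assignment of the same key. The new assignment is appended at the end.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Copy every line except active assignments of the key.
  # (grep -v exits non-zero when nothing is left, hence the `|| true`.)
  grep -v "^${key}=" "$file" > "$tmp" || true
  # Append the new assignment.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original path so its permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_env_var .env CTX_SIZE 4096` followed by `docker compose up -d llama-server` would apply the smaller context window suggested in the out-of-memory answer.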
diff --git a/dream-server/LICENSE b/dream-server/LICENSE
new file mode 100644
index 000000000..261eeb9e9
--- /dev/null
+++ b/dream-server/LICENSE
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/dream-server/Makefile b/dream-server/Makefile
new file mode 100644
index 000000000..3400e193c
--- /dev/null
+++ b/dream-server/Makefile
@@ -0,0 +1,43 @@
+# Dream Server: Developer Targets
+# Run `make help` to see available commands.
+
+SHELL_FILES := $(shell find . -name '*.sh' -not -path './node_modules/*' -not -path './.git/*' -not -path './data/*' -not -path './token-spy/*')
+
+.PHONY: help lint test smoke simulate gate doctor
+
+help: ## Show this help
+ @grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | \
+ awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
+
+lint: ## Syntax check all shell scripts + Python compile check
+ @echo "=== Shell syntax ==="
+ @fail=0; for f in $(SHELL_FILES); do bash -n "$$f" || fail=1; done; [ $$fail -eq 0 ]
+ @echo "=== Python compile ==="
+ @python3 -m py_compile dashboard-api/main.py dashboard-api/agent_monitor.py
+ @echo "All lint checks passed."
+
+test: ## Run unit and contract tests
+ @echo "=== Tier map tests ==="
+ @bash tests/test-tier-map.sh
+ @echo ""
+ @echo "=== Installer contracts ==="
+ @bash tests/contracts/test-installer-contracts.sh
+ @bash tests/contracts/test-preflight-fixtures.sh
+
+smoke: ## Run platform smoke tests
+ @echo "=== Smoke tests ==="
+ @bash tests/smoke/linux-amd.sh
+ @bash tests/smoke/linux-nvidia.sh
+ @bash tests/smoke/wsl-logic.sh
+ @bash tests/smoke/macos-dispatch.sh
+ @echo "All smoke tests passed."
+
+simulate: ## Run installer simulation harness
+ @bash scripts/simulate-installers.sh
+
+doctor: ## Run diagnostic report
+ @bash scripts/dream-doctor.sh
+
+gate: lint test smoke simulate ## Full pre-release validation (lint + test + smoke + simulate)
+ @echo ""
+ @echo "Release gate passed."
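
The shell-syntax half of the `lint` target can be reproduced outside `make`. The following is a minimal sketch, not the project's script: the function name `check_shell_syntax` is hypothetical, and only the use of `bash -n` (parse without executing) is taken from the Makefile above.

```bash
# Hypothetical helper mirroring the Makefile's lint loop.
# Runs `bash -n` (parse only, nothing is executed) on every file given
# and returns non-zero if at least one file fails to parse.
check_shell_syntax() {
    local fail=0 f
    for f in "$@"; do
        if ! bash -n "$f" 2>/dev/null; then
            echo "syntax error: $f" >&2
            fail=1
        fi
    done
    return "$fail"
}
```

Like the Makefile loop, it keeps going after the first failure so that every broken script is reported in one pass.
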
diff --git a/dream-server/PSScriptAnalyzerSettings.psd1 b/dream-server/PSScriptAnalyzerSettings.psd1
new file mode 100644
index 000000000..85d6107e5
--- /dev/null
+++ b/dream-server/PSScriptAnalyzerSettings.psd1
@@ -0,0 +1,16 @@
+@{
+ Rules = @{
+ PSAvoidUsingWriteHost = @{
+ Enable = $false
+ }
+ PSAvoidUsingConvertToSecureStringWithPlainText = @{
+ Enable = $true
+ }
+ PSUseApprovedVerbs = @{
+ Enable = $true
+ }
+ PSUseDeclaredVarsMoreThanAssignments = @{
+ Enable = $true
+ }
+ }
+}
diff --git a/dream-server/QUICKSTART.md b/dream-server/QUICKSTART.md
index a8a10291e..7bf5eb448 100644
--- a/dream-server/QUICKSTART.md
+++ b/dream-server/QUICKSTART.md
@@ -2,21 +2,30 @@
One command to a fully running local AI stack. No manual config, no dependency hell.
+See [`docs/SUPPORT-MATRIX.md`](docs/SUPPORT-MATRIX.md) before installing to confirm current platform support.
+
## Prerequisites
-**Linux:**
+**Linux (NVIDIA GPU):**
- Docker with Compose v2+ ([Install](https://docs.docker.com/get-docker/))
- NVIDIA GPU with 8GB+ VRAM (16GB+ recommended)
- NVIDIA Container Toolkit ([Install](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
- 40GB+ disk space (for models)
+**Linux (AMD Strix Halo):**
+- Docker with Compose v2+ ([Install](https://docs.docker.com/get-docker/))
+- AMD Ryzen AI MAX+ APU with 64GB+ unified memory
+- ROCm-compatible kernel (6.17+ recommended, 6.18.4+ ideal)
+- `/dev/kfd` and `/dev/dri` accessible (user in `video` + `render` groups)
+- 60GB+ disk space (for GGUF model files)
+
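The AMD device and group prerequisites above can be checked before running the installer. This is a rough sketch under stated assumptions: `missing_groups` and `missing_paths` are hypothetical helper names, not part of Dream Server, and the installer's own preflight may check more than this.

```bash
# Hypothetical helper: print each required group (video, render) that is
# absent from a space-separated list of group names.
missing_groups() {
    local have=" $1 " g
    for g in video render; do
        case "$have" in
            *" $g "*) ;;          # group present, nothing to report
            *) echo "$g" ;;       # group missing
        esac
    done
}

# Hypothetical helper: print each given path that does not exist.
missing_paths() {
    local p
    for p in "$@"; do
        [ -e "$p" ] || echo "$p"
    done
}
```

Typical use would be `missing_groups "$(id -nG)"` and `missing_paths /dev/kfd /dev/dri`; empty output from both suggests the device prerequisites are met.
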
**Windows:**
- Windows 10 21H2+ or Windows 11
- NVIDIA GPU with drivers
- Docker Desktop (installer will prompt if missing)
- WSL2 (installer will enable if needed)
-For Windows, use `install.ps1` instead; see [README.md](README.md#windows).
+For Windows and macOS status, see [README.md](README.md#platform-support) and [`docs/SUPPORT-MATRIX.md`](docs/SUPPORT-MATRIX.md).
## Step 1: Run the Installer
@@ -26,14 +35,19 @@ For Windows, use `install.ps1` instead; see [README.md](README.md#windows).
The installer will:
1. **Detect your GPU** and auto-select the right tier:
- - Tier 1 (Entry): <12GB VRAM → Qwen2.5-7B, 8K context
- - Tier 2 (Prosumer): 12-20GB VRAM → Qwen2.5-14B-AWQ, 16K context
- - Tier 3 (Pro): 20-40GB VRAM → Qwen2.5-32B-AWQ, 32K context
- - Tier 4 (Enterprise): 40GB+ VRAM → Qwen2.5-72B-AWQ, 32K context
-2. Check Docker and NVIDIA toolkit
+ - **AMD Strix Halo (unified memory)**:
+ - SH_LARGE (90GB+): qwen3-coder-next (80B MoE), 128K context
+ - SH_COMPACT (64-89GB): qwen3-30b-a3b (30B MoE), 128K context
+ - **NVIDIA (discrete GPU)**:
+ - Tier 1 (Entry): <12GB VRAM → qwen2.5-7b-instruct (GGUF Q4_K_M), 16K context
+ - Tier 2 (Prosumer): 12-20GB VRAM → qwen2.5-14b-instruct (GGUF Q4_K_M), 16K context
+ - Tier 3 (Pro): 20-40GB VRAM → qwen2.5-32b-instruct (GGUF Q4_K_M), 32K context
+ - Tier 4 (Enterprise): 40GB+ VRAM → qwen2.5-72b-instruct (GGUF Q4_K_M), 32K context
+2. Check Docker and GPU toolkit (NVIDIA Container Toolkit or ROCm devices)
3. Ask which optional components to enable (voice, workflows, RAG)
4. Generate secure passwords and configuration
-5. Start all services
+5. Apply system tuning (AMD: sysctl, amdgpu modprobe, etc.)
+6. Start all services
**Override tier manually:** `./install.sh --tier 3`
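
The NVIDIA tier thresholds listed above can be expressed as a small lookup. This is a sketch only, not the installer's actual logic: `nvidia_tier_for_vram` is a hypothetical name, and because the documented ranges overlap at 20GB, the sketch assumes boundary values fall into the higher tier.

```bash
# Hypothetical helper: map discrete-GPU VRAM (whole GB) to the NVIDIA tier
# number from the list above.
# Assumption: boundary values go to the higher tier (20 -> 3, 40 -> 4).
nvidia_tier_for_vram() {
    local gb=$1
    if [ "$gb" -lt 12 ]; then
        echo 1
    elif [ "$gb" -lt 20 ]; then
        echo 2
    elif [ "$gb" -lt 40 ]; then
        echo 3
    else
        echo 4
    fi
}
```

For a 24GB card, `nvidia_tier_for_vram 24` prints `3`, which corresponds to `./install.sh --tier 3`. The real mapping may differ; the README, for instance, also lists an NV_ULTRA tier at 90GB+.
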
@@ -41,13 +55,24 @@ The installer will:
## Step 2: Wait for Model Download
-First run downloads the LLM (~20GB for 32B AWQ). Watch progress:
+**NVIDIA:** First run downloads the LLM (~20GB for 32B GGUF). Watch progress:
+
+```bash
+docker compose logs -f llama-server
+```
+
+When you see `server is listening on`, you're ready!
+
+**AMD Strix Halo:** The GGUF model downloads in the background (~25-52GB). Watch progress:
```bash
-docker compose logs -f vllm
+tail -f ~/dream-server/logs/model-download.log
+
+# Or check llama-server readiness:
+docker compose -f docker-compose.base.yml -f docker-compose.amd.yml logs -f llama-server
```
-When you see `Application startup complete`, you're ready!
+When you see `server is listening on`, the model is loaded and ready.
## Step 3: Validate Installation
@@ -76,11 +101,22 @@ Visit: **http://localhost:3000**
## Step 5: Test the API
+**NVIDIA:**
+```bash
+curl http://localhost:8080/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "qwen2.5-32b-instruct",
+ "messages": [{"role": "user", "content": "Hello!"}]
+ }'
+```
+
+**AMD Strix Halo:**
```bash
-curl http://localhost:8000/v1/chat/completions \
+curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
- "model": "Qwen/Qwen2.5-32B-Instruct-AWQ",
+ "model": "qwen3-coder-next",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```
@@ -91,12 +127,21 @@ curl http://localhost:8000/v1/chat/completions \
The installer auto-detects your GPU and selects the optimal configuration:
+**AMD Strix Halo:**
+
+| Tier | Unified VRAM | Model | Hardware |
+|------|-------------|-------|----------|
+| SH_LARGE | 90GB+ | qwen3-coder-next (80B MoE) | Ryzen AI MAX+ (96GB config) |
+| SH_COMPACT | 64-89GB | qwen3-30b-a3b (30B MoE) | Ryzen AI MAX+ (64GB config) |
+
+**NVIDIA:**
+
| Tier | VRAM | Model | Example GPUs |
|------|------|-------|--------------|
| 1 (Entry) | <12GB | Qwen2.5-7B | RTX 3080, RTX 4070 |
-| 2 (Prosumer) | 12-20GB | Qwen2.5-14B-AWQ | RTX 3090, RTX 4080 |
-| 3 (Pro) | 20-40GB | Qwen2.5-32B-AWQ | RTX 4090, A6000 |
-| 4 (Enterprise) | 40GB+ | Qwen2.5-72B-AWQ | A100, H100 |
+| 2 (Prosumer) | 12-20GB | Qwen2.5-14B (GGUF Q4_K_M) | RTX 3090, RTX 4080 |
+| 3 (Pro) | 20-40GB | Qwen2.5-32B (GGUF Q4_K_M) | RTX 4090, A6000 |
+| 4 (Enterprise) | 40GB+ | Qwen2.5-72B (GGUF Q4_K_M) | A100, H100 |
To check what tier you'd get without installing:
@@ -108,61 +153,79 @@ To check what tier you'd get without installing:
## Common Issues
-### "OOM" or "CUDA out of memory"
+### "OOM" or "CUDA out of memory" (NVIDIA)
Reduce context window in `.env`:
```
-MAX_CONTEXT=4096 # or even 2048
+CTX_SIZE=4096 # or even 2048
```
Or switch to a smaller model:
```
-LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
+LLM_MODEL=qwen2.5-7b-instruct
```
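
Either `.env` change above can be scripted. The helper below is a hypothetical sketch (`set_env_var` is not a Dream Server command): it rewrites the file so that the key appears exactly once, at the end, and it assumes simple `KEY=VALUE` lines with no regex metacharacters in the key.

```bash
# Hypothetical helper: set KEY=VALUE in an env file.
# Any existing assignment of KEY is dropped and the new one is appended,
# so the key moves to the end of the file. The rewritten file takes the
# permissions of the mktemp file (typically owner-only).
set_env_var() {
    local file=$1 key=$2 value=$3 tmp
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of this key.
    # grep exits non-zero when nothing is copied, which is not an error here.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

For example, `set_env_var .env CTX_SIZE 4096` applies the first suggestion. Restarting the stack afterwards is presumably needed for the new value to take effect.
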
+### AMD: llama-server crash loop
+
+Check logs: `docker compose -f docker-compose.base.yml -f docker-compose.amd.yml logs llama-server`
+
+Common causes:
+- GGUF file not found: ensure `data/models/*.gguf` exists
+- Wrong GGUF format: use upstream llama.cpp GGUFs (NOT Ollama blobs)
+- Missing ROCm env vars: `HSA_OVERRIDE_GFX_VERSION=11.5.1` must be set
+
### Model download fails
1. Check disk space: `df -h`
-2. Try again: `docker compose restart vllm`
-3. Or pre-download with Hugging Face CLI
+2. **NVIDIA:** Try again: `docker compose restart llama-server`
+3. **AMD:** Resume download: `wget -c -O data/models/<model>.gguf <url>`
### WebUI shows "No models available"
-vLLM is still loading. Check: `docker compose logs vllm`
+The inference engine is still loading.
+- **NVIDIA:** Check: `docker compose logs llama-server`
+- **AMD:** Check: `docker compose -f docker-compose.base.yml -f docker-compose.amd.yml logs llama-server`
### Port conflicts
Edit `.env` to change ports:
```
WEBUI_PORT=3010 # avoid 3001, the Dashboard's default port
-VLLM_PORT=8001
+LLAMA_SERVER_PORT=8081 # LLM inference port
```
---
## Next Steps
-- **Enable voice**: `docker compose --profile voice up -d`
-- **Try voice-to-voice**: Import `workflows/05-voice-to-voice.json` into n8n: speak, get spoken answers back
-- **Add workflows**: `docker compose --profile workflows up -d` (see `workflows/README.md`)
-- **Set up RAG**: `docker compose --profile rag up -d`
-- **Connect OpenClaw**: Use this as your local inference backend
+- **Add workflows**: Open n8n at http://localhost:5678 to create custom automation workflows
+- **Connect OpenClaw**: Use this as your local inference backend at http://localhost:7860
+- **Dashboard**: Monitor services, GPU, and health at http://localhost:3001
---
## Stopping
```bash
+# NVIDIA
docker compose down
+
+# AMD Strix Halo
+docker compose -f docker-compose.base.yml -f docker-compose.amd.yml down
```
## Updating
```bash
+# NVIDIA
docker compose pull
docker compose up -d
+
+# AMD Strix Halo
+docker compose -f docker-compose.base.yml -f docker-compose.amd.yml pull
+docker compose -f docker-compose.base.yml -f docker-compose.amd.yml up -d --build
```
---
-Built by The Collective • [Lighthouse AI](https://github.com/Light-Heart-Labs/Lighthouse-AI)
+Built by The Collective • [DreamServer](https://github.com/Light-Heart-Labs/DreamServer)
diff --git a/dream-server/README.md b/dream-server/README.md
index 82392edd5..4cd33363e 100644
--- a/dream-server/README.md
+++ b/dream-server/README.md
@@ -3,24 +3,42 @@
[](../LICENSE)
[](https://docs.docker.com/get-docker/)
[](https://developer.nvidia.com/cuda-toolkit)
+[](https://rocm.docs.amd.com/)
[](https://n8n.io)
**Your turnkey local AI stack.** Buy hardware. Run installer. AI running.
---
+## Platform Support
+
+See [`docs/SUPPORT-MATRIX.md`](docs/SUPPORT-MATRIX.md) for current support tiers and platform status.
+Launch-claim guardrails: [`docs/PLATFORM-TRUTH-TABLE.md`](docs/PLATFORM-TRUTH-TABLE.md)
+Known-good version baselines: [`docs/KNOWN-GOOD-VERSIONS.md`](docs/KNOWN-GOOD-VERSIONS.md)
+
+## Installer Evidence
+
+- Run simulation suite: `bash scripts/simulate-installers.sh`
+- Output artifacts:
+ - `artifacts/installer-sim/summary.json`
+ - `artifacts/installer-sim/SUMMARY.md`
+- CI uploads these artifacts on each PR via `.github/workflows/test-linux.yml`
+- One-command maintainer gate: `bash scripts/release-gate.sh`
+
+---
+
## 5-Minute Quickstart
```bash
# One-line install (Linux/WSL)
-curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/Lighthouse-AI/main/dream-server/get-dream-server.sh | bash
+curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/get-dream-server.sh | bash
```
Or manually:
```bash
-git clone https://github.com/Light-Heart-Labs/Lighthouse-AI.git
-cd Lighthouse-AI/dream-server
+git clone https://github.com/Light-Heart-Labs/DreamServer.git
+cd DreamServer
./install.sh
```
@@ -42,41 +60,58 @@ To skip bootstrap and wait for the full model: `./install.sh --no-bootstrap`
### Windows
```powershell
-Invoke-WebRequest -Uri "https://raw.githubusercontent.com/Light-Heart-Labs/Lighthouse-AI/main/dream-server/install.ps1" -OutFile install.ps1
-.\install.ps1
+.\installers\windows.ps1
```
-The Windows installer handles WSL2 setup, Docker Desktop, and NVIDIA drivers automatically.
-
-**Requirements:** Windows 10 21H2+ or Windows 11, NVIDIA GPU, Docker Desktop
+The Windows installer performs prerequisite checks, emits a preflight report, and delegates to the WSL2 install path. See [`docs/SUPPORT-MATRIX.md`](docs/SUPPORT-MATRIX.md) for the exact support level.
---
## What's Included
-| Component | Purpose | Port |
-|-----------|---------|------|
-| **vLLM** | High-performance LLM inference | 8000 |
-| **Open WebUI** | Beautiful chat interface | 3000 |
-| **Dashboard** | System status, GPU metrics, service health | 3001 |
-| **Privacy Shield** | PII redaction for external API calls | 8085 |
-| **Whisper** | Speech-to-text (optional) | 9000 |
-| **Kokoro** | Text-to-speech (optional) | 8880 |
-| **LiveKit** | Real-time WebRTC voice chat (optional) | 7880 |
-| **n8n** | Workflow automation (optional) | 5678 |
-| **Qdrant** | Vector database for RAG (optional) | 6333 |
-| **LiteLLM** | Multi-model API gateway (optional) | 4000 |
+| Component | Purpose | Port | Backend |
+|-----------|---------|------|---------|
+| **llama-server** | LLM inference engine | 8080 | Both |
+| **Open WebUI** | Beautiful chat interface | 3000 | Both |
+| **Dashboard** | System status, GPU metrics, service health | 3001 | Both |
+| **Dashboard API** | Backend API for dashboard | 3002 | Both |
+| **LiteLLM** | Multi-model API gateway | 4000 | Both |
+| **OpenClaw** | Autonomous AI agent framework | 7860 | Both |
+| **SearXNG** | Self-hosted web search | 8888 | Both |
+| **Perplexica** | Deep research engine | 3004 | Both |
+| **n8n** | Workflow automation | 5678 | Both |
+| **Qdrant** | Vector database for RAG | 6333 | Both |
+| **Embeddings** | Text embeddings for RAG | 8090 | Both |
+| **Whisper** | Speech-to-text | 9000 | Both |
+| **Kokoro** | Text-to-speech | 8880 | Both |
+| **Privacy Shield** | PII protection for API calls | 8085 | Both |
+| **Memory Shepherd** | Agent memory lifecycle management | - | AMD |
+| **ComfyUI** | Image generation | 8188 | Both |
## Hardware Tiers
The installer **automatically detects your GPU** and selects the right configuration:
-| Tier | VRAM | Model | Context | Example GPUs |
-|------|------|-------|---------|--------------|
-| 1 (Entry) | <12GB | Qwen2.5-7B | 8K | RTX 3080, RTX 4070 |
-| 2 (Prosumer) | 12-20GB | Qwen2.5-14B-AWQ | 16K | RTX 3090, RTX 4080 |
-| 3 (Pro) | 20-40GB | Qwen2.5-32B-AWQ | 32K | RTX 4090, A6000 |
-| 4 (Enterprise) | 40GB+ | Qwen2.5-72B-AWQ | 32K | A100, H100, multi-GPU |
+### AMD Strix Halo (Unified Memory)
+
+| Tier | Unified VRAM | Model | Context | Example Hardware |
+|------|-------------|-------|---------|-----------------|
+| SH_LARGE | 90GB+ | qwen3-coder-next (80B MoE, 3B active) | 128K | Ryzen AI MAX+ 395 (96GB VRAM config) |
+| SH_COMPACT | 64-89GB | qwen3-30b-a3b (30B MoE, 3B active) | 128K | Ryzen AI MAX+ 395 (64GB VRAM config) |
+
+Both tiers use `qwen2.5:7b` as a bootstrap model for instant startup. The full model downloads in the background via GGUF from HuggingFace.
+
+**Inference backend:** llama-server via ROCm 7.2 (Docker image: `kyuz0/amd-strix-halo-toolboxes:rocm-7.2`)
+
+### NVIDIA (Discrete GPU)
+
+| Tier | VRAM | Model | Quant | Context | Example GPUs |
+|------|------|-------|-------|---------|--------------|
+| NV_ULTRA | 90GB+ | qwen3-coder-next | GGUF Q4_K_M | 128K | Multi-GPU A100/H100 |
+| 1 (Entry) | <12GB | qwen2.5-7b-instruct | GGUF Q4_K_M | 16K | RTX 3080, RTX 4070 |
+| 2 (Prosumer) | 12-20GB | qwen2.5-14b-instruct | GGUF Q4_K_M | 16K | RTX 3090, RTX 4080 |
+| 3 (Pro) | 20-40GB | qwen2.5-32b-instruct | GGUF Q4_K_M | 32K | RTX 4090, A6000 |
+| 4 (Enterprise) | 40GB+ | qwen2.5-72b-instruct | GGUF Q4_K_M | 32K | A100, H100, multi-GPU |
Override with: `./install.sh --tier 3`
@@ -86,6 +121,33 @@ See [docs/HARDWARE-GUIDE.md](docs/HARDWARE-GUIDE.md) for buying recommendations.
## Architecture
+### AMD Strix Halo (llama-server + ROCm)
+
+```
+┌─────────────────────────────────────────────────┐
+│                   Open WebUI                    │
+│                (localhost:3000)                 │
+└─────────────────────┬───────────────────────────┘
+                      │
+┌─────────────────────▼───────────────────────────┐
+│             llama-server (ROCm 7.2)             │
+│             (localhost:8080/v1/...)             │
+│        qwen3-coder-next / qwen3-30b-a3b         │
+└─────────────────────────────────────────────────┘
+         │                    │
+┌────────▼────────┐   ┌───────▼────────┐
+│    OpenClaw     │   │   Dashboard    │
+│  (Agent :7860)  │   │ (Status :3001) │
+└─────────────────┘   └────────────────┘
+
+┌─────────────┐  ┌─────────────┐  ┌──────────────┐
+│ n8n (:5678) │  │Qdrant(:6333)│  │LiteLLM(:4000)│
+│  Workflows  │  │  Vector DB  │  │ API Gateway  │
+└─────────────┘  └─────────────┘  └──────────────┘
+```
+
+### NVIDIA (llama-server + CUDA)
+
```
 ┌─────────────────────────────────────────────────┐
 │                   Open WebUI                    │
@@ -93,9 +155,9 @@ See [docs/HARDWARE-GUIDE.md](docs/HARDWARE-GUIDE.md) for buying recommendations.
 └─────────────────────┬───────────────────────────┘
                       │
 ┌─────────────────────▼───────────────────────────┐
-│                      vLLM                       │
-│             (localhost:8000/v1/...)             │
-│            Qwen2.5-32B-Instruct-AWQ             │
+│               llama-server (CUDA)               │
+│             (localhost:8080/v1/...)             │
+│              qwen2.5-32b-instruct               │
 └─────────────────────────────────────────────────┘
          │                    │
 ┌────────▼────────┐   ┌───────▼────────┐
@@ -104,57 +166,98 @@ See [docs/HARDWARE-GUIDE.md](docs/HARDWARE-GUIDE.md) for buying recommendations.
 └─────────────────┘   └────────────────┘
 ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
-│ n8n (:5678) │  │Qdrant(:6333)│  │LiteLLM(:4K) │
+│ n8n (:5678) │  │Qdrant(:6333)│  │LiteLLM(:4000)│
 │  Workflows  │  │  Vector DB  │  │ API Gateway │
 └─────────────┘  └─────────────┘  └─────────────┘
```
-## Optional Profiles
+## Modding & Customization
+
+### Extension Services
+
+Each service under `extensions/services/` IS the mod. Drop in a directory, run `dream enable <name>`, and it appears in Compose, the CLI, the dashboard, and health checks.
-Enable components with Docker Compose profiles:
+```
+extensions/services/
+ my-service/
+ manifest.yaml # Service metadata, aliases, category
+ compose.yaml # Docker Compose fragment (auto-merged)
+```
```bash
-# Voice (STT + TTS)
-docker compose --profile voice up -d
+dream enable my-service # Enable an extension
+dream disable my-service # Disable it
+dream list # See all services and status
+```
-# Workflows (n8n)
-docker compose --profile workflows up -d
+Full guide: [docs/EXTENSIONS.md](docs/EXTENSIONS.md)
-# RAG (Qdrant + embeddings)
-docker compose --profile rag up -d
+### Installer Architecture
-# LiveKit Voice Chat (real-time WebRTC voice)
-docker compose --profile livekit --profile voice up -d
+The installer is modular: 6 libraries and 13 phases, each in its own file.
+Want to add a hardware tier, swap the theme, or skip a phase? Edit one file.
-# Everything
-docker compose --profile voice --profile workflows --profile rag --profile livekit up -d
+```
+installers/lib/ # Pure function libraries (colors, GPU detection, tier mapping)
+installers/phases/ # Sequential install steps (01-preflight through 13-summary)
+install-core.sh # Thin orchestrator (~150 lines)
```
-### LiveKit Voice Chat
+Every file has a standardized header: Purpose, Expects, Provides, Modder notes.
-Real-time voice conversation with your local AI:
+Full guide with copy-paste recipes: [docs/INSTALLER-ARCHITECTURE.md](docs/INSTALLER-ARCHITECTURE.md)
-1. Enable the profile: `docker compose --profile livekit --profile voice up -d`
-2. Open http://localhost:7880 for LiveKit playground
-3. Or integrate with any LiveKit-compatible client
+## Configuration
-**What it does:**
-- WebRTC voice streaming (low latency)
-- Whisper STT → Local LLM → Kokoro TTS pipeline
-- Works with browser, mobile apps, or custom clients
+The installer generates `.env` automatically. Key settings:
-See `agents/voice/` for the agent implementation.
+```bash
+# NVIDIA
+LLM_MODEL=qwen2.5-32b-instruct # Model (auto-set by installer)
+CTX_SIZE=32768 # Context window
+
+# AMD Strix Halo
+LLM_MODEL=qwen3-coder-next # or qwen3-30b-a3b for compact tier
+CTX_SIZE=131072 # Context window
+GPU_BACKEND=amd # Set automatically by installer
+```
-## Configuration
+## dream-cli
-Copy `.env.example` to `.env` and customize:
+The `dream` CLI is the primary management tool. It's installed automatically at `~/dream-server/dream-cli` and can be symlinked to your PATH.
```bash
-LLM_MODEL=Qwen/Qwen2.5-32B-Instruct-AWQ # Model (auto-set by installer)
-MAX_CONTEXT=8192 # Context window
-GPU_UTIL=0.9 # VRAM allocation (0.0-1.0)
+# Service management
+dream status # Health checks + GPU status
+dream list # Show all services and their state
+dream logs <service> # Tail logs (accepts aliases: llm, stt, tts)
+dream restart [service] # Restart one or all services
+dream start / stop # Start or stop the stack
+
+# LLM mode switching
+dream mode # Show current mode (local/cloud/hybrid)
+dream mode cloud # Switch to cloud APIs via LiteLLM
+dream mode local # Switch to local llama-server
+dream mode hybrid # Local primary, cloud fallback
+
+# Model management (local mode)
+dream model current # Show active model
+dream model list # List available tiers
+dream model swap T3 # Switch to a different tier
+
+# Extensions
+dream enable n8n # Enable an extension
+dream disable whisper # Disable an extension
+
+# Configuration
+dream config show # View .env (secrets masked)
+dream config edit # Open .env in editor
+dream preset save <name> # Snapshot current config
+dream preset load <name> # Restore a saved preset
```
+Full mode-switching documentation: [docs/MODE-SWITCH.md](docs/MODE-SWITCH.md)
+
## Showcase & Demos
```bash
@@ -171,41 +274,50 @@ GPU_UTIL=0.9 # VRAM allocation (0.0-1.0)
## Useful Commands
```bash
-cd ~/dream-server
-docker compose ps # Check status
-docker compose logs -f vllm # Watch vLLM logs
-docker compose restart # Restart services
-docker compose down # Stop everything
-./status.sh # Health check all services
+# dream-cli handles compose flags automatically (works on AMD and NVIDIA)
+dream status # Check all services
+dream list # See available services and status
+dream logs llm # Watch llama-server logs (alias: llm)
+dream logs stt # Watch Whisper logs (alias: stt)
+dream restart whisper # Restart a service
+dream enable n8n # Enable an extension
+dream disable comfyui # Disable an extension
+dream stop # Stop everything
+dream start # Start everything
+
+# Management scripts
+./scripts/session-cleanup.sh # Clean up bloated agent sessions
+./scripts/llm-cold-storage.sh --status # Check model hot/cold storage
+dream mode status # Show current mode
```
## Comparison
| Feature | Dream Server | Ollama + WebUI | LocalAI |
|---------|:---:|:---:|:---:|
-| Full-stack one-command install | **LLM + voice + workflows + RAG + privacy** | LLM + chat only | LLM only |
-| Hardware auto-detect + model selection | **Yes** | No | No |
-| Voice agents (STT + TTS + WebRTC) | **Built in** | No | Limited |
-| Inference engine | **vLLM** (continuous batching) | llama.cpp | llama.cpp |
+| Full-stack one-command install | **LLM + agent + workflows + RAG** | LLM + chat only | LLM only |
+| Hardware auto-detect + model selection | **NVIDIA + AMD Strix Halo** | No | No |
+| AMD APU / unified memory support | **ROCm + llama-server** | Partial (Vulkan) | No |
+| Inference engine | **llama-server** (all GPUs) | llama.cpp | llama.cpp |
+| Autonomous AI agent | **OpenClaw** | No | No |
| Workflow automation | **n8n (400+ integrations)** | No | No |
-| PII redaction / privacy tools | **Built in** | No | No |
-| Multi-GPU | **Yes** | Partial | Partial |
+| LLM usage monitoring | **Open WebUI built-in** | No | No |
+| Multi-GPU | **Yes** (NVIDIA) | Partial | Partial |
---
## Troubleshooting FAQ
-**vLLM won't start / OOM errors**
-- Reduce `MAX_CONTEXT` in `.env` (try 4096)
-- Lower `GPU_UTIL` to 0.85
+**llama-server won't start / OOM errors**
+- Reduce `CTX_SIZE` in `.env` (try 4096)
- Use a smaller model: `./install.sh --tier 1`
**"Model not found" on first boot**
- First launch downloads the model (10-30 min depending on size)
-- Watch progress: `docker compose logs -f vllm`
+- Watch progress: `dream logs llm`
**Open WebUI shows "Connection error"**
-- vLLM is still loading. Wait for health check to pass: `curl localhost:8000/health`
+- llama-server is still loading. Wait for health check to pass: `curl localhost:8080/health`
**Port already in use**
- Change ports in `.env` (e.g., `WEBUI_PORT=3001`)
@@ -220,16 +332,29 @@ docker compose down # Stop everything
- Verify with `nvidia-smi` inside WSL
- Ensure Docker Desktop has WSL integration enabled
+**AMD Strix Halo: llama-server won't start**
+- Check GGUF model exists: `ls -lh data/models/*.gguf`
+- Watch logs: `docker compose -f docker-compose.base.yml -f docker-compose.amd.yml logs -f llama-server`
+- Verify GPU devices: `ls /dev/kfd /dev/dri/renderD128`
+- Ensure ROCm env: `HSA_OVERRIDE_GFX_VERSION=11.5.1` must be set
+
+**AMD: "missing tensor" errors**
+- Use upstream llama.cpp GGUF files (from `unsloth/` on HuggingFace)
+- Ollama's GGUF format has incompatible tensor naming for qwen3next architecture
+- Do NOT use Ollama blob files with llama-server
+
---
## Documentation
+- [docs/README.md](docs/README.md) - **Full documentation index** (start here)
 - [QUICKSTART.md](QUICKSTART.md) - Detailed setup guide
 - [HARDWARE-GUIDE.md](docs/HARDWARE-GUIDE.md) - What to buy
-- [TROUBLESHOOTING.md](docs/TROUBLESHOOTING.md) - Extended troubleshooting
+- [EXTENSIONS.md](docs/EXTENSIONS.md) - Add services, manifests, dashboard plugins
+- [INSTALLER-ARCHITECTURE.md](docs/INSTALLER-ARCHITECTURE.md) - Modding the installer
+- [INTEGRATION-GUIDE.md](docs/INTEGRATION-GUIDE.md) - Connect your apps
 - [SECURITY.md](SECURITY.md) - Security best practices
-- [OPENCLAW-INTEGRATION.md](docs/OPENCLAW-INTEGRATION.md) - Connect OpenClaw agents
-- [Workflows README](workflows/README.md) - Pre-built n8n workflows
+- [CHANGELOG.md](CHANGELOG.md) - Version history
## License
@@ -237,4 +362,4 @@ Apache 2.0 - Use it, modify it, sell it. Just don't blame us.
---
-*Built by [The Collective](https://github.com/Light-Heart-Labs/Lighthouse-AI) - Android-17, Todd, and friends*
+*Built by [The Collective](https://github.com/Light-Heart-Labs/DreamServer) - Android-17, Todd, and friends*
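The "Connection error" entry in the README troubleshooting FAQ tells the reader to wait until `curl localhost:8080/health` passes. A minimal Python sketch of that wait loop follows, assuming llama-server answers `GET /health` with HTTP 200 once the model is loaded. `http_ok` and `wait_until_ready` are hypothetical helper names and are not part of dream-cli.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error statuses all mean "not ready yet".
        return False


def wait_until_ready(url: str, probe=http_ok, attempts: int = 60,
                     delay_s: float = 5.0, sleep=time.sleep) -> bool:
    """Call probe(url) up to `attempts` times, sleeping delay_s between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Calling `wait_until_ready("http://localhost:8080/health")` blocks for at most about five minutes with these defaults. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.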
diff --git a/dream-server/SECURITY.md b/dream-server/SECURITY.md
index fbfedda3b..823df56da 100644
--- a/dream-server/SECURITY.md
+++ b/dream-server/SECURITY.md
@@ -61,7 +61,7 @@ For access from other devices on your network:
```bash
# Allow specific ports from local network
sudo ufw allow from 192.168.0.0/24 to any port 3000 # WebUI
-sudo ufw allow from 192.168.0.0/24 to any port 8000 # LLM API
+sudo ufw allow from 192.168.0.0/24 to any port 8080 # LLM API
```
### Exposing to Internet (Not Recommended)
@@ -92,7 +92,7 @@ server {
location / {
limit_req zone=ai burst=5;
- proxy_pass http://127.0.0.1:8000;
+ proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
@@ -111,7 +111,7 @@ Prevent runaway containers:
```yaml
services:
- vllm:
+ llama-server:
deploy:
resources:
limits:
@@ -122,7 +122,7 @@ services:
### Principle of Least Privilege
-The docker-compose.yml uses:
+The docker-compose files use:
- Non-root users where possible
- Read-only volumes where appropriate
- GPU access only for services that need it
@@ -166,10 +166,10 @@ gpg -d dream-backup-YYYYMMDD.tar.gz.gpg | tar -xz
### Recommended Architecture
```
-Client → LiteLLM (with API key) → vLLM (localhost only)
+Client → LiteLLM (with API key) → llama-server (localhost only)
```
-vLLM has no authentication by default. Use LiteLLM as your authenticated gateway for remote access.
+llama-server has no authentication by default. Use LiteLLM as your authenticated gateway for remote access.
### Service-Specific
@@ -177,7 +177,7 @@ vLLM has no authentication by default. Use LiteLLM as your authenticated gateway
|---------|------|-------|
| Open WebUI | Built-in | Change admin password, disable signups |
| n8n | Basic auth | Use strong password, enable 2FA |
-| vLLM | None | Keep localhost-only, use LiteLLM for remote |
+| llama-server | None | Keep localhost-only, use LiteLLM for remote |
| LiteLLM | API key | Set `LITELLM_KEY` in .env |
---
@@ -186,7 +186,7 @@ vLLM has no authentication by default. Use LiteLLM as your authenticated gateway
```bash
# Watch for errors
-docker compose logs -f vllm | grep -i error
+docker compose logs -f llama-server | grep -i error
# Monitor resource usage
watch -n 5 'nvidia-smi; docker stats --no-stream'
@@ -209,7 +209,7 @@ docker compose pull
docker compose up -d
```
-Watch for security updates to: vLLM, Open WebUI, n8n, base images.
+Watch for security updates to: llama-server, Open WebUI, n8n, base images.
---
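The recommended architecture in SECURITY.md puts LiteLLM in front of llama-server as the authenticated gateway. A minimal sketch of what a client request through that gateway might look like is below. The gateway URL (port 4000, `/v1/chat/completions`) and the model alias `qwen3-14b` are assumptions for illustration. Only the `LITELLM_KEY` variable name comes from the document.

```python
import json
import os
import urllib.request
from typing import Optional


def build_chat_request(prompt: str,
                       gateway_url: str = "http://localhost:4000/v1/chat/completions",
                       model: str = "qwen3-14b",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request for the gateway."""
    # Assumption: the key lives in LITELLM_KEY, as the SECURITY.md table suggests.
    key = api_key if api_key is not None else os.environ.get("LITELLM_KEY", "")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        gateway_url,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Clients never talk to port 8080 directly in this layout, which is why llama-server can stay bound to localhost.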
diff --git a/dream-server/agents/templates/README.md b/dream-server/agents/templates/README.md
index 3fc5a7e2d..ae57842ee 100644
--- a/dream-server/agents/templates/README.md
+++ b/dream-server/agents/templates/README.md
@@ -3,7 +3,7 @@
**Mission:** M7 (OpenClaw Frontier Pushing)
**Status:** 5 templates created, awaiting validation
-Validated agent templates that work reliably on local Qwen2.5-32B-Instruct-AWQ.
+Validated agent templates that work reliably on local Qwen3-14B.
## Templates
@@ -29,12 +29,12 @@ Validated agent templates that work reliably on local Qwen2.5-32B-Instruct-AWQ.
agent:
template: code-assistant
override:
- model: local-vllm/Qwen/Qwen2.5-32B-Instruct-AWQ
+ model: local-llama/qwen3-14b
```
## Validation Results (2026-02-11)
-Tested on: Qwen2.5-32B-Instruct-AWQ-Instruct-AWQ (local)
+Tested on: Qwen3-14B (local)
Test command: `python3 tests/validate-agent-templates.py`
| Template | Tests | Passed | Status |
@@ -55,7 +55,7 @@ Test command: `python3 tests/validate-agent-templates.py`
## Design Principles
-1. **Local-first:** Templates optimized for Qwen2.5-32B-Instruct-AWQ (free, fast, private)
+1. **Local-first:** Templates optimized for Qwen3-14B (free, fast, private)
2. **Fallback-aware:** Creative tasks route to Kimi; technical tasks stay local
3. **Tool-appropriate:** Each template gets only the tools it needs
4. **Safety-conscious:** Dangerous operations flagged (system-admin)
diff --git a/dream-server/agents/templates/code-assistant.yaml b/dream-server/agents/templates/code-assistant.yaml
index c05046702..336d3e048 100644
--- a/dream-server/agents/templates/code-assistant.yaml
+++ b/dream-server/agents/templates/code-assistant.yaml
@@ -1,13 +1,13 @@
# Code Assistant Agent Template
# Mission: M7 (OpenClaw Frontier Pushing)
-# Validated on: Qwen2.5-32B-Instruct-AWQ
+# Validated on: Qwen3-14B
# Purpose: Programming help, debugging, code review
agent:
name: code-assistant
description: "Programming assistant for code generation, debugging, and review"
- model: local-vllm/Qwen/Qwen2.5-32B-Instruct-AWQ
+ model: local-llama/qwen3-14b
# Qwen Coder excels at programming tasks - no fallback needed
system_prompt: |
@@ -59,7 +59,7 @@ agent:
# /agent load code-assistant
notes:
- - Optimized for Qwen2.5-Coder - works reliably on local hardware
+ - Optimized for Qwen3 - works reliably on local hardware
- Handles Python, JavaScript, Go, Rust, and most common languages
- For very large codebases, consider splitting into smaller chunks
- Tested on RTX 3090 (24GB) with ~500ms response time
diff --git a/dream-server/agents/templates/data-analyst.yaml b/dream-server/agents/templates/data-analyst.yaml
index 9a9ffcb6c..962390ec1 100644
--- a/dream-server/agents/templates/data-analyst.yaml
+++ b/dream-server/agents/templates/data-analyst.yaml
@@ -1,13 +1,13 @@
# Data Analyst Agent Template
# Mission: M7 (OpenClaw Frontier Pushing)
-# Validated on: Qwen2.5-32B-Instruct-AWQ
+# Validated on: Qwen3-14B
# Purpose: CSV/JSON analysis, data processing, visualization guidance
agent:
name: data-analyst
description: "Data analysis assistant for processing CSV, JSON, and structured data"
- model: local-vllm/Qwen/Qwen2.5-32B-Instruct-AWQ
+ model: local-llama/qwen3-14b
# Coder model excels at data manipulation tasks
system_prompt: |
diff --git a/dream-server/agents/templates/research-assistant.yaml b/dream-server/agents/templates/research-assistant.yaml
index 3c98251aa..641307738 100644
--- a/dream-server/agents/templates/research-assistant.yaml
+++ b/dream-server/agents/templates/research-assistant.yaml
@@ -1,13 +1,13 @@
# Research Assistant Agent Template
# Mission: M7 (OpenClaw Frontier Pushing)
-# Validated on: Qwen2.5-32B-Instruct-AWQ
+# Validated on: Qwen3-14B
# Purpose: Web research, summarization, fact-checking
agent:
name: research-assistant
description: "Research assistant for web search, summarization, and analysis"
- model: local-vllm/Qwen/Qwen2.5-32B-Instruct-AWQ
+ model: local-llama/qwen3-14b
# Falls back to Kimi for complex synthesis if needed
fallback_model: moonshot/kimi-k2-0711-preview
diff --git a/dream-server/agents/templates/system-admin.yaml b/dream-server/agents/templates/system-admin.yaml
index 265ce50d9..e0c81a025 100644
--- a/dream-server/agents/templates/system-admin.yaml
+++ b/dream-server/agents/templates/system-admin.yaml
@@ -1,13 +1,13 @@
# System Admin Assistant Agent Template
# Mission: M7 (OpenClaw Frontier Pushing)
-# Validated on: Qwen2.5-32B-Instruct-AWQ
+# Validated on: Qwen3-14B
# Purpose: Docker management, server administration, troubleshooting
agent:
name: system-admin
description: "System administration assistant for Docker, Linux, and server management"
- model: local-vllm/Qwen/Qwen2.5-32B-Instruct-AWQ
+ model: local-llama/qwen3-14b
# Coder model excels at system commands and scripting
system_prompt: |
diff --git a/dream-server/agents/templates/writing-assistant.yaml b/dream-server/agents/templates/writing-assistant.yaml
index a5af4089d..6e54e4044 100644
--- a/dream-server/agents/templates/writing-assistant.yaml
+++ b/dream-server/agents/templates/writing-assistant.yaml
@@ -1,6 +1,6 @@
# Writing Assistant Agent Template
# Mission: M7 (OpenClaw Frontier Pushing)
-# Validated on: Qwen2.5-32B-Instruct-AWQ
+# Validated on: Qwen3-14B
# Purpose: Creative writing, editing, style improvement
# NOTE: Local Qwen has limitations on creative tasks - use with fallback
@@ -8,8 +8,8 @@ agent:
name: writing-assistant
description: "Writing assistant for drafting, editing, and improving text"
- model: local-vllm/Qwen/Qwen2.5-32B-Instruct-AWQ
- # IMPORTANT: Qwen Coder is NOT optimized for creative writing
+ model: local-llama/qwen3-14b
+ # IMPORTANT: Qwen3 is NOT optimized for creative writing
# This template uses fallback for creative generation tasks
fallback_model: moonshot/kimi-k2-0711-preview
@@ -79,7 +79,7 @@ agent:
import: "agents/templates/writing-assistant.yaml"
notes:
- - CRITICAL: Local Qwen Coder struggles with creative generation
+ - CRITICAL: Local Qwen3 struggles with creative generation
- Use this template for EDITING tasks (grammar, clarity, structure)
- Creative generation automatically routes to fallback model
- For pure creative work, consider using Kimi/Claude directly
diff --git a/dream-server/agents/voice-offline/Dockerfile b/dream-server/agents/voice-offline/Dockerfile
deleted file mode 100644
index 3a4b442a9..000000000
--- a/dream-server/agents/voice-offline/Dockerfile
+++ /dev/null
@@ -1,47 +0,0 @@
-# Dream Server Voice Agent - OFFLINE MODE
-# Local-only voice chat using LiveKit + local LLM
-# M1 Phase 2 - Zero cloud dependencies
-#
-# Build: docker build -t dream-voice-agent-offline .
-# Run: docker run --network dream-network-offline dream-voice-agent-offline
-
-FROM python:3.11-slim
-
-WORKDIR /app
-
-# Install system deps (portaudio for audio, ffmpeg for transcoding)
-RUN apt-get update && apt-get install -y --no-install-recommends \
- gcc \
- libffi-dev \
- libportaudio2 \
- libportaudiocpp0 \
- portaudio19-dev \
- ffmpeg \
- curl \
- wget \
- && rm -rf /var/lib/apt/lists/*
-
-# Install Python deps
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy agent code
-COPY agent.py .
-COPY entrypoint.sh .
-RUN chmod +x entrypoint.sh
-
-# Copy deterministic module
-COPY deterministic/ ./deterministic/
-
-# Copy offline-specific flows
-COPY flows/ ./flows/
-
-# Create health check endpoint
-COPY health_check.py .
-
-# Non-root user for security
-RUN useradd -m -u 1000 agent && chown -R agent:agent /app
-USER agent
-
-# Run the agent
-CMD ["./entrypoint.sh"]
\ No newline at end of file
diff --git a/dream-server/agents/voice-offline/agent.py b/dream-server/agents/voice-offline/agent.py
deleted file mode 100644
index e93c25a5d..000000000
--- a/dream-server/agents/voice-offline/agent.py
+++ /dev/null
@@ -1,316 +0,0 @@
-#!/usr/bin/env python3
-"""
-Dream Server Voice Agent - Offline Mode
-Main agent implementation for local-only voice chat
-M1 Phase 2 - Zero cloud dependencies
-
-Uses LiveKit Agents SDK v1.4+ with local model backends:
-- LLM: vLLM (OpenAI-compatible)
-- STT: Whisper (OpenAI-compatible API)
-- TTS: Kokoro (OpenAI-compatible API)
-- VAD: Silero (built-in)
-"""
-
-import os
-import asyncio
-import logging
-import signal
-from typing import Optional
-
-from livekit.agents import (
- JobContext,
- JobProcess,
- WorkerOptions,
- cli,
-)
-from livekit.agents import Agent, AgentSession
-from livekit.plugins import silero, openai as openai_plugin
-
-# Configure logging
-logging.basicConfig(
- level=logging.INFO,
- format='%(asctime)s | %(name)s | %(levelname)s | %(message)s'
-)
-logger = logging.getLogger("dream-voice-offline")
-
-# Environment config
-LIVEKIT_URL = os.getenv("LIVEKIT_URL", "ws://localhost:7880")
-LLM_URL = os.getenv("LLM_URL", "http://vllm:8000/v1")
-LLM_MODEL = os.getenv("LLM_MODEL", "Qwen/Qwen2.5-32B-Instruct-AWQ")
-STT_URL = os.getenv("STT_URL", "http://whisper:9000/v1")
-TTS_URL = os.getenv("TTS_URL", "http://tts:8880/v1")
-TTS_VOICE = os.getenv("TTS_VOICE", "af_heart")
-
-# Offline mode settings
-OFFLINE_MODE = os.getenv("OFFLINE_MODE", "true").lower() == "true"
-
-# System prompt for offline mode
-OFFLINE_SYSTEM_PROMPT = """You are Dream Agent running in offline mode on local hardware.
-You have access to local tools and services only. Be helpful, accurate, and maintain privacy.
-Keep responses conversational and concise - this is voice, not text.
-
-Key capabilities:
-- Answer questions using local knowledge
-- Help with file operations and system tasks
-- Provide technical assistance for local services
-- Maintain conversation context
-
-Limitations:
-- Cannot access external websites or APIs
-- Cannot provide real-time information
-- Cannot perform web searches
-- All processing happens locally on this machine
-
-Always acknowledge when asked about external information that you operate in offline mode."""
-
-
-async def check_service_health(url: str, name: str, timeout: int = 5) -> bool:
- """Check if a service is healthy before starting."""
- import aiohttp
- try:
- async with aiohttp.ClientSession() as session:
- async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout)) as resp:
- healthy = resp.status == 200
- if healthy:
- logger.info(f" {name} is healthy")
- else:
- logger.warning(f" {name} returned status {resp.status}")
- return healthy
- except Exception as e:
- logger.warning(f" {name} unreachable: {e}")
- return False
-
-
-class OfflineVoiceAgent(Agent):
- """
- Voice agent for offline/local-only operation.
-
- Features:
- - Greets user on entry
- - Handles interruptions (user can stop bot speech)
- - Uses only local services (no cloud dependencies)
- - Falls back gracefully if services fail
- """
-
- def __init__(self) -> None:
- super().__init__(
- instructions=OFFLINE_SYSTEM_PROMPT,
- allow_interruptions=True,
- )
- self.error_count = 0
- self.max_errors = 3
-
- async def on_enter(self):
- """Called when agent becomes active. Send greeting."""
- logger.info("Agent entered - sending greeting")
- try:
- self.session.generate_reply(
- instructions="Greet the user warmly and briefly introduce yourself as their local offline voice assistant."
- )
- except Exception as e:
- logger.error(f"Failed to send greeting: {e}")
- self.error_count += 1
-
- async def on_exit(self):
- """Called when agent is shutting down."""
- logger.info("Agent exiting - cleanup")
-
- async def on_error(self, error: Exception):
- """Handle errors gracefully."""
- self.error_count += 1
- logger.error(f"Agent error ({self.error_count}/{self.max_errors}): {error}")
-
- if self.error_count >= self.max_errors:
- logger.critical("Max errors reached, agent will restart")
- raise error
-
-
-async def create_llm() -> Optional[openai_plugin.LLM]:
- """Create local LLM instance."""
- try:
- llm = openai_plugin.LLM(
- model=LLM_MODEL,
- base_url=LLM_URL,
- api_key="not-needed", # Local vLLM doesn't require API key
- )
- logger.info(f" LLM configured: {LLM_MODEL}")
- return llm
- except Exception as e:
- logger.error(f" Failed to create LLM: {e}")
- return None
-
-
-async def create_stt() -> Optional[openai_plugin.STT]:
- """Create local STT instance."""
- try:
- stt_base_url = STT_URL.removesuffix('/v1').removesuffix('/')
- healthy = await check_service_health(f"{stt_base_url}/health", "STT (Whisper)")
- if not healthy:
- logger.warning("STT service not healthy, continuing without speech recognition")
- return None
-
- stt = openai_plugin.STT(
- model="whisper-1",
- base_url=STT_URL,
- api_key="not-needed",
- )
- logger.info(" STT configured")
- return stt
- except Exception as e:
- logger.error(f" Failed to create STT: {e}")
- logger.warning("Continuing without speech recognition")
- return None
-
-
-async def create_tts() -> Optional[openai_plugin.TTS]:
- """Create local TTS instance."""
- try:
- tts_base_url = TTS_URL.removesuffix('/v1').removesuffix('/')
- healthy = await check_service_health(f"{tts_base_url}/health", "TTS (Kokoro)")
- if not healthy:
- logger.warning("TTS service not healthy, continuing without speech synthesis")
- return None
-
- tts = openai_plugin.TTS(
- model="kokoro",
- voice=TTS_VOICE,
- base_url=TTS_URL,
- api_key="not-needed",
- )
- logger.info(f" TTS configured with voice: {TTS_VOICE}")
- return tts
- except Exception as e:
- logger.error(f" Failed to create TTS: {e}")
- logger.warning("Continuing without speech synthesis")
- return None
-
-
-async def entrypoint(ctx: JobContext):
- """
- Main entry point for the offline voice agent job.
-
- Includes:
- - Service health checks
- - Graceful degradation if services fail
- - Reconnection logic
- """
- logger.info(f"Voice agent connecting to room: {ctx.room.name}")
-
- # Health check phase
- logger.info("Performing service health checks...")
- llm_healthy = await check_service_health(f"{LLM_URL}/models", "LLM (vLLM)")
-
- if not llm_healthy:
- logger.error("LLM service not healthy - cannot start agent")
- raise RuntimeError("LLM service required but not available")
-
- # Create components with error handling
- llm = await create_llm()
- if not llm:
- raise RuntimeError("Failed to create LLM - agent cannot start")
-
- stt = await create_stt()
- tts = await create_tts()
-
- # Create VAD from prewarmed cache or load fresh
- try:
- vad = ctx.proc.userdata.get("vad") or silero.VAD.load()
- logger.info(" VAD loaded")
- except Exception as e:
- logger.error(f" Failed to load VAD: {e}")
- logger.warning("Starting without voice activity detection")
- vad = None
-
- # Create session - only include working components
- session_kwargs = {"llm": llm}
- if stt:
- session_kwargs["stt"] = stt
- if tts:
- session_kwargs["tts"] = tts
- if vad:
- session_kwargs["vad"] = vad
-
- session = AgentSession(**session_kwargs)
-
- # Create agent
- agent = OfflineVoiceAgent()
-
- # Setup graceful shutdown
- shutdown_event = asyncio.Event()
-
- def signal_handler(sig, frame):
- logger.info("Shutdown signal received")
- shutdown_event.set()
-
- signal.signal(signal.SIGTERM, signal_handler)
- signal.signal(signal.SIGINT, signal_handler)
-
- # Connect to room first (required by LiveKit SDK)
- max_retries = 3
- for attempt in range(max_retries):
- try:
- await ctx.connect()
- logger.info("Connected to room")
- break
- except Exception as e:
- logger.error(f"Room connection failed (attempt {attempt + 1}/{max_retries}): {e}")
- if attempt == max_retries - 1:
- raise
- await asyncio.sleep(1)
-
- # Start session after room connection
- for attempt in range(max_retries):
- try:
- await session.start(agent=agent, room=ctx.room)
- logger.info("Offline voice agent session started")
- break
- except Exception as e:
- logger.error(f"Session start failed (attempt {attempt + 1}/{max_retries}): {e}")
- if attempt == max_retries - 1:
- raise
- await asyncio.sleep(1)
-
- # Wait for shutdown signal
- try:
- await shutdown_event.wait()
- except asyncio.CancelledError:
- logger.info("Agent task cancelled")
- finally:
- logger.info("Shutting down offline voice agent...")
- try:
- await session.close()
- except Exception as e:
- logger.error(f"Error during shutdown: {e}")
-
-
-def prewarm(proc: JobProcess):
- """Prewarm function - load models before first job."""
- logger.info("Prewarming offline voice agent...")
- try:
- proc.userdata["vad"] = silero.VAD.load()
- logger.info(" VAD model loaded")
- except Exception as e:
- logger.error(f" Failed to load VAD: {e}")
- proc.userdata["vad"] = None
-
-
-if __name__ == "__main__":
- agent_port = int(os.getenv("AGENT_PORT", "8181"))
-
- # Log startup info
- logger.info("=" * 60)
- logger.info("Dream Server Voice Agent - OFFLINE MODE")
- logger.info(f"Port: {agent_port}")
- logger.info(f"LLM: {LLM_URL}")
- logger.info(f"STT: {STT_URL}")
- logger.info(f"TTS: {TTS_URL}")
- logger.info(f"Offline Mode: {OFFLINE_MODE}")
- logger.info("=" * 60)
-
- cli.run_app(
- WorkerOptions(
- entrypoint_fnc=entrypoint,
- prewarm_fnc=prewarm,
- port=agent_port,
- )
- )
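The deleted `agent.py` derives the STT and TTS health URLs with `removesuffix('/v1').removesuffix('/')`. The sketch below reproduces that expression and shows how it behaves when the configured URL ends in a trailing slash. `health_url` is a hypothetical alternative that strips the slash first. `str.removesuffix` needs Python 3.9 or newer, which the deleted Dockerfile's `python:3.11-slim` base satisfies.

```python
def health_url_as_deleted(api_url: str) -> str:
    """Same suffix handling as the removed create_stt/create_tts helpers."""
    base = api_url.removesuffix('/v1').removesuffix('/')
    return f"{base}/health"


def health_url(api_url: str) -> str:
    """Hypothetical variant: drop trailing slashes before removing '/v1'."""
    base = api_url.rstrip("/").removesuffix("/v1")
    return f"{base}/health"


# The default from the deleted file works with both helpers.
print(health_url_as_deleted("http://whisper:9000/v1"))   # http://whisper:9000/health
# A trailing slash defeats the first removesuffix, so '/v1' survives.
print(health_url_as_deleted("http://whisper:9000/v1/"))  # http://whisper:9000/v1/health
print(health_url("http://whisper:9000/v1/"))             # http://whisper:9000/health
```

Any replacement voice service that reuses this pattern may want the order of the two operations swapped, as in `health_url`.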
diff --git a/dream-server/agents/voice-offline/deterministic/__init__.py b/dream-server/agents/voice-offline/deterministic/__init__.py
deleted file mode 100644
index 07c997bff..000000000
--- a/dream-server/agents/voice-offline/deterministic/__init__.py
+++ /dev/null
@@ -1,216 +0,0 @@
-#!/usr/bin/env python3
-"""
-Deterministic classifier for offline voice agent
-Handles intent classification using local models
-"""
-
-import os
-import json
-import logging
-from typing import Dict, List, Optional, Tuple
-import numpy as np
-
-from .router import DeterministicRouter
-
-logger = logging.getLogger(__name__)
-
-
-class KeywordClassifier:
- """Simple keyword-based intent classifier for offline mode"""
-
- def __init__(self, keywords: Dict[str, List[str]]):
- """
- Args:
- keywords: Dict mapping intent names to keyword lists
- """
- self.keywords = keywords or {}
-
- def classify(self, text: str) -> tuple[str, float]:
- """Classify text by keyword matching"""
- text_lower = text.lower()
- best_intent = "fallback"
- best_score = 0.0
-
- for intent, kw_list in self.keywords.items():
- matches = sum(1 for kw in kw_list if kw.lower() in text_lower)
- if matches > 0:
- score = matches / len(kw_list)
- if score > best_score:
- best_score = score
- best_intent = intent
-
- return best_intent, best_score
-
-
-class FSMExecutor:
- """Finite State Machine executor for deterministic flows"""
-
- def __init__(self, flows_dir: str):
- self.flows_dir = flows_dir
- self.flows: Dict[str, dict] = {}
- self.current_flow: Optional[str] = None
- self.current_state: Optional[str] = None
- self._load_flows()
-
- def _load_flows(self):
- """Load flow definitions from JSON files"""
- if not os.path.exists(self.flows_dir):
- logger.warning(f"Flows directory not found: {self.flows_dir}")
- return
-
- for filename in os.listdir(self.flows_dir):
- if filename.endswith('.json'):
- filepath = os.path.join(self.flows_dir, filename)
- try:
- with open(filepath, 'r') as f:
- flow = json.load(f)
- flow_name = flow.get('name', filename.replace('.json', ''))
- self.flows[flow_name] = flow
- logger.info(f"Loaded flow: {flow_name}")
- except Exception as e:
- logger.error(f"Failed to load flow {filename}: {e}")
-
- def start_flow(self, flow_name: str) -> Optional[str]:
- """Start a flow and return initial response"""
- if flow_name not in self.flows:
- return None
-
- self.current_flow = flow_name
- flow = self.flows[flow_name]
- self.current_state = flow.get('initial_state', 'start')
-
- # Return initial greeting if defined
- states = flow.get('states', {})
- if self.current_state in states:
- return states[self.current_state].get('say')
- return None
-
- def process(self, text: str) -> Optional[str]:
- """Process user input and return response"""
- if not self.current_flow or not self.current_state:
- return None
-
- flow = self.flows[self.current_flow]
- states = flow.get('states', {})
- current = states.get(self.current_state, {})
-
- # Simple transition logic - look for next state
- transitions = current.get('transitions', {})
- for trigger, next_state in transitions.items():
- if trigger.lower() in text.lower() or trigger == '*':
- self.current_state = next_state
- if next_state in states:
- return states[next_state].get('say')
-
- # No matching transition - return default or None
- return current.get('fallback_say')
-
-class DeterministicClassifier:
- """Simple rule-based classifier for offline mode"""
-
- def __init__(self, flows_dir: str):
- self.flows_dir = flows_dir
- self.intents = {}
- self.patterns = {}
-
- async def initialize(self):
- """Load deterministic flows"""
- try:
- await self._load_flows()
- logger.info(f"Loaded {len(self.intents)} deterministic intents")
- except Exception as e:
- logger.warning(f"Failed to load deterministic flows: {e}")
-
- async def _load_flows(self):
- """Load flow definitions from JSON files"""
- if not os.path.exists(self.flows_dir):
- logger.warning(f"Flows directory not found: {self.flows_dir}")
- return
-
- for filename in os.listdir(self.flows_dir):
- if filename.endswith('.json'):
- filepath = os.path.join(self.flows_dir, filename)
- try:
- with open(filepath, 'r') as f:
- flow = json.load(f)
- intent_name = flow.get('intent', filename.replace('.json', ''))
- self.intents[intent_name] = flow
-
- # Extract patterns
- if 'patterns' in flow:
- self.patterns[intent_name] = flow['patterns']
- except Exception as e:
- logger.error(f"Failed to load flow {filename}: {e}")
-
- async def classify(self, text: str, confidence_threshold: float = 0.85) -> Tuple[str, float]:
- """
- Classify intent using rule-based matching
- Returns (intent, confidence)
- """
- text_lower = text.lower().strip()
-
- best_intent = "general"
- best_confidence = 0.0
-
- for intent_name, patterns in self.patterns.items():
- for pattern in patterns:
- if isinstance(pattern, str):
- # Simple substring matching
- if pattern.lower() in text_lower:
- confidence = min(1.0, len(pattern) / len(text_lower))
- if confidence > best_confidence:
- best_confidence = confidence
- best_intent = intent_name
- elif isinstance(pattern, dict):
- # More complex pattern matching
- keywords = pattern.get('keywords', [])
- required_all = pattern.get('required_all', False)
-
- matches = 0
- total_keywords = len(keywords)
-
- for keyword in keywords:
- if keyword.lower() in text_lower:
- matches += 1
-
- if required_all and matches == total_keywords:
- confidence = 1.0
- elif not required_all and matches > 0:
- confidence = matches / total_keywords
- else:
- confidence = 0.0
-
- if confidence > best_confidence:
- best_confidence = confidence
- best_intent = intent_name
-
- # Apply threshold
- if best_confidence < confidence_threshold:
- return "general", 0.0
-
- return best_intent, best_confidence
-
- async def get_intent_info(self, intent: str) -> Optional[Dict]:
- """Get intent configuration"""
- return self.intents.get(intent)
-
-# Example usage
-if __name__ == "__main__":
- import asyncio
-
- async def test():
- classifier = DeterministicClassifier("./flows")
- await classifier.initialize()
-
- test_texts = [
- "I need to book a restaurant reservation",
- "What's the weather like",
- "Can you help me with my order",
- "Hello, how are you"
- ]
-
- for text in test_texts:
- intent, confidence = await classifier.classify(text)
- print(f"Text: '{text}' -> Intent: {intent} (confidence: {confidence})")
-
- asyncio.run(test())
\ No newline at end of file
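The two scoring rules in the deleted `deterministic/__init__.py` are easier to judge with numbers. The sketch below restates them as standalone functions. The function names and the sample keyword table are illustrative, but the arithmetic is copied from `KeywordClassifier.classify` and from the string-pattern branch of `DeterministicClassifier.classify`.

```python
from typing import Dict, List, Tuple


def keyword_score(text: str, keywords: Dict[str, List[str]]) -> Tuple[str, float]:
    """KeywordClassifier rule: score = matched keywords / keywords listed."""
    text_lower = text.lower()
    best_intent, best_score = "fallback", 0.0
    for intent, kw_list in keywords.items():
        matches = sum(1 for kw in kw_list if kw.lower() in text_lower)
        if matches > 0:
            score = matches / len(kw_list)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score


def substring_confidence(pattern: str, text: str) -> float:
    """String-pattern rule: confidence = len(pattern) / len(text), capped at 1."""
    text_lower = text.lower().strip()
    if pattern.lower() not in text_lower:
        return 0.0
    return min(1.0, len(pattern) / len(text_lower))


keywords = {
    "greeting": ["hello"],
    "booking": ["book", "reservation", "table", "restaurant"],
}
# One hit out of one keyword (1.0) beats two hits out of four (0.5).
result = keyword_score("Hello, I want to book a table", keywords)

# 22 characters of pattern inside a 39-character utterance: 22 / 39, about 0.56.
conf = substring_confidence("restaurant reservation",
                            "I need to book a restaurant reservation")
```

Here `result` is `("greeting", 1.0)`, so the ratio favors intents with short keyword lists. `conf` falls below the default `confidence_threshold` of 0.85, so a plain string pattern only passes when the utterance is little more than the pattern itself.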
diff --git a/dream-server/agents/voice-offline/deterministic/router.py b/dream-server/agents/voice-offline/deterministic/router.py
deleted file mode 100644
index 2b3c7af72..000000000
--- a/dream-server/agents/voice-offline/deterministic/router.py
+++ /dev/null
@@ -1,145 +0,0 @@
-#!/usr/bin/env python3
-"""
-Deterministic router for offline voice agent
-Routes conversations based on classified intents
-"""
-
-import json
-import logging
-from typing import Dict, Any, List
-from datetime import datetime, timezone
-
-logger = logging.getLogger(__name__)
-
-class DeterministicRouter:
- """Routes conversations based on deterministic flows"""
-
- def __init__(self, flows_dir: str = None, classifier=None, fsm=None, fallback_threshold: float = 0.85):
- self.flows_dir = flows_dir
- self.classifier = classifier
- self.fsm = fsm
- self.fallback_threshold = fallback_threshold
- self.flows = {}
- self.current_flows = {} # Track active flows per session
-
- async def initialize(self):
- """Load flow definitions"""
- import os
- if not os.path.exists(self.flows_dir):
- logger.warning(f"Flows directory not found: {self.flows_dir}")
- return
-
- for filename in os.listdir(self.flows_dir):
- if filename.endswith('.json'):
- filepath = os.path.join(self.flows_dir, filename)
- try:
- with open(filepath, 'r') as f:
- flow = json.load(f)
- flow_name = filename.replace('.json', '')
- self.flows[flow_name] = flow
- except Exception as e:
- logger.error(f"Failed to load flow {filename}: {e}")
-
- async def get_response(self, session_id: str, intent: str, user_input: str, context: Dict[str, Any] = None) -> str:
- """Get response based on flow and current state"""
- if intent not in self.flows:
- return self.get_fallback_response(user_input)
-
- flow = self.flows[intent]
-
- # Initialize session if new
- if session_id not in self.current_flows:
- self.current_flows[session_id] = {
- "intent": intent,
- "current_step": 0,
- "data": {},
- "started": datetime.now(timezone.utc).isoformat()
- }
-
- session = self.current_flows[session_id]
-
- # Get current step
- steps = flow.get("steps", [])
- current_step = session["current_step"]
-
- if current_step >= len(steps):
- # Flow completed
- response = flow.get("completion_message", "Thank you! Is there anything else I can help you with?")
- del self.current_flows[session_id] # Clean up
- return response
-
- step = steps[current_step]
-
- # Validate required fields
- if "validation" in step:
- validation = step["validation"]
- if validation.get("type") == "regex":
- import re
- pattern = validation.get("pattern", ".*")
- if not re.match(pattern, user_input, re.IGNORECASE):
- return validation.get("error_message", "I didn't understand that. Please try again.")
-
- # Store user response
- if "field" in step:
- session["data"][step["field"]] = user_input
-
- # Get next response
- response = step.get("response", "Thank you for your input.")
-
- # Advance to next step
- session["current_step"] += 1
-
- return response
-
- def get_fallback_response(self, user_input: str) -> str:
- """Get fallback response for unmatched intents"""
- return "I understand you're asking about that, but I'm running in offline mode and can only help with tasks I have specific flows for. Would you like me to help with something else, or can you try rephrasing your request?"
-
- def reset_session(self, session_id: str):
- """Reset session state"""
- if session_id in self.current_flows:
- del self.current_flows[session_id]
-
- def get_session_info(self, session_id: str) -> Dict[str, Any]:
- """Get current session info"""
- return self.current_flows.get(session_id, {})
-
- def list_available_flows(self) -> List[str]:
- """List available flow names"""
- return list(self.flows.keys())
-
-# Example flows
-EXAMPLE_FLOWS = {
- "restaurant_reservation": {
- "steps": [
- {
- "response": "I'd be happy to help you book a restaurant reservation. What date would you like?",
- "field": "date"
- },
- {
- "response": "What time would you prefer?",
- "field": "time"
- },
- {
- "response": "How many people will be dining?",
- "field": "party_size"
- },
- {
- "response": "Do you have any dietary restrictions or special requests?",
- "field": "special_requests"
- }
- ],
- "completion_message": "Perfect! I've collected all the details for your reservation. In a real system, I would now process this booking."
- }
-}
-
-if __name__ == "__main__":
- import asyncio
-
- async def test():
- router = DeterministicRouter("./flows")
- await router.initialize()
-
- print("Available flows:", router.list_available_flows())
-
- asyncio.run(test())
\ No newline at end of file
diff --git a/dream-server/agents/voice-offline/entrypoint.sh b/dream-server/agents/voice-offline/entrypoint.sh
deleted file mode 100644
index 088ae5bee..000000000
--- a/dream-server/agents/voice-offline/entrypoint.sh
+++ /dev/null
@@ -1,70 +0,0 @@
-#!/bin/bash
-# Entrypoint script for Dream Server Voice Agent - Offline Mode
-# M1 Phase 2 - Zero cloud dependencies
-
-set -e
-
-echo "=== Dream Server Voice Agent (Offline Mode) ==="
-echo "Starting at $(date)"
-
-# Environment validation
-if [[ -z "${LIVEKIT_URL}" ]]; then
- echo "ERROR: LIVEKIT_URL not set"
- exit 1
-fi
-
-if [[ -z "${LIVEKIT_API_KEY}" ]]; then
- echo "ERROR: LIVEKIT_API_KEY not set"
- exit 1
-fi
-
-if [[ -z "${LIVEKIT_API_SECRET}" ]]; then
- echo "ERROR: LIVEKIT_API_SECRET not set"
- exit 1
-fi
-
-# Health check dependencies
-echo "=== Health Check Dependencies ==="
-for service in vllm whisper tts; do
- # Map service names to environment variable names
- case "$service" in
- vllm) url_var="LLM_URL" ;;
- whisper) url_var="STT_URL" ;;
- tts) url_var="TTS_URL" ;;
- esac
- url="${!url_var}"
- if [[ -n "$url" ]]; then
- echo "Checking $service at $url..."
- if [[ "$service" == "vllm" ]]; then
- curl -f "${url}/health" || echo "WARNING: vLLM health check failed"
- elif [[ "$service" == "whisper" ]]; then
- curl -f "${url}/" || echo "WARNING: Whisper health check failed"
- elif [[ "$service" == "tts" ]]; then
- curl -f "${url}/health" || echo "WARNING: TTS health check failed"
- fi
- fi
-done
-
-# Set default values
-export LLM_MODEL=${LLM_MODEL:-"Qwen/Qwen2.5-32B-Instruct-AWQ"}
-export STT_MODEL=${STT_MODEL:-"base"}
-export TTS_VOICE=${TTS_VOICE:-"af_heart"}
-export DETERMINISTIC_ENABLED=${DETERMINISTIC_ENABLED:-"true"}
-export DETERMINISTIC_THRESHOLD=${DETERMINISTIC_THRESHOLD:-"0.85"}
-export OFFLINE_MODE=${OFFLINE_MODE:-"true"}
-
-echo "=== Configuration ==="
-echo "LLM Model: ${LLM_MODEL}"
-echo "STT Model: ${STT_MODEL}"
-echo "TTS Voice: ${TTS_VOICE}"
-echo "Deterministic Flows: ${DETERMINISTIC_ENABLED}"
-echo "Offline Mode: ${OFFLINE_MODE}"
-
-# Start health check server in background
-echo "Starting health check server..."
-python health_check.py &
-HEALTH_PID=$!
-
-# Start the main agent
-echo "Starting voice agent..."
-exec python agent.py
\ No newline at end of file
diff --git a/dream-server/agents/voice-offline/flows/restaurant_reservation.json b/dream-server/agents/voice-offline/flows/restaurant_reservation.json
deleted file mode 100644
index a43815574..000000000
--- a/dream-server/agents/voice-offline/flows/restaurant_reservation.json
+++ /dev/null
@@ -1,52 +0,0 @@
-{
- "intent": "restaurant_reservation",
- "patterns": [
- "book a table",
- "make a reservation",
- "restaurant booking",
- "reserve a table",
- "dinner reservation",
- "lunch reservation",
- "want to eat out",
- "book restaurant"
- ],
- "steps": [
- {
- "response": "I'd be happy to help you make a restaurant reservation! What date would you like to dine?",
- "field": "date",
- "validation": {
- "type": "regex",
- "pattern": "\\d{1,2}[/\\-]\\d{1,2}[/\\-]\\d{4}|today|tomorrow|next\\s+\\w+",
- "error_message": "Please provide a valid date (e.g., 'today', 'tomorrow', '12/15/2024', or 'next Friday')."
- }
- },
- {
- "response": "What time would you prefer for your reservation?",
- "field": "time",
- "validation": {
- "type": "regex",
- "pattern": "\\d{1,2}:\\d{2}|\\d{1,2}\\s*(am|pm)",
- "error_message": "Please provide a valid time (e.g., '7:30 PM' or '19:30')."
- }
- },
- {
- "response": "How many people will be in your party?",
- "field": "party_size",
- "validation": {
- "type": "regex",
- "pattern": "\\d+",
- "error_message": "Please tell me the number of people (e.g., '2', 'party of 4')."
- }
- },
- {
- "response": "Do you have any dietary restrictions, allergies, or special requests for your reservation?",
- "field": "special_requests",
- "validation": {
- "type": "any",
- "error_message": "Please let me know about any special requirements."
- }
- }
- ],
-    "completion_message": "Excellent! I've collected all the details for your restaurant reservation:\n\n📅 Date: {date}\n🕐 Time: {time}\n👥 Party Size: {party_size} people\n📝 Special Requests: {special_requests}\n\nIn a real system, I would now process this booking and provide you with a confirmation number. Thank you for choosing our service!",
- "fallback_response": "I'm having trouble understanding that. Would you like me to help you make a restaurant reservation? I can assist with booking a table for you."
-}
\ No newline at end of file
diff --git a/dream-server/agents/voice-offline/health_check.py b/dream-server/agents/voice-offline/health_check.py
deleted file mode 100644
index 5fa43240b..000000000
--- a/dream-server/agents/voice-offline/health_check.py
+++ /dev/null
@@ -1,102 +0,0 @@
-#!/usr/bin/env python3
-"""
-Health check server for Dream Server Voice Agent - Offline Mode
-Simple HTTP server for container health checks
-"""
-
-import http.server
-import socketserver
-import json
-import os
-import requests
-import threading
-from datetime import datetime, timezone
-
-class HealthHandler(http.server.BaseHTTPRequestHandler):
- """Health check handler - only serves /health endpoint, no file serving"""
-
- def log_message(self, format, *args):
- """Suppress default request logging"""
- pass
-
- def do_GET(self):
- if self.path == '/health':
- self.send_health_check()
- else:
- self.send_error(404, "Not Found")
-
- def send_health_check(self):
- """Perform health check on all dependencies"""
- checks = {
- "status": "healthy",
- "timestamp": datetime.now(timezone.utc).isoformat(),
- "version": "1.0.0-offline",
- "checks": {}
- }
-
- # Check local services
- services = {
- "vllm": {
- "url": os.getenv("LLM_URL", "http://vllm:8000/v1").removesuffix("/v1").removesuffix("/") + "/health",
- "timeout": 5
- },
- "whisper": {
- "url": os.getenv("STT_URL", "http://whisper:9000/v1").removesuffix("/v1").removesuffix("/") + "/health",
- "timeout": 5
- },
- "tts": {
- "url": os.getenv("TTS_URL", "http://tts:8880/v1").removesuffix("/v1").removesuffix("/") + "/health",
- "timeout": 5
- }
- }
-
- all_healthy = True
-
- for service, config in services.items():
- try:
- response = requests.get(config["url"], timeout=config["timeout"])
- if response.status_code == 200:
- checks["checks"][service] = {
- "status": "healthy",
- "response_time": response.elapsed.total_seconds()
- }
- else:
- checks["checks"][service] = {
- "status": "unhealthy",
- "status_code": response.status_code
- }
- all_healthy = False
- except Exception as e:
- checks["checks"][service] = {
- "status": "unhealthy",
- "error": str(e)
- }
- all_healthy = False
-
- if not all_healthy:
- checks["status"] = "unhealthy"
-
- # Check LiveKit credentials
- if not os.getenv("LIVEKIT_API_SECRET"):
- checks["checks"]["livekit"] = {
- "status": "unhealthy",
- "error": "LIVEKIT_API_SECRET not set"
- }
- checks["status"] = "unhealthy"
- else:
- checks["checks"]["livekit"] = {"status": "healthy"}
-
-        self.send_response(200 if checks["status"] == "healthy" else 503)
- self.send_header('Content-type', 'application/json')
- self.end_headers()
- self.wfile.write(json.dumps(checks, indent=2).encode())
-
-def start_health_server():
- """Start health check server"""
- port = 8080
- with socketserver.TCPServer(("", port), HealthHandler) as httpd:
- print(f"Health check server started on port {port}")
- httpd.serve_forever()
-
-if __name__ == "__main__":
- start_health_server()
\ No newline at end of file
diff --git a/dream-server/agents/voice-offline/requirements.txt b/dream-server/agents/voice-offline/requirements.txt
deleted file mode 100644
index 1c9bfde83..000000000
--- a/dream-server/agents/voice-offline/requirements.txt
+++ /dev/null
@@ -1,42 +0,0 @@
-# Dream Server Voice Agent - Offline Mode Dependencies
-# Minimum versions (not pinned) - verified 2026-02-12
-#
-# Versions optimized for offline usage
-
-# LiveKit core
-livekit>=0.17.0
-livekit-agents>=1.0.0
-livekit-plugins-silero>=0.8.0
-
-# OFFLINE MODE: Use local OpenAI-compatible endpoints instead of cloud
-livekit-plugins-openai>=0.10.0 # Required for local vLLM/Whisper/Kokoro compatibility
-
-# HTTP clients
-httpx>=0.27.0
-aiohttp>=3.9.0
-
-# OpenAI SDK for local vLLM compatibility
-openai>=1.60.0
-
-# Audio processing
-numpy>=1.26.0
-sounddevice>=0.5.0
-pydub>=0.25.0
-
-# Environment and configuration
-python-dotenv>=1.0.0
-pydantic>=2.0.0
-
-# Health checks
-requests>=2.31.0
-
-# Local model integration
-# transformers>=4.39.0 # Not needed - using vLLM endpoints
-# torch>=2.2.0 # Not needed - using vLLM endpoints
-
-# Logging
-structlog>=24.1.0
-
-# API server for health checks
-fastapi>=0.109.0
-uvicorn>=0.27.0
\ No newline at end of file
diff --git a/dream-server/agents/voice/Dockerfile b/dream-server/agents/voice/Dockerfile
deleted file mode 100644
index a2cd82c18..000000000
--- a/dream-server/agents/voice/Dockerfile
+++ /dev/null
@@ -1,36 +0,0 @@
-# Dream Server Voice Agent
-# Real-time voice chat using LiveKit + local LLM
-#
-# Build: docker build -t dream-voice-agent .
-# Run: docker run -e LLM_URL=... -e STT_URL=... dream-voice-agent
-
-FROM python:3.11-slim
-
-WORKDIR /app
-
-# Install system deps (portaudio for audio, ffmpeg for transcoding)
-RUN apt-get update && apt-get install -y --no-install-recommends \
- gcc \
- libffi-dev \
- libportaudio2 \
- libportaudiocpp0 \
- portaudio19-dev \
- ffmpeg \
- curl \
- && rm -rf /var/lib/apt/lists/*
-
-# Install Python deps
-COPY requirements.txt .
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy agent code
-COPY agent.py .
-COPY entrypoint.sh .
-RUN chmod +x entrypoint.sh
-
-# Non-root user for security
-RUN useradd -m -u 1000 agent && chown -R agent:agent /app
-USER agent
-
-# Run the agent
-CMD ["./entrypoint.sh"]
diff --git a/dream-server/agents/voice/README.md b/dream-server/agents/voice/README.md
deleted file mode 100644
index f42cb9594..000000000
--- a/dream-server/agents/voice/README.md
+++ /dev/null
@@ -1,84 +0,0 @@
-# Dream Server Voice Agent
-
-Real-time voice AI assistant running entirely on local hardware.
-
-## Architecture
-
-```
-User (WebRTC) → LiveKit Server → Voice Agent
-                                      │
-                      ┌───────────────┼───────────────┐
-                      │               │               │
-                 Whisper STT     vLLM (LLM)     OpenTTS/Piper
-                 (port 9000)     (port 8000)     (port 8880)
-```
-
-## Status
-
-**Current:** ⚠️ Plugin interface WIP
-
-The LiveKit Agents SDK uses a plugin architecture. Our local backends need to implement the correct interfaces:
-
-| Component | Local Service | Status |
-|-----------|---------------|--------|
-| LLM | vLLM (OpenAI-compatible) | ✅ Works via `livekit-plugins-openai` |
-| STT | Whisper | 🟡 Needs OpenAI-compatible endpoint or custom plugin |
-| TTS | OpenTTS/Piper | 🟡 Needs custom plugin |
-| VAD | Silero | ✅ Works |
-
-## Requirements
-
-- LiveKit Server running (port 7880)
-- vLLM with OpenAI-compatible API (port 8000)
-- Whisper STT server (port 9000)
-- TTS server (port 8880)
-
-## Environment Variables
-
-```bash
-LIVEKIT_URL=ws://localhost:7880
-LIVEKIT_API_KEY= # Generated by install.sh
-LIVEKIT_API_SECRET= # Generated by install.sh
-LLM_URL=http://vllm:8000/v1
-LLM_MODEL=Qwen/Qwen2.5-32B-Instruct-AWQ
-STT_URL=http://whisper:9000
-TTS_URL=http://tts:8880
-```
-
-## Running
-
-```bash
-# Development mode
-python agent.py dev
-
-# Console mode (terminal only)
-python agent.py console
-
-# Production mode
-python agent.py start
-```
-
-## TODO for Full Integration
-
-1. **STT Plugin**: Either:
- - Use `faster-whisper-server` which has OpenAI-compatible API
- - Create custom LiveKit plugin for Whisper HTTP API
-
-2. **TTS Plugin**: Create custom plugin for OpenTTS/Piper HTTP API
-
-3. **Testing**: Integration test with all local services
-
-## Bootstrap Mode
-
-When using a small model (1.5B, 3B), the agent automatically:
-- Uses shorter system prompt
-- Limits response length
-- Faster but less capable
-
-This allows immediate voice interaction while the full model downloads.
-
-## References
-
-- [LiveKit Agents Docs](https://docs.livekit.io/agents/)
-- [LiveKit Plugins](https://docs.livekit.io/agents/models/#plugins)
-- [Dream Server Roadmap](../docs/TECHNICAL-ROADMAP.md)
diff --git a/dream-server/agents/voice/agent.py b/dream-server/agents/voice/agent.py
deleted file mode 100644
index 7c1d22df9..000000000
--- a/dream-server/agents/voice/agent.py
+++ /dev/null
@@ -1,324 +0,0 @@
-"""
-Dream Server Voice Agent (v3.1)
-Real-time voice conversation using local LLM + STT + TTS
-
-Uses LiveKit Agents SDK v1.4+ with local model backends:
-- LLM: vLLM (OpenAI-compatible)
-- STT: Whisper (OpenAI-compatible API)
-- TTS: Kokoro (OpenAI-compatible API)
-- VAD: Silero (built-in)
-
-Features:
-- Error handling with graceful degradation
-- Service health checks before startup
-- Reconnection logic for LiveKit
-- Interrupt handling (user can stop bot speech)
-"""
-
-import logging
-import os
-import asyncio
-import signal
-from typing import Optional
-
-from dotenv import load_dotenv
-from livekit.agents import (
- JobContext,
- JobProcess,
- WorkerOptions,
- cli,
-)
-from livekit.agents import Agent, AgentSession
-from livekit.plugins import silero, openai as openai_plugin
-
-# Load environment
-load_dotenv()
-
-# Configure logging
-logging.basicConfig(
- level=logging.INFO,
- format='%(asctime)s | %(name)s | %(levelname)s | %(message)s'
-)
-logger = logging.getLogger("dream-voice")
-
-# Environment config
-LIVEKIT_URL = os.getenv("LIVEKIT_URL", "ws://localhost:7880")
-LLM_URL = os.getenv("LLM_URL", "http://localhost:8000/v1")
-LLM_MODEL = os.getenv("LLM_MODEL", "Qwen/Qwen2.5-32B-Instruct-AWQ")
-STT_URL = os.getenv("STT_URL", "http://localhost:9000")
-TTS_URL = os.getenv("TTS_URL", "http://localhost:8880/v1")
-TTS_VOICE = os.getenv("TTS_VOICE", "af_heart")
-
-# Feature flags for graceful degradation
-ENABLE_STT = os.getenv("ENABLE_STT", "true").lower() == "true"
-ENABLE_TTS = os.getenv("ENABLE_TTS", "true").lower() == "true"
-ENABLE_INTERRUPTIONS = os.getenv("ENABLE_INTERRUPTIONS", "true").lower() == "true"
-
-
-async def check_service_health(url: str, name: str, timeout: int = 5) -> bool:
- """Check if a service is healthy before starting."""
- import aiohttp
- try:
- async with aiohttp.ClientSession() as session:
- async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout)) as resp:
- healthy = resp.status == 200
- if healthy:
-                    logger.info(f"✓ {name} is healthy")
-                else:
-                    logger.warning(f"✗ {name} returned status {resp.status}")
- return healthy
- except Exception as e:
-        logger.warning(f"✗ {name} unreachable: {e}")
- return False
-
-
-class DreamVoiceAgent(Agent):
- """
- Voice agent with robust error handling and graceful degradation.
-
- Features:
- - Greets user on entry
- - Handles interruptions (user can stop bot speech)
- - Falls back gracefully if services fail
- """
-
- def __init__(self) -> None:
- super().__init__(
- instructions="""You are a helpful voice assistant running on local hardware.
-You have access to a powerful GPU cluster running Qwen2.5 32B for language understanding.
-Keep responses conversational and concise - this is voice, not text.
-Be friendly, direct, and helpful.""",
- # Enable interruption handling
- allow_interruptions=ENABLE_INTERRUPTIONS,
- )
- self.error_count = 0
- self.max_errors = 3
-
- async def on_enter(self):
- """Called when agent becomes active. Send greeting."""
- logger.info("Agent entered - sending greeting")
- try:
- self.session.generate_reply(
- instructions="Greet the user warmly and briefly introduce yourself as their local voice assistant."
- )
- except Exception as e:
- logger.error(f"Failed to send greeting: {e}")
- self.error_count += 1
-
- async def on_exit(self):
- """Called when agent is shutting down."""
- logger.info("Agent exiting - cleanup")
-
- async def on_error(self, error: Exception):
- """Handle errors gracefully."""
- self.error_count += 1
- logger.error(f"Agent error ({self.error_count}/{self.max_errors}): {error}")
-
- if self.error_count >= self.max_errors:
- logger.critical("Max errors reached, agent will restart")
- # Signal for restart
- raise error
-
-
-async def create_llm() -> Optional[openai_plugin.LLM]:
- """Create LLM with error handling."""
- try:
- llm = openai_plugin.LLM(
- model=LLM_MODEL,
- base_url=LLM_URL,
- api_key=os.environ.get("VLLM_API_KEY", ""),
- )
-        logger.info(f"✓ LLM configured: {LLM_MODEL}")
- return llm
- except Exception as e:
-        logger.error(f"✗ Failed to create LLM: {e}")
- return None
-
-
-async def create_stt() -> Optional[openai_plugin.STT]:
- """Create STT with error handling."""
- if not ENABLE_STT:
- logger.info("STT disabled by configuration")
- return None
-
- try:
-        # Strip the /v1 suffix if present; the health probe below hits the service root
- stt_base_url = STT_URL.removesuffix('/v1').removesuffix('/')
- # Check service health first
- healthy = await check_service_health(f"{stt_base_url}/", "STT (Whisper)")
- if not healthy:
- logger.warning("STT service not healthy, continuing without speech recognition")
- return None
-
- stt = openai_plugin.STT(
- model="whisper-1",
- base_url=STT_URL,
- api_key=os.environ.get("WHISPER_API_KEY", ""),
- )
-        logger.info("✓ STT configured")
- return stt
- except Exception as e:
-        logger.error(f"✗ Failed to create STT: {e}")
- logger.warning("Continuing without speech recognition")
- return None
-
-
-async def create_tts() -> Optional[openai_plugin.TTS]:
- """Create TTS with error handling."""
- if not ENABLE_TTS:
- logger.info("TTS disabled by configuration")
- return None
-
- try:
- # Check service health first (TTS_URL already includes /v1)
- tts_base_url = TTS_URL.removesuffix('/v1').removesuffix('/')
- healthy = await check_service_health(f"{tts_base_url}/health", "TTS (Kokoro)")
- if not healthy:
- logger.warning("TTS service not healthy, continuing without speech synthesis")
- return None
-
- tts = openai_plugin.TTS(
- model="kokoro",
- voice=TTS_VOICE,
- base_url=TTS_URL,
- api_key=os.environ.get("KOKORO_API_KEY", ""),
- )
-        logger.info(f"✓ TTS configured with voice: {TTS_VOICE}")
- return tts
- except Exception as e:
-        logger.error(f"✗ Failed to create TTS: {e}")
- logger.warning("Continuing without speech synthesis")
- return None
-
-
-async def entrypoint(ctx: JobContext):
- """
- Main entry point for the voice agent job.
-
- Includes:
- - Service health checks
- - Graceful degradation if services fail
- - Reconnection logic
- """
- logger.info(f"Voice agent connecting to room: {ctx.room.name}")
-
- # Health check phase
- logger.info("Performing service health checks...")
- # vLLM uses /v1/models for health check, not /health
- # LLM_URL already ends with /v1, so just add /models
- llm_healthy = await check_service_health(f"{LLM_URL}/models", "LLM (vLLM)")
-
- if not llm_healthy:
- logger.error("LLM service not healthy - cannot start agent")
- raise RuntimeError("LLM service required but not available")
-
- # Create components with error handling
- llm = await create_llm()
- if not llm:
- raise RuntimeError("Failed to create LLM - agent cannot start")
-
- stt = await create_stt()
- tts = await create_tts()
-
- # Create VAD from prewarmed cache or load fresh
- try:
- vad = ctx.proc.userdata.get("vad") or silero.VAD.load()
-        logger.info("✓ VAD loaded")
- except Exception as e:
-        logger.error(f"✗ Failed to load VAD: {e}")
- logger.warning("Starting without voice activity detection")
- vad = None
-
- # Create session - only include working components
- session_kwargs = {"llm": llm}
- if stt:
- session_kwargs["stt"] = stt
- if tts:
- session_kwargs["tts"] = tts
- if vad:
- session_kwargs["vad"] = vad
-
- session = AgentSession(**session_kwargs)
-
- # Create agent
- agent = DreamVoiceAgent()
-
- # Setup graceful shutdown
- shutdown_event = asyncio.Event()
-
- def signal_handler(sig, frame):
- logger.info("Shutdown signal received")
- shutdown_event.set()
-
- signal.signal(signal.SIGTERM, signal_handler)
- signal.signal(signal.SIGINT, signal_handler)
-
- # Connect to room first (required by LiveKit SDK)
- max_retries = 3
- for attempt in range(max_retries):
- try:
- await ctx.connect()
- logger.info("Connected to room")
- break
- except Exception as e:
- logger.error(f"Room connection failed (attempt {attempt + 1}/{max_retries}): {e}")
- if attempt == max_retries - 1:
- raise
- await asyncio.sleep(1)
-
- # Start session after room connection
- for attempt in range(max_retries):
- try:
- await session.start(agent=agent, room=ctx.room)
- logger.info("Voice agent session started")
- break
- except Exception as e:
- logger.error(f"Session start failed (attempt {attempt + 1}/{max_retries}): {e}")
- if attempt == max_retries - 1:
- raise
- await asyncio.sleep(1)
-
- # Wait for shutdown signal
- try:
- await shutdown_event.wait()
- except asyncio.CancelledError:
- logger.info("Agent task cancelled")
- finally:
- logger.info("Shutting down voice agent...")
- try:
- await session.close()
- except Exception as e:
- logger.error(f"Error during shutdown: {e}")
-
-
-def prewarm(proc: JobProcess):
- """Prewarm function - load models before first job."""
- logger.info("Prewarming voice agent...")
- try:
- proc.userdata["vad"] = silero.VAD.load()
-        logger.info("✓ VAD model loaded")
- except Exception as e:
-        logger.error(f"✗ Failed to load VAD: {e}")
- proc.userdata["vad"] = None
-
-
-if __name__ == "__main__":
- agent_port = int(os.getenv("AGENT_PORT", "8181"))
-
- # Log startup info
- logger.info("=" * 60)
- logger.info("Dream Server Voice Agent Starting")
- logger.info(f"Port: {agent_port}")
- logger.info(f"LLM: {LLM_URL}")
- logger.info(f"STT: {STT_URL} (enabled: {ENABLE_STT})")
- logger.info(f"TTS: {TTS_URL} (enabled: {ENABLE_TTS})")
- logger.info(f"Interruptions: {ENABLE_INTERRUPTIONS}")
- logger.info("=" * 60)
-
- cli.run_app(
- WorkerOptions(
- entrypoint_fnc=entrypoint,
- prewarm_fnc=prewarm,
- port=agent_port,
- )
- )
diff --git a/dream-server/agents/voice/entrypoint.sh b/dream-server/agents/voice/entrypoint.sh
deleted file mode 100755
index ce10b151e..000000000
--- a/dream-server/agents/voice/entrypoint.sh
+++ /dev/null
@@ -1,61 +0,0 @@
-#!/bin/bash
-# Voice Agent Entrypoint
-set -euo pipefail
-
-echo "========================================"
-echo " Dream Server Voice Agent"
-echo "========================================"
-echo ""
-echo "Configuration:"
-echo " LLM URL: ${LLM_URL:-http://vllm:8000/v1}"
-echo " STT URL: ${STT_URL:-http://localhost:9000}"
-echo " TTS URL: ${TTS_URL:-http://localhost:8880}"
-echo ""
-
-# Health check function
-wait_for_service() {
- local name=$1
- local url=$2
- local max_attempts=${3:-30}
- local attempt=1
-
- echo "Waiting for $name at $url..."
- while [ $attempt -le $max_attempts ]; do
- if curl -sf --connect-timeout 10 --max-time 30 "$url" > /dev/null 2>&1; then
-            echo "✓ $name is ready"
- return 0
- fi
- echo " Attempt $attempt/$max_attempts - $name not ready yet..."
- sleep 2
- attempt=$((attempt + 1))
- done
-
-    echo "✗ $name failed to respond after $max_attempts attempts"
- return 1
-}
-
-# Wait for required services
-echo "Checking service dependencies..."
-# Extract base URL for health check (remove /v1 suffix)
-LLM_BASE_URL="${LLM_URL:-http://vllm:8000/v1}"
-LLM_BASE_URL="${LLM_BASE_URL%/v1}"
-# vLLM uses /v1/models as health indicator, not /health
-wait_for_service "LLM (vLLM)" "${LLM_BASE_URL}/v1/models" 60 || echo "Warning: LLM health check failed, continuing anyway..."
-STT_BASE_URL="${STT_URL:-http://whisper:9000/v1}"
-STT_BASE_URL="${STT_BASE_URL%/v1}"
-# Whisper health check - try /health or just check if port is open
-wait_for_service "STT (Whisper)" "${STT_BASE_URL}/" 10 || echo "Warning: STT health check failed, continuing anyway..."
-
-# TTS is optional for some configs
-if [ -n "${TTS_URL:-}" ]; then
- # Extract base URL for health check (remove /v1 suffix if present)
- TTS_BASE_URL="${TTS_URL%/v1}"
- wait_for_service "TTS" "${TTS_BASE_URL}/health" 5 || echo "Warning: TTS not available, continuing anyway..."
-fi
-
-echo ""
-echo "Dependency checks finished. Starting voice agent..."
-echo ""
-
-# Start the voice agent
-exec python agent.py start
diff --git a/dream-server/agents/voice/requirements.txt b/dream-server/agents/voice/requirements.txt
deleted file mode 100644
index 6710b33fb..000000000
--- a/dream-server/agents/voice/requirements.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-# Dream Server Voice Agent Dependencies
-# Minimum versions (not pinned) - update periodically
-#
-# Versions verified 2026-02-10
-
-# LiveKit core
-livekit>=0.17.0
-livekit-agents>=1.0.0
-livekit-plugins-silero>=0.8.0
-livekit-plugins-openai>=0.10.0
-
-# HTTP clients
-httpx>=0.27.0
-aiohttp>=3.9.0
-
-# OpenAI SDK (for vLLM compatibility)
-openai>=1.60.0
-
-# Audio processing
-numpy>=1.26.0
-sounddevice>=0.5.0
-pydub>=0.25.0
-
-# Environment
-python-dotenv>=1.0.0
-
-# API server (for test endpoints)
-fastapi>=0.109.0
-uvicorn>=0.27.0
-pydantic>=2.0.0
diff --git a/dream-server/agents/voice/test_server.py b/dream-server/agents/voice/test_server.py
deleted file mode 100644
index f9f3f35d1..000000000
--- a/dream-server/agents/voice/test_server.py
+++ /dev/null
@@ -1,175 +0,0 @@
-"""
-M4 Voice Agent Test Server
-
-Provides HTTP endpoints for testing the deterministic layer
-without requiring browser/voice interaction.
-
-Usage:
- python test_server.py
-
-Endpoints:
- POST /test/utterance - Test intent classification + FSM routing
- GET /metrics - Get deterministic routing metrics
- GET /health - Health check
-"""
-
-import os
-import sys
-import json
-import time
-from typing import Dict, Any
-from fastapi import FastAPI
-from pydantic import BaseModel
-import uvicorn
-
-# Add deterministic module to path
-sys.path.insert(0, os.path.dirname(__file__))
-from deterministic import (
- QwenClassifier,
- LiveKitFSMAdapter,
- FSMExecutor,
-)
-from deterministic.extractors import DEFAULT_EXTRACTORS
-
-app = FastAPI(title="M4 Voice Agent Test Server")
-
-# Global state
-clf = None
-adapter = None
-fsm = None
-
-class UtteranceRequest(BaseModel):
- utterance: str
-    session_id: str | None = None
- flow_name: str = "hvac_service"
-
-class TestResponse(BaseModel):
- intent: str
- confidence: float
- deterministic: bool
- response: str
- latency_ms: float
- flow_active: bool
-
-@app.on_event("startup")
-async def startup():
- """Initialize M4 components."""
- global clf, adapter, fsm
-
- print("Initializing M4 Deterministic Layer...")
-
- # Initialize classifier
- clf = QwenClassifier(
- base_url=os.getenv("LLM_URL", "http://localhost:8000/v1"),
- model=os.getenv("LLM_MODEL", "Qwen/Qwen2.5-32B-Instruct-AWQ"),
- threshold=float(os.getenv("DETERMINISTIC_THRESHOLD", "0.85"))
- )
-
- # Initialize FSM with flows
- fsm = FSMExecutor(extractors=DEFAULT_EXTRACTORS)
- flows_dir = os.getenv("FLOWS_DIR", "./flows")
- if os.path.exists(flows_dir):
- # Load flows manually to handle "domain" vs "name" field
- import glob
- for flow_file in glob.glob(os.path.join(flows_dir, "*.json")):
- with open(flow_file) as f:
- flow = json.load(f)
- # Normalize: use "domain" as "name" if present
- flow_name = flow.get("name") or flow.get("domain")
- if flow_name:
- flow["name"] = flow_name
- fsm.flows[flow_name] = flow
- print(f"Loaded {len(fsm.flows)} flows from {flows_dir}")
- else:
- print(f"Warning: Flows directory not found: {flows_dir}")
-
- # Initialize adapter
- adapter = LiveKitFSMAdapter(
- fsm=fsm,
- classifier=clf,
- confidence_threshold=0.85,
- entity_extractors=DEFAULT_EXTRACTORS
- )
-
- print("M4 Test Server ready!")
-
-@app.get("/health")
-def health():
- return {
- "status": "healthy",
- "m4_enabled": clf is not None,
- "flows_loaded": len(fsm.flows) if fsm else 0
- }
-
-@app.post("/test/utterance", response_model=TestResponse)
-async def test_utterance(req: UtteranceRequest):
- """Test a single utterance through M4 pipeline."""
- session_id = req.session_id or f"test-{int(time.time())}"
-
- # Start session if new
- if session_id not in adapter.active_sessions:
- await adapter.start_session(session_id, req.flow_name)
-
- # Process utterance
- start = time.time()
- result = await adapter.handle_utterance(session_id, req.utterance)
- latency = (time.time() - start) * 1000
-
- return TestResponse(
- intent=result.intent,
- confidence=result.confidence,
- deterministic=result.used_deterministic,
- response=result.text,
- latency_ms=result.latency_ms or latency,
- flow_active=result.flow_status == "in_progress" if result.flow_status else False
- )
-
-@app.post("/test/flow")
-async def test_flow(req: UtteranceRequest):
- """Test a complete flow with multiple utterances."""
- session_id = req.session_id or f"test-{int(time.time())}"
-
- # Define test sequence
- test_utterances = [
- "schedule a service",
- "my name is Todd",
- "tomorrow at 2pm",
- "yes confirm"
- ]
-
- results = []
- await adapter.start_session(session_id, req.flow_name)
-
- for utterance in test_utterances:
- start = time.time()
- result = await adapter.handle_utterance(session_id, utterance)
- latency = (time.time() - start) * 1000
-
- results.append({
- "utterance": utterance,
- "intent": result.intent,
- "confidence": result.confidence,
- "deterministic": result.used_deterministic,
- "response": result.text,
- "latency_ms": result.latency_ms or latency
- })
-
- # Get metrics
- metrics = adapter.get_metrics()
-
- return {
- "session_id": session_id,
- "flow_name": req.flow_name,
- "results": results,
- "metrics": metrics
- }
-
-@app.get("/metrics")
-def get_metrics():
- """Get M4 routing metrics."""
- if adapter:
- return adapter.get_metrics()
- return {"error": "Adapter not initialized"}
-
-if __name__ == "__main__":
- uvicorn.run(app, host="0.0.0.0", port=8290)
diff --git a/dream-server/compose/docker-compose.cluster.yml b/dream-server/compose/docker-compose.cluster.yml
deleted file mode 100644
index ba31cbe54..000000000
--- a/dream-server/compose/docker-compose.cluster.yml
+++ /dev/null
@@ -1,270 +0,0 @@
-# Dream Server - Cluster Tier
-# Multi-GPU (48GB+ total VRAM) - 70B+ models with tensor parallelism
-# Usage: docker compose -f docker-compose.cluster.yml up -d
-#
-# Requirements:
-# - 2+ NVIDIA GPUs with 24GB+ each, or 4+ GPUs with 16GB+ each
-# - NVLink/NVSwitch recommended for optimal tensor parallelism
-# - 64GB+ system RAM recommended
-#
-# Capacity estimate (2x A100 80GB):
-# - 100+ concurrent LLM requests at <100ms latency
-# - 20+ concurrent voice conversations
-# - 72B model with 32K context
-
-services:
-  # ═══════════════════════════════════════════════════════════════
-  # LLM - Qwen2.5-72B with Tensor Parallelism
-  # ═══════════════════════════════════════════════════════════════
- vllm:
- image: vllm/vllm-openai:v0.15.1
- runtime: nvidia
- container_name: dream-vllm-cluster
- environment:
- - NVIDIA_VISIBLE_DEVICES=all
- - VLLM_ATTENTION_BACKEND=FLASHINFER
- - NCCL_DEBUG=WARN
- volumes:
- - ${HF_HOME:-~/.cache/huggingface}:/root/.cache/huggingface
- ports:
- - "8000:8000"
- command: >
- --model Qwen/Qwen2.5-72B-Instruct-AWQ
- ${VLLM_QUANTIZATION:+--quantization $VLLM_QUANTIZATION}
- --tensor-parallel-size ${VLLM_TP_SIZE:-2}
- --max-model-len 32768
- --gpu-memory-utilization 0.92
- --enable-auto-tool-choice
- --tool-call-parser hermes
- --served-model-name gpt-4o
- --trust-remote-code
- --disable-log-requests
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- count: all
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 30s
- timeout: 10s
- retries: 5
- start_period: 600s # 72B takes longer to load
- restart: unless-stopped
- ulimits:
- memlock: -1
- stack: 67108864
-
- # ═══════════════════════════════════════════════════════════════
- # STT - Whisper Large v3 Turbo (GPU)
- # Dedicated GPU for STT to avoid contention with LLM
- # ═══════════════════════════════════════════════════════════════
- whisper:
- image: fedirz/faster-whisper-server:latest-cuda
- runtime: nvidia
- container_name: dream-whisper-cluster
- environment:
- - WHISPER__MODEL=Systran/faster-whisper-large-v3-turbo
- - WHISPER__DEVICE=cuda
- - WHISPER__COMPUTE_TYPE=float16
- - WHISPER__NUM_WORKERS=4
- - CUDA_VISIBLE_DEVICES=${WHISPER_GPU:-0}
- ports:
- - "8001:8000"
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- device_ids: ["${WHISPER_GPU:-0}"]
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # TTS - Kokoro GPU (batch synthesis for high throughput)
- # ═══════════════════════════════════════════════════════════════
- kokoro:
- image: ghcr.io/remsky/kokoro-fastapi:v0.6.2-gpu
- runtime: nvidia
- container_name: dream-kokoro-cluster
- environment:
- - CUDA_VISIBLE_DEVICES=${KOKORO_GPU:-0}
- - KOKORO_BATCH_SIZE=8
- ports:
- - "8880:8880"
- volumes:
- - kokoro-cache:/app/cache
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- device_ids: ["${KOKORO_GPU:-0}"]
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8880/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # LiveKit - WebRTC Server (production config)
- # ═══════════════════════════════════════════════════════════════
- livekit:
- image: livekit/livekit-server:latest
- container_name: dream-livekit-cluster
- ports:
- - "7880:7880" # HTTP API
- - "7881:7881" # WebRTC TCP
- - "7882:7882/udp" # WebRTC UDP
- - "50000-50100:50000-50100/udp" # RTP ports for high concurrency
- command: >
- --config /livekit.yaml
- volumes:
- - ./livekit-cluster.yaml:/livekit.yaml:ro
- healthcheck:
- test: ["CMD", "wget", "--spider", "-q", "http://localhost:7880"]
- interval: 10s
- timeout: 5s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Voice Agent - High-concurrency configuration
- # ═══════════════════════════════════════════════════════════════
- voice-agent:
- build:
- context: ./agents/voice
- dockerfile: Dockerfile
- container_name: dream-voice-agent-cluster
- environment:
- - LIVEKIT_URL=ws://livekit:7880
- - LIVEKIT_API_KEY=${LIVEKIT_API_KEY:?LIVEKIT_API_KEY must be set}
- - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET:?LIVEKIT_API_SECRET must be set}
- - LLM_BASE_URL=http://vllm:8000/v1
- - STT_BASE_URL=http://whisper:8000
- - TTS_BASE_URL=http://kokoro:8880
- - AGENT_CONCURRENCY=20
- depends_on:
- vllm:
- condition: service_healthy
- whisper:
- condition: service_healthy
- kokoro:
- condition: service_healthy
- livekit:
- condition: service_healthy
- deploy:
- replicas: 2 # Multiple agent instances for high concurrency
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Dashboard - Web UI
- # ═══════════════════════════════════════════════════════════════
- dashboard:
- build:
- context: ./dashboard
- dockerfile: Dockerfile
- container_name: dream-dashboard-cluster
- ports:
- - "3001:3001"
- environment:
- - VITE_API_URL=http://localhost:3002
- - VITE_LIVEKIT_URL=ws://localhost:7880
- depends_on:
- - api
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # API - Backend for Dashboard
- # ═══════════════════════════════════════════════════════════════
- api:
- build:
- context: ./dashboard-api
- dockerfile: Dockerfile
- container_name: dream-api-cluster
- ports:
- - "3002:3002"
- environment:
- - VLLM_URL=http://vllm:8000
- - WHISPER_URL=http://whisper:8000
- - KOKORO_URL=http://kokoro:8880
- - LIVEKIT_URL=ws://livekit:7880
- - LIVEKIT_API_KEY=${LIVEKIT_API_KEY:?LIVEKIT_API_KEY must be set}
- - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET:?LIVEKIT_API_SECRET must be set}
- depends_on:
- - vllm
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Metrics - Prometheus + Grafana for cluster monitoring
- # ═══════════════════════════════════════════════════════════════
- prometheus:
- image: prom/prometheus:latest
- container_name: dream-prometheus
- ports:
- - "9090:9090"
- extra_hosts:
- - "host.docker.internal:host-gateway"
- volumes:
- - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
- - prometheus-data:/prometheus
- command:
- - '--config.file=/etc/prometheus/prometheus.yml'
- - '--storage.tsdb.retention.time=7d'
- restart: unless-stopped
-
- grafana:
- image: grafana/grafana:latest
- container_name: dream-grafana
- ports:
- - "${GRAFANA_PORT:-3003}:3000"
- environment:
- - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD:?GRAFANA_PASSWORD must be set in .env}
- - GF_USERS_ALLOW_SIGN_UP=false
- volumes:
- - grafana-data:/var/lib/grafana
- - ./grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
- - ./grafana/datasources:/etc/grafana/provisioning/datasources:ro
- depends_on:
- - prometheus
- restart: unless-stopped
-
-volumes:
- kokoro-cache:
- prometheus-data:
- grafana-data:
-
-# ═══════════════════════════════════════════════════════════════
-# Configuration Notes:
-# ═══════════════════════════════════════════════════════════════
-#
-# Environment Variables:
-# VLLM_TP_SIZE - Tensor parallel size (default: 2, set to GPU count)
-# WHISPER_GPU - GPU device ID for Whisper (default: 0)
-# KOKORO_GPU - GPU device ID for Kokoro (default: 0)
-# LIVEKIT_API_KEY - LiveKit API key (required, no default)
-# LIVEKIT_API_SECRET - LiveKit API secret (required, no default)
-# GRAFANA_PASSWORD - Grafana admin password (required, set in .env)
-#
-# Recommended GPU Allocation (4x GPU setup):
-# GPU 0-1: vLLM (tensor parallel)
-# GPU 2: Whisper STT
-# GPU 3: Kokoro TTS
-#
-# For 2x GPU setup:
-# GPU 0-1: vLLM (tensor parallel)
-# GPU 0: Whisper + Kokoro (shared, time-sliced)
-#
-# Scaling:
-# - Adjust VLLM_TP_SIZE to match available GPUs
-# - For more concurrent voice, add voice-agent replicas
-# - Monitor with Grafana at :3003 (host port; override with GRAFANA_PORT)
diff --git a/dream-server/compose/docker-compose.edge.yml b/dream-server/compose/docker-compose.edge.yml
deleted file mode 100644
index e3d2ea998..000000000
--- a/dream-server/compose/docker-compose.edge.yml
+++ /dev/null
@@ -1,170 +0,0 @@
-# Dream Server - Edge Tier
-# 16GB RAM or 8GB+ VRAM - 7-8B models, full voice stack
-# Usage: docker compose -f docker-compose.edge.yml up -d
-
-services:
- # ═══════════════════════════════════════════════════════════════
- # LLM - Qwen2.5-7B (fits in 8GB VRAM with AWQ)
- # ═══════════════════════════════════════════════════════════════
- vllm:
- image: vllm/vllm-openai:v0.15.1
- runtime: nvidia
- container_name: dream-vllm
- environment:
- - NVIDIA_VISIBLE_DEVICES=all
- volumes:
- - ${HF_HOME:-~/.cache/huggingface}:/root/.cache/huggingface
- ports:
- - "8000:8000"
- command: >
- --model Qwen/Qwen2.5-7B-Instruct-AWQ
- ${VLLM_QUANTIZATION:+--quantization $VLLM_QUANTIZATION}
- --max-model-len 16384
- --gpu-memory-utilization 0.85
- --served-model-name gpt-4o
- --trust-remote-code
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- count: 1
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 30s
- timeout: 10s
- retries: 5
- start_period: 180s
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # STT - Whisper Medium (balances quality vs VRAM)
- # ═══════════════════════════════════════════════════════════════
- whisper:
- image: fedirz/faster-whisper-server:latest-cuda
- runtime: nvidia
- container_name: dream-whisper
- environment:
- - WHISPER__MODEL=Systran/faster-whisper-medium
- - WHISPER__DEVICE=cuda
- - NVIDIA_VISIBLE_DEVICES=all
- ports:
- - "8001:8000"
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- count: 1
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # TTS - Kokoro CPU (saves VRAM for LLM)
- # ═══════════════════════════════════════════════════════════════
- kokoro:
- image: ghcr.io/remsky/kokoro-fastapi:v0.6.2-cpu
- container_name: dream-kokoro
- ports:
- - "8880:8880"
- volumes:
- - kokoro-cache:/app/cache
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8880/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # LiveKit - WebRTC Server
- # ═══════════════════════════════════════════════════════════════
- livekit:
- image: livekit/livekit-server:latest
- container_name: dream-livekit
- ports:
- - "7880:7880"
- - "7881:7881"
- - "7882:7882/udp"
- command: --config /livekit.yaml
- environment:
- # Keys passed via env var (safer than config file); LiveKit expects "key: secret" with a space after the colon
- - "LIVEKIT_KEYS=${LIVEKIT_API_KEY}: ${LIVEKIT_API_SECRET}"
- volumes:
- - ./livekit.yaml:/livekit.yaml:ro
- healthcheck:
- test: ["CMD", "wget", "--spider", "-q", "http://localhost:7880"]
- interval: 10s
- timeout: 5s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Voice Agent
- # ═══════════════════════════════════════════════════════════════
- voice-agent:
- build:
- context: ./agents/voice
- dockerfile: Dockerfile
- container_name: dream-voice-agent
- environment:
- - LIVEKIT_URL=ws://livekit:7880
- - LIVEKIT_API_KEY=${LIVEKIT_API_KEY}
- - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET}
- - LLM_BASE_URL=http://vllm:8000/v1
- - STT_BASE_URL=http://whisper:8000
- - TTS_BASE_URL=http://kokoro:8880
- depends_on:
- vllm:
- condition: service_healthy
- whisper:
- condition: service_healthy
- kokoro:
- condition: service_healthy
- livekit:
- condition: service_healthy
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Dashboard + API
- # ═══════════════════════════════════════════════════════════════
- dashboard:
- build:
- context: ./dashboard
- dockerfile: Dockerfile
- container_name: dream-dashboard
- ports:
- - "3001:3001"
- environment:
- - VITE_API_URL=http://localhost:3002
- - VITE_LIVEKIT_URL=ws://localhost:7880
- depends_on:
- - api
- restart: unless-stopped
-
- api:
- build:
- context: ./dashboard-api
- dockerfile: Dockerfile
- container_name: dream-api
- ports:
- - "3002:3002"
- environment:
- - VLLM_URL=http://vllm:8000
- - WHISPER_URL=http://whisper:8000
- - KOKORO_URL=http://kokoro:8880
- - LIVEKIT_URL=ws://livekit:7880
- - LIVEKIT_API_KEY=${LIVEKIT_API_KEY}
- - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET}
- depends_on:
- - vllm
- restart: unless-stopped
-
-volumes:
- kokoro-cache:
diff --git a/dream-server/compose/docker-compose.nano.yml b/dream-server/compose/docker-compose.nano.yml
deleted file mode 100644
index 0310b1d23..000000000
--- a/dream-server/compose/docker-compose.nano.yml
+++ /dev/null
@@ -1,63 +0,0 @@
-# Dream Server - Nano Tier
-# 8GB+ RAM, no GPU required - 1-3B models, text-only
-# Usage: docker compose -f docker-compose.nano.yml up -d
-#
-# Note: Voice features disabled (no GPU for real-time STT/TTS)
-# Use text chat via API or dashboard
-
-services:
- # ═══════════════════════════════════════════════════════════════
- # LLM - Qwen2.5-1.5B via llama.cpp (CPU)
- # ═══════════════════════════════════════════════════════════════
- llama:
- image: ghcr.io/ggerganov/llama.cpp:server
- container_name: dream-llama
- ports:
- - "8000:8080"
- volumes:
- - ${MODELS_DIR:-~/.cache/models}:/models
- command: >
- --model /models/qwen2.5-1.5b-instruct-q4_k_m.gguf
- --ctx-size 8192
- --n-gpu-layers 0
- --threads 4
- --host 0.0.0.0
- --port 8080
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- start_period: 60s
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Dashboard + API (no voice features)
- # ═══════════════════════════════════════════════════════════════
- dashboard:
- build:
- context: ./dashboard
- dockerfile: Dockerfile
- container_name: dream-dashboard
- ports:
- - "3001:3001"
- environment:
- - VITE_API_URL=http://localhost:3002
- - VITE_VOICE_ENABLED=false
- depends_on:
- - api
- restart: unless-stopped
-
- api:
- build:
- context: ./dashboard-api
- dockerfile: Dockerfile
- container_name: dream-api
- ports:
- - "3002:3002"
- environment:
- - LLM_URL=http://llama:8080
- - VOICE_ENABLED=false
- depends_on:
- - llama
- restart: unless-stopped
diff --git a/dream-server/compose/docker-compose.pro.yml b/dream-server/compose/docker-compose.pro.yml
deleted file mode 100644
index 80650e345..000000000
--- a/dream-server/compose/docker-compose.pro.yml
+++ /dev/null
@@ -1,184 +0,0 @@
-# Dream Server - Pro Tier
-# 24GB+ VRAM - 32B models, full voice stack
-# Usage: docker compose -f docker-compose.pro.yml up -d
-
-services:
- # ═══════════════════════════════════════════════════════════════
- # LLM - Qwen2.5-32B-Instruct-AWQ
- # ═══════════════════════════════════════════════════════════════
- vllm:
- image: vllm/vllm-openai:v0.15.1
- runtime: nvidia
- container_name: dream-vllm
- environment:
- - NVIDIA_VISIBLE_DEVICES=all
- - VLLM_ATTENTION_BACKEND=FLASHINFER
- volumes:
- - ${HF_HOME:-~/.cache/huggingface}:/root/.cache/huggingface
- ports:
- - "8000:8000"
- command: >
- --model Qwen/Qwen2.5-32B-Instruct-AWQ
- ${VLLM_QUANTIZATION:+--quantization $VLLM_QUANTIZATION}
- --max-model-len 32768
- --gpu-memory-utilization 0.90
- --enable-auto-tool-choice
- --tool-call-parser hermes
- --served-model-name gpt-4o
- --trust-remote-code
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- count: 1
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 30s
- timeout: 10s
- retries: 5
- start_period: 300s
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # STT - Whisper Large v3
- # ═══════════════════════════════════════════════════════════════
- whisper:
- image: fedirz/faster-whisper-server:latest-cuda
- runtime: nvidia
- container_name: dream-whisper
- environment:
- - WHISPER__MODEL=Systran/faster-whisper-large-v3
- - WHISPER__DEVICE=cuda
- - NVIDIA_VISIBLE_DEVICES=all
- ports:
- - "8001:8000"
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- count: 1
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # TTS - Kokoro (GPU-accelerated)
- # ═══════════════════════════════════════════════════════════════
- kokoro:
- image: ghcr.io/remsky/kokoro-fastapi:v0.6.2-gpu
- runtime: nvidia
- container_name: dream-kokoro
- environment:
- - NVIDIA_VISIBLE_DEVICES=all
- ports:
- - "8880:8880"
- volumes:
- - kokoro-cache:/app/cache
- deploy:
- resources:
- reservations:
- devices:
- - driver: nvidia
- count: 1
- capabilities: [gpu]
- healthcheck:
- test: ["CMD", "curl", "-f", "http://localhost:8880/health"]
- interval: 30s
- timeout: 10s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # LiveKit - WebRTC Server
- # ═══════════════════════════════════════════════════════════════
- livekit:
- image: livekit/livekit-server:latest
- container_name: dream-livekit
- ports:
- - "7880:7880" # HTTP
- - "7881:7881" # WebRTC TCP
- - "7882:7882/udp" # WebRTC UDP
- command: >
- --config /livekit.yaml
- volumes:
- - ./livekit.yaml:/livekit.yaml:ro
- healthcheck:
- test: ["CMD", "wget", "--spider", "-q", "http://localhost:7880"]
- interval: 10s
- timeout: 5s
- retries: 3
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Voice Agent - Connects LLM + STT + TTS via LiveKit
- # ═══════════════════════════════════════════════════════════════
- voice-agent:
- build:
- context: ./agents/voice
- dockerfile: Dockerfile
- container_name: dream-voice-agent
- environment:
- - LIVEKIT_URL=ws://livekit:7880
- - LIVEKIT_API_KEY=${LIVEKIT_API_KEY:?LIVEKIT_API_KEY must be set}
- - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET:?LIVEKIT_API_SECRET must be set}
- - LLM_BASE_URL=http://vllm:8000/v1
- - STT_BASE_URL=http://whisper:8000
- - TTS_BASE_URL=http://kokoro:8880
- depends_on:
- vllm:
- condition: service_healthy
- whisper:
- condition: service_healthy
- kokoro:
- condition: service_healthy
- livekit:
- condition: service_healthy
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # Dashboard - Web UI
- # ═══════════════════════════════════════════════════════════════
- dashboard:
- build:
- context: ./dashboard
- dockerfile: Dockerfile
- container_name: dream-dashboard
- ports:
- - "3001:3001"
- environment:
- - VITE_API_URL=http://localhost:3002
- - VITE_LIVEKIT_URL=ws://localhost:7880
- depends_on:
- - api
- restart: unless-stopped
-
- # ═══════════════════════════════════════════════════════════════
- # API - Backend for Dashboard
- # ═══════════════════════════════════════════════════════════════
- api:
- build:
- context: ./dashboard-api
- dockerfile: Dockerfile
- container_name: dream-api
- ports:
- - "3002:3002"
- environment:
- - VLLM_URL=http://vllm:8000
- - WHISPER_URL=http://whisper:8000
- - KOKORO_URL=http://kokoro:8880
- - LIVEKIT_URL=ws://livekit:7880
- - LIVEKIT_API_KEY=${LIVEKIT_API_KEY:?LIVEKIT_API_KEY must be set}
- - LIVEKIT_API_SECRET=${LIVEKIT_API_SECRET:?LIVEKIT_API_SECRET must be set}
- depends_on:
- - vllm
- restart: unless-stopped
-
-volumes:
- kokoro-cache:
diff --git a/dream-server/compose/grafana/dashboards/dashboard.yml b/dream-server/compose/grafana/dashboards/dashboard.yml
deleted file mode 100644
index 9a4e56eee..000000000
--- a/dream-server/compose/grafana/dashboards/dashboard.yml
+++ /dev/null
@@ -1,11 +0,0 @@
-apiVersion: 1
-
-providers:
- - name: 'Dream Server'
- orgId: 1
- folder: ''
- type: file
- disableDeletion: false
- editable: true
- options:
- path: /etc/grafana/provisioning/dashboards
diff --git a/dream-server/compose/grafana/dashboards/dream-server.json b/dream-server/compose/grafana/dashboards/dream-server.json
deleted file mode 100644
index 4ad72df2e..000000000
--- a/dream-server/compose/grafana/dashboards/dream-server.json
+++ /dev/null
@@ -1,580 +0,0 @@
-{
- "annotations": {
- "list": []
- },
- "editable": true,
- "fiscalYearStartMonth": 0,
- "graphTooltip": 0,
- "id": null,
- "links": [],
- "panels": [
- {
- "collapsed": false,
- "gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 },
- "id": 100,
- "panels": [],
- "title": "vLLM Inference",
- "type": "row"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "reqps"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 12, "x": 0, "y": 1 },
- "id": 1,
- "options": {
- "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "rate(vllm:request_success_total[1m])",
- "legendFormat": "Requests/sec",
- "refId": "A"
- }
- ],
- "title": "Request Rate",
- "type": "timeseries"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "s"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 12, "x": 12, "y": 1 },
- "id": 2,
- "options": {
- "legend": { "calcs": ["mean", "p95"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "histogram_quantile(0.5, rate(vllm:time_to_first_token_seconds_bucket[5m]))",
- "legendFormat": "TTFT p50",
- "refId": "A"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "histogram_quantile(0.95, rate(vllm:time_to_first_token_seconds_bucket[5m]))",
- "legendFormat": "TTFT p95",
- "refId": "B"
- }
- ],
- "title": "Time to First Token",
- "type": "timeseries"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "thresholds" },
- "mappings": [],
- "max": 100,
- "min": 0,
- "thresholds": {
- "mode": "absolute",
- "steps": [
- { "color": "green", "value": null },
- { "color": "yellow", "value": 70 },
- { "color": "red", "value": 90 }
- ]
- },
- "unit": "percent"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 6, "x": 0, "y": 9 },
- "id": 3,
- "options": {
- "minVizHeight": 75,
- "minVizWidth": 75,
- "orientation": "auto",
- "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
- "showThresholdLabels": false,
- "showThresholdMarkers": true
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "vllm:gpu_cache_usage_perc * 100",
- "legendFormat": "GPU Cache",
- "refId": "A"
- }
- ],
- "title": "GPU KV Cache Usage",
- "type": "gauge"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "thresholds" },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- { "color": "green", "value": null },
- { "color": "yellow", "value": 5 },
- { "color": "red", "value": 10 }
- ]
- },
- "unit": "none"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 6, "x": 6, "y": 9 },
- "id": 4,
- "options": {
- "colorMode": "value",
- "graphMode": "area",
- "justifyMode": "auto",
- "orientation": "auto",
- "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
- "textMode": "auto"
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "vllm:num_requests_waiting",
- "legendFormat": "Waiting",
- "refId": "A"
- }
- ],
- "title": "Requests Waiting",
- "type": "stat"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "thresholds" },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [{ "color": "blue", "value": null }]
- },
- "unit": "none"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 6, "x": 12, "y": 9 },
- "id": 5,
- "options": {
- "colorMode": "value",
- "graphMode": "area",
- "justifyMode": "auto",
- "orientation": "auto",
- "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
- "textMode": "auto"
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "vllm:num_requests_running",
- "legendFormat": "Running",
- "refId": "A"
- }
- ],
- "title": "Requests Running",
- "type": "stat"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "short"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 6, "x": 18, "y": 9 },
- "id": 6,
- "options": {
- "legend": { "calcs": ["mean"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "rate(vllm:generation_tokens_total[1m])",
- "legendFormat": "Tokens/sec",
- "refId": "A"
- }
- ],
- "title": "Token Generation Rate",
- "type": "timeseries"
- },
- {
- "collapsed": false,
- "gridPos": { "h": 1, "w": 24, "x": 0, "y": 17 },
- "id": 101,
- "panels": [],
- "title": "System Resources",
- "type": "row"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "description": "Requires node_exporter on host",
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "max": 100,
- "min": 0,
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "percent"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 12, "x": 0, "y": 18 },
- "id": 7,
- "options": {
- "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
- "legendFormat": "CPU Usage",
- "refId": "A"
- }
- ],
- "title": "CPU Usage",
- "type": "timeseries"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "description": "Requires node_exporter on host",
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "bytes"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 12, "x": 12, "y": 18 },
- "id": 8,
- "options": {
- "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes",
- "legendFormat": "Used Memory",
- "refId": "A"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "node_memory_MemTotal_bytes",
- "legendFormat": "Total Memory",
- "refId": "B"
- }
- ],
- "title": "Memory Usage",
- "type": "timeseries"
- },
- {
- "collapsed": false,
- "gridPos": { "h": 1, "w": 24, "x": 0, "y": 26 },
- "id": 102,
- "panels": [],
- "title": "GPU (requires dcgm-exporter)",
- "type": "row"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "description": "Requires dcgm-exporter on host",
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "max": 100,
- "min": 0,
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "percent"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 8, "x": 0, "y": 27 },
- "id": 9,
- "options": {
- "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "DCGM_FI_DEV_GPU_UTIL",
- "legendFormat": "GPU {{gpu}}",
- "refId": "A"
- }
- ],
- "title": "GPU Utilization",
- "type": "timeseries"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "description": "Requires dcgm-exporter on host",
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "bytes"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 8, "x": 8, "y": 27 },
- "id": 10,
- "options": {
- "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "DCGM_FI_DEV_FB_USED * 1024 * 1024",
- "legendFormat": "GPU {{gpu}} Used",
- "refId": "A"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "DCGM_FI_DEV_FB_FREE * 1024 * 1024",
- "legendFormat": "GPU {{gpu}} Free",
- "refId": "B"
- }
- ],
- "title": "GPU Memory",
- "type": "timeseries"
- },
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "description": "Requires dcgm-exporter on host",
- "fieldConfig": {
- "defaults": {
- "color": { "mode": "palette-classic" },
- "custom": {
- "axisBorderShow": false,
- "axisCenteredZero": false,
- "axisColorMode": "text",
- "axisLabel": "",
- "axisPlacement": "auto",
- "barAlignment": 0,
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "hideFrom": { "legend": false, "tooltip": false, "viz": false },
- "insertNulls": false,
- "lineInterpolation": "linear",
- "lineWidth": 1,
- "pointSize": 5,
- "scaleDistribution": { "type": "linear" },
- "showPoints": "never",
- "spanNulls": false,
- "stacking": { "group": "A", "mode": "none" },
- "thresholdsStyle": { "mode": "off" }
- },
- "mappings": [],
- "thresholds": { "mode": "absolute", "steps": [{ "color": "green", "value": null }] },
- "unit": "celsius"
- },
- "overrides": []
- },
- "gridPos": { "h": 8, "w": 8, "x": 16, "y": 27 },
- "id": 11,
- "options": {
- "legend": { "calcs": ["mean", "max"], "displayMode": "list", "placement": "bottom", "showLegend": true },
- "tooltip": { "mode": "multi", "sort": "none" }
- },
- "targets": [
- {
- "datasource": { "type": "prometheus", "uid": "prometheus" },
- "expr": "DCGM_FI_DEV_GPU_TEMP",
- "legendFormat": "GPU {{gpu}} Temp",
- "refId": "A"
- }
- ],
- "title": "GPU Temperature",
- "type": "timeseries"
- }
- ],
- "refresh": "10s",
- "schemaVersion": 39,
- "tags": ["dream-server", "vllm", "inference"],
- "templating": { "list": [] },
- "time": { "from": "now-1h", "to": "now" },
- "timepicker": {},
- "timezone": "browser",
- "title": "Dream Server Overview",
- "uid": "dream-server-overview",
- "version": 1
-}
diff --git a/dream-server/compose/grafana/datasources/prometheus.yml b/dream-server/compose/grafana/datasources/prometheus.yml
deleted file mode 100644
index bb009bb21..000000000
--- a/dream-server/compose/grafana/datasources/prometheus.yml
+++ /dev/null
@@ -1,9 +0,0 @@
-apiVersion: 1
-
-datasources:
- - name: Prometheus
- type: prometheus
- access: proxy
- url: http://prometheus:9090
- isDefault: true
- editable: false
diff --git a/dream-server/compose/livekit-cluster.yaml.template b/dream-server/compose/livekit-cluster.yaml.template
deleted file mode 100644
index 4fc0bcda3..000000000
--- a/dream-server/compose/livekit-cluster.yaml.template
+++ /dev/null
@@ -1,39 +0,0 @@
-port: 7880
-rtc:
- port_range_start: 50000
- port_range_end: 50100
- use_external_ip: true
- tcp_port: 7881
- udp_port: 7882
-
-# Production keys - set via LIVEKIT_API_KEY and LIVEKIT_API_SECRET environment variables
-keys:
- ${LIVEKIT_API_KEY}: ${LIVEKIT_API_SECRET}
-
-logging:
- level: info
- json: true
-
-# Limits for cluster tier
-limit:
- num_tracks: 100
- bytes_per_sec: 100000000 # 100 MB/s total
- subscription_limit_video: 50
- subscription_limit_audio: 100
-
-# Room settings
-room:
- auto_create: true
- empty_timeout: 300 # 5 min
- max_participants: 50 # per room
-
-# Turn server (use external TURN for production)
-# turn:
-# enabled: true
-# domain: turn.example.com
-# tls_port: 443
-
-# Webhook for analytics (optional)
-# webhook:
-# urls:
-# - https://your-webhook-endpoint.com/livekit
diff --git a/dream-server/compose/livekit-entrypoint.sh b/dream-server/compose/livekit-entrypoint.sh
deleted file mode 100644
index 4268e47db..000000000
--- a/dream-server/compose/livekit-entrypoint.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash
-# LiveKit Server Entrypoint with Template Substitution
-# Replaces environment variables in livekit.yaml.template -> livekit.yaml
-
-set -e
-
-# Required environment variables
-if [[ -z "${LIVEKIT_API_KEY}" ]]; then
- echo "ERROR: LIVEKIT_API_KEY must be set" >&2
- exit 1
-fi
-
-if [[ -z "${LIVEKIT_API_SECRET}" ]]; then
- echo "ERROR: LIVEKIT_API_SECRET must be set" >&2
- exit 1
-fi
-
-# Substitute environment variables in template
-envsubst < /etc/livekit.yaml.template > /etc/livekit.yaml
-
-# Run LiveKit with the generated config
-exec livekit-server --config /etc/livekit.yaml "$@"
diff --git a/dream-server/compose/livekit.yaml b/dream-server/compose/livekit.yaml
deleted file mode 100644
index b5781e170..000000000
--- a/dream-server/compose/livekit.yaml
+++ /dev/null
@@ -1,30 +0,0 @@
-# LiveKit Server Configuration for Dream Server
-# https://docs.livekit.io/home/self-hosting/vm/#config
-#
-# SECURITY: API keys are set via LIVEKIT_KEYS environment variable
-# in docker-compose, NOT in this file. Never commit secrets here.
-
-port: 7880
-rtc:
- port_range_start: 50000
- port_range_end: 60000
- tcp_port: 7881
- use_external_ip: false
-
-# Keys are injected via LIVEKIT_KEYS environment variable
-# Do not add a 'keys:' section here - it will conflict with env var
-
-logging:
- level: info
- pion_level: warn
-
-room:
- enabled_codecs:
- - mime: audio/opus
- - mime: audio/red
- max_participants: 10
- empty_timeout: 300
- departure_timeout: 20
-
-turn:
- enabled: false
diff --git a/dream-server/compose/prometheus.yml b/dream-server/compose/prometheus.yml
deleted file mode 100644
index 61de110e9..000000000
--- a/dream-server/compose/prometheus.yml
+++ /dev/null
@@ -1,28 +0,0 @@
-# Prometheus Configuration - Dream Server Cluster
-# Scrapes metrics from vLLM, Whisper, and system
-
-global:
- scrape_interval: 15s
- evaluation_interval: 15s
-
-scrape_configs:
- # vLLM metrics
- - job_name: 'vllm'
- static_configs:
- - targets: ['vllm:8000']
- metrics_path: /metrics
-
- # Node exporter (if installed on host)
- - job_name: 'node'
- static_configs:
- - targets: ['host.docker.internal:9100']
-
- # NVIDIA GPU metrics (dcgm-exporter)
- - job_name: 'gpu'
- static_configs:
- - targets: ['host.docker.internal:9400']
-
- # Prometheus self-monitoring
- - job_name: 'prometheus'
- static_configs:
- - targets: ['localhost:9090']
diff --git a/dream-server/config/backends/amd.json b/dream-server/config/backends/amd.json
new file mode 100644
index 000000000..f444da7bf
--- /dev/null
+++ b/dream-server/config/backends/amd.json
@@ -0,0 +1,9 @@
+{
+ "id": "amd",
+ "llm_engine": "llama-server",
+ "service_name": "llama-server",
+ "public_api_port": 8080,
+ "public_health_url": "http://localhost:8080/health",
+ "provider_name": "local-ollama",
+ "provider_url": "http://llama-server:8080/v1"
+}
diff --git a/dream-server/config/backends/apple.json b/dream-server/config/backends/apple.json
new file mode 100644
index 000000000..2a4cfd3f8
--- /dev/null
+++ b/dream-server/config/backends/apple.json
@@ -0,0 +1,9 @@
+{
+ "id": "apple",
+ "llm_engine": "llama-server",
+ "service_name": "llama-server",
+ "public_api_port": 8080,
+ "public_health_url": "http://localhost:8080/health",
+ "provider_name": "local-mlx",
+ "provider_url": "http://llama-server:8080/v1"
+}
diff --git a/dream-server/config/backends/cpu.json b/dream-server/config/backends/cpu.json
new file mode 100644
index 000000000..c4e2ca5ff
--- /dev/null
+++ b/dream-server/config/backends/cpu.json
@@ -0,0 +1,9 @@
+{
+ "id": "cpu",
+ "llm_engine": "llama-server",
+ "service_name": "llama-server",
+ "public_api_port": 8080,
+ "public_health_url": "http://localhost:8080/health",
+ "provider_name": "local-llama",
+ "provider_url": "http://llama-server:8080/v1"
+}
diff --git a/dream-server/config/backends/nvidia.json b/dream-server/config/backends/nvidia.json
new file mode 100644
index 000000000..446ed6a74
--- /dev/null
+++ b/dream-server/config/backends/nvidia.json
@@ -0,0 +1,9 @@
+{
+ "id": "nvidia",
+ "llm_engine": "llama-server",
+ "service_name": "llama-server",
+ "public_api_port": 8080,
+ "public_health_url": "http://localhost:8080/health",
+ "provider_name": "local-llama",
+ "provider_url": "http://llama-server:8080/v1"
+}
diff --git a/dream-server/config/capability-profile.schema.json b/dream-server/config/capability-profile.schema.json
new file mode 100644
index 000000000..f452f8f35
--- /dev/null
+++ b/dream-server/config/capability-profile.schema.json
@@ -0,0 +1,117 @@
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "$id": "https://dream-server.dev/schema/capability-profile.v1.json",
+ "title": "Dream Server Capability Profile v1",
+ "type": "object",
+ "required": [
+ "version",
+ "platform",
+ "gpu",
+ "runtime",
+ "compose",
+ "tier",
+ "hardware_class"
+ ],
+ "properties": {
+ "version": {
+ "const": "1"
+ },
+ "platform": {
+ "type": "object",
+ "required": ["id", "family"],
+ "properties": {
+ "id": {
+ "type": "string",
+ "enum": ["linux", "wsl", "macos", "windows", "unknown"]
+ },
+ "family": {
+ "type": "string",
+ "enum": ["linux", "windows", "darwin", "unknown"]
+ }
+ },
+ "additionalProperties": false
+ },
+ "gpu": {
+ "type": "object",
+ "required": ["vendor", "name", "memory_type", "count", "vram_mb"],
+ "properties": {
+ "vendor": {
+ "type": "string",
+ "enum": ["nvidia", "amd", "apple", "none", "unknown"]
+ },
+ "name": {
+ "type": "string"
+ },
+ "memory_type": {
+ "type": "string",
+ "enum": ["discrete", "unified", "none", "unknown"]
+ },
+ "count": {
+ "type": "integer",
+ "minimum": 0
+ },
+ "vram_mb": {
+ "type": "integer",
+ "minimum": 0
+ }
+ },
+ "additionalProperties": false
+ },
+ "runtime": {
+ "type": "object",
+ "required": ["llm_backend", "llm_health_url", "llm_api_port"],
+ "properties": {
+ "llm_backend": {
+ "type": "string",
+ "enum": ["nvidia", "amd", "apple", "cpu"]
+ },
+ "llm_health_url": {
+ "type": "string"
+ },
+ "llm_api_port": {
+ "type": "integer",
+ "minimum": 1
+ }
+ },
+ "additionalProperties": false
+ },
+ "compose": {
+ "type": "object",
+ "required": ["overlays"],
+ "properties": {
+ "overlays": {
+ "type": "array",
+ "items": {
+ "type": "string"
+ }
+ }
+ },
+ "additionalProperties": false
+ },
+ "tier": {
+ "type": "object",
+ "required": ["recommended"],
+ "properties": {
+ "recommended": {
+ "type": "string",
+ "enum": ["T1", "T2", "T3", "T4", "SH_COMPACT", "SH_LARGE"]
+ }
+ },
+ "additionalProperties": false
+ },
+ "hardware_class": {
+ "type": "object",
+ "required": ["id", "label"],
+ "properties": {
+ "id": {
+ "type": "string"
+ },
+ "label": {
+ "type": "string"
+ }
+ },
+ "additionalProperties": false
+ }
+ },
+ "additionalProperties": false
+}
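Note on the capability profile schema above: a minimal sketch of a document that should satisfy its required fields, plus a few hand-rolled checks. This is not a JSON Schema validator and is not taken from the repository. `example_profile`, `check_profile`, and the GPU name and VRAM figure are hypothetical values chosen for illustration; the health URL, port, overlay names, and hardware class are borrowed from other files in this diff.

```python
# Illustrative only: a hand-rolled subset of the checks the schema describes.
# Assumption: only top-level keys, the version const, and the tier enum are checked.
REQUIRED_TOP_LEVEL = ["version", "platform", "gpu", "runtime", "compose", "tier", "hardware_class"]
TIER_ENUM = ["T1", "T2", "T3", "T4", "SH_COMPACT", "SH_LARGE"]

# Hypothetical profile for a single discrete NVIDIA GPU on Linux.
example_profile = {
    "version": "1",
    "platform": {"id": "linux", "family": "linux"},
    "gpu": {
        "vendor": "nvidia",
        "name": "Example GPU",  # placeholder name, not from the diff
        "memory_type": "discrete",
        "count": 1,
        "vram_mb": 24576,
    },
    "runtime": {
        "llm_backend": "nvidia",
        "llm_health_url": "http://localhost:8080/health",
        "llm_api_port": 8080,
    },
    "compose": {"overlays": ["docker-compose.base.yml", "docker-compose.nvidia.yml"]},
    "tier": {"recommended": "T3"},
    "hardware_class": {"id": "nvidia_pro", "label": "NVIDIA Pro (20GB+)"},
}


def check_profile(profile):
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in profile]
    extra = sorted(set(profile) - set(REQUIRED_TOP_LEVEL))
    problems += [f"unexpected key: {key}" for key in extra]
    if profile.get("version") != "1":
        problems.append('version must be the string "1"')
    tier = profile.get("tier", {}).get("recommended")
    if tier not in TIER_ENUM:
        problems.append(f"tier not in enum: {tier!r}")
    return problems


if __name__ == "__main__":
    print(check_profile(example_profile))
```

For real validation, a JSON Schema library run against `capability-profile.schema.json` would cover the nested `required`, `enum`, and `additionalProperties` rules that this sketch skips.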
diff --git a/dream-server/config/gpu-database.json b/dream-server/config/gpu-database.json
new file mode 100644
index 000000000..6240101ac
--- /dev/null
+++ b/dream-server/config/gpu-database.json
@@ -0,0 +1,275 @@
+{
+ "schema_version": "dream.hardware.v1",
+ "_attribution": {
+ "gpu_bandwidth_data": "llmfit by Alex Jones (MIT) - github.com/AlexsJones/llmfit",
+ "note": "GPU bandwidth numbers sourced from the llmfit project's hardware database. Thank you to Alex Jones and the llmfit contributors for maintaining this excellent open-source resource."
+ },
+ "known_gpus": [
+ {
+ "id": "rtx_pro_6000_blackwell",
+ "match": {
+ "device_ids": [],
+ "name_patterns": ["RTX PRO 6000", "Blackwell"]
+ },
+ "specs": {
+ "label": "NVIDIA RTX PRO 6000 Blackwell Workstation Edition",
+ "vendor": "nvidia",
+ "architecture": "blackwell",
+ "memory_type": "discrete",
+ "memory_mb": 96000,
+ "memory_source": "vram",
+ "bandwidth_gbps": 1792
+ },
+ "recommended": {
+ "backend": "nvidia",
+ "tier": "NV_ULTRA"
+ }
+ },
+ {
+ "id": "strix_halo_395",
+ "match": {
+ "device_ids": ["0x1586"],
+ "name_patterns": ["Radeon 8060S", "RYZEN AI MAX+ 395", "Strix Halo"]
+ },
+ "specs": {
+ "label": "AMD Ryzen AI MAX+ 395 (Strix Halo)",
+ "vendor": "amd",
+ "architecture": "rdna-3.5",
+ "memory_type": "unified",
+ "memory_mb": 98304,
+ "memory_source": "ram",
+ "bandwidth_gbps": 256,
+ "compute_units": 40
+ },
+ "recommended": {
+ "backend": "amd",
+ "tier": "SH_LARGE"
+ }
+ },
+ {
+ "id": "strix_halo_390",
+ "match": {
+ "device_ids": ["0x1586"],
+ "name_patterns": ["RYZEN AI MAX 390", "Radeon 8050S"]
+ },
+ "specs": {
+ "label": "AMD Ryzen AI MAX 390 (Strix Halo)",
+ "vendor": "amd",
+ "architecture": "rdna-3.5",
+ "memory_type": "unified",
+ "memory_mb": 65536,
+ "memory_source": "ram",
+ "bandwidth_gbps": 256,
+ "compute_units": 32
+ },
+ "recommended": {
+ "backend": "amd",
+ "tier": "SH_COMPACT"
+ }
+ },
+ {
+ "id": "strix_halo_385",
+ "match": {
+ "device_ids": ["0x1586"],
+ "name_patterns": ["RYZEN AI MAX+ 385"]
+ },
+ "specs": {
+ "label": "AMD Ryzen AI MAX+ 385 (Strix Halo)",
+ "vendor": "amd",
+ "architecture": "rdna-3.5",
+ "memory_type": "unified",
+ "memory_mb": 98304,
+ "memory_source": "ram",
+ "bandwidth_gbps": 256,
+ "compute_units": 32
+ },
+ "recommended": {
+ "backend": "amd",
+ "tier": "SH_LARGE"
+ }
+ }
+ ],
+ "known_gpu_bandwidth": {
+ "nvidia": {
+ "RTX PRO 6000": 1792,
+ "RTX 5090": 1792,
+ "RTX 5080": 960,
+ "RTX 5070 Ti": 896,
+ "RTX 5070": 672,
+ "RTX 5060 Ti": 448,
+ "RTX 5060": 256,
+ "RTX 4090": 1008,
+ "RTX 4080 Super": 736,
+ "RTX 4080": 717,
+ "RTX 4070 Ti Super": 672,
+ "RTX 4070 Ti": 504,
+ "RTX 4070 Super": 504,
+ "RTX 4070": 504,
+ "RTX 4060 Ti": 288,
+ "RTX 4060": 272,
+ "RTX 3090 Ti": 1008,
+ "RTX 3090": 936,
+ "RTX 3080 Ti": 912,
+ "RTX 3080": 760,
+ "RTX 3070 Ti": 608,
+ "RTX 3070": 448,
+ "RTX 3060 Ti": 448,
+ "RTX 3060": 360,
+ "RTX 2080 Ti": 616,
+ "RTX 2080 Super": 496,
+ "RTX 2080": 448,
+ "RTX 2070 Super": 448,
+ "RTX 2070": 448,
+ "RTX 2060 Super": 448,
+ "RTX 2060": 336,
+ "GTX 1660 Ti": 288,
+ "GTX 1660 Super": 336,
+ "GTX 1660": 192,
+ "GTX 1650 Super": 192,
+ "GTX 1650": 128,
+ "H200": 4800,
+ "H100 SXM": 3350,
+ "H100 PCIe": 2039,
+ "A100 SXM": 2039,
+ "A100 PCIe": 1555,
+ "V100 SXM": 900,
+ "V100": 897,
+ "L40S": 864,
+ "L40": 864,
+ "A6000": 768,
+ "A5000": 768,
+ "A10G": 600,
+ "A10": 600,
+ "A4000": 448,
+ "T4": 320,
+ "L4": 300
+ },
+ "amd": {
+ "RX 9070 XT": 624,
+ "RX 9070": 488,
+ "RX 7900 XTX": 960,
+ "RX 7900 XT": 800,
+ "RX 7900 GRE": 576,
+ "RX 7800 XT": 624,
+ "RX 7700 XT": 432,
+ "RX 7600": 288,
+ "RX 6950 XT": 576,
+ "RX 6900 XT": 512,
+ "RX 6800 XT": 512,
+ "RX 6800": 512,
+ "RX 6700 XT": 384,
+ "RX 6600 XT": 256,
+ "RX 6600": 224,
+ "MI300X": 5300,
+ "MI300": 5300,
+ "MI250X": 3277,
+ "MI250": 3277,
+ "MI210": 1638,
+ "MI100": 1229
+ },
+ "apple": {
+ "M4 Ultra": 819,
+ "M4 Max": 546,
+ "M4 Pro": 273,
+ "M4": 120,
+ "M3 Ultra": 800,
+ "M3 Max": 400,
+ "M3 Pro": 150,
+ "M3": 100,
+ "M2 Ultra": 800,
+ "M2 Max": 400,
+ "M2 Pro": 200,
+ "M2": 100,
+ "M1 Ultra": 800,
+ "M1 Max": 400,
+ "M1 Pro": 200,
+ "M1": 68
+ }
+ },
+ "heuristic_classes": [
+ {
+ "id": "nvidia_ultra",
+ "match": { "vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 92160 },
+ "recommended": { "backend": "nvidia", "tier": "NV_ULTRA" }
+ },
+ {
+ "id": "nvidia_enterprise",
+ "match": { "vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 40960 },
+ "recommended": { "backend": "nvidia", "tier": "T4" }
+ },
+ {
+ "id": "nvidia_pro",
+ "match": { "vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 20480 },
+ "recommended": { "backend": "nvidia", "tier": "T3" }
+ },
+ {
+ "id": "nvidia_prosumer",
+ "match": { "vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 12288 },
+ "recommended": { "backend": "nvidia", "tier": "T2" }
+ },
+ {
+ "id": "nvidia_entry",
+ "match": { "vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 0 },
+ "recommended": { "backend": "nvidia", "tier": "T1" }
+ },
+ {
+ "id": "amd_unified_large",
+ "match": { "vendor": "amd", "memory_type": "unified", "min_ram_mb": 92160 },
+ "recommended": { "backend": "amd", "tier": "SH_LARGE" }
+ },
+ {
+ "id": "amd_unified_compact",
+ "match": { "vendor": "amd", "memory_type": "unified", "min_ram_mb": 0 },
+ "recommended": { "backend": "amd", "tier": "SH_COMPACT" }
+ },
+ {
+ "id": "amd_discrete_large",
+ "match": { "vendor": "amd", "memory_type": "discrete", "min_vram_mb": 20480 },
+ "recommended": { "backend": "amd", "tier": "T3" }
+ },
+ {
+ "id": "amd_discrete_medium",
+ "match": { "vendor": "amd", "memory_type": "discrete", "min_vram_mb": 12288 },
+ "recommended": { "backend": "amd", "tier": "T2" }
+ },
+ {
+ "id": "amd_discrete_entry",
+ "match": { "vendor": "amd", "memory_type": "discrete", "min_vram_mb": 0 },
+ "recommended": { "backend": "amd", "tier": "T1" }
+ },
+ {
+ "id": "apple_ultra",
+ "match": { "vendor": "apple", "memory_type": "unified", "min_ram_mb": 131072 },
+ "recommended": { "backend": "apple", "tier": "T4" }
+ },
+ {
+ "id": "apple_max",
+ "match": { "vendor": "apple", "memory_type": "unified", "min_ram_mb": 65536 },
+ "recommended": { "backend": "apple", "tier": "T3" }
+ },
+ {
+ "id": "apple_pro",
+ "match": { "vendor": "apple", "memory_type": "unified", "min_ram_mb": 32768 },
+ "recommended": { "backend": "apple", "tier": "T2" }
+ },
+ {
+ "id": "apple_base",
+ "match": { "vendor": "apple", "memory_type": "unified", "min_ram_mb": 0 },
+ "recommended": { "backend": "apple", "tier": "T1" }
+ },
+ {
+ "id": "cpu_only",
+ "match": { "vendor": "none", "memory_type": "none", "min_ram_mb": 0 },
+ "recommended": { "backend": "cpu", "tier": "T1" }
+ }
+ ],
+ "defaults": {
+ "bandwidth_gbps": {
+ "cuda": 220,
+ "rocm": 180,
+ "metal": 160,
+ "cpu_x86": 70,
+ "cpu_arm": 50
+ }
+ }
+}
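Note on `heuristic_classes` above: within each vendor the entries run from the highest memory threshold to the lowest, which suggests a first-match lookup. The sketch below illustrates that reading. It assumes, without confirmation from this diff, that classification walks the list in order and returns the first entry whose vendor and memory type match and whose `min_vram_mb` or `min_ram_mb` threshold is met. `classify` and the trimmed `HEURISTIC_CLASSES` table are hypothetical names for illustration.

```python
# Hypothetical first-match classifier over a trimmed copy of heuristic_classes.
# Assumption: entries are evaluated top to bottom and the first match wins.
HEURISTIC_CLASSES = [
    {"id": "nvidia_ultra",
     "match": {"vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 92160},
     "recommended": {"backend": "nvidia", "tier": "NV_ULTRA"}},
    {"id": "nvidia_enterprise",
     "match": {"vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 40960},
     "recommended": {"backend": "nvidia", "tier": "T4"}},
    {"id": "nvidia_pro",
     "match": {"vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 20480},
     "recommended": {"backend": "nvidia", "tier": "T3"}},
    {"id": "nvidia_prosumer",
     "match": {"vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 12288},
     "recommended": {"backend": "nvidia", "tier": "T2"}},
    {"id": "nvidia_entry",
     "match": {"vendor": "nvidia", "memory_type": "discrete", "min_vram_mb": 0},
     "recommended": {"backend": "nvidia", "tier": "T1"}},
    {"id": "amd_unified_large",
     "match": {"vendor": "amd", "memory_type": "unified", "min_ram_mb": 92160},
     "recommended": {"backend": "amd", "tier": "SH_LARGE"}},
    {"id": "amd_unified_compact",
     "match": {"vendor": "amd", "memory_type": "unified", "min_ram_mb": 0},
     "recommended": {"backend": "amd", "tier": "SH_COMPACT"}},
    {"id": "cpu_only",
     "match": {"vendor": "none", "memory_type": "none", "min_ram_mb": 0},
     "recommended": {"backend": "cpu", "tier": "T1"}},
]


def classify(vendor, memory_type, vram_mb=0, ram_mb=0):
    """Return the 'recommended' dict of the first matching class, or None."""
    for entry in HEURISTIC_CLASSES:
        match = entry["match"]
        if match["vendor"] != vendor or match["memory_type"] != memory_type:
            continue
        # A threshold that is absent from the entry is treated as 0 (always met).
        if vram_mb < match.get("min_vram_mb", 0):
            continue
        if ram_mb < match.get("min_ram_mb", 0):
            continue
        return entry["recommended"]
    return None


if __name__ == "__main__":
    print(classify("nvidia", "discrete", vram_mb=24576))
    print(classify("amd", "unified", ram_mb=65536))
```

Under this reading the order of the list matters: moving `nvidia_entry` (threshold 0) to the top would make it shadow every other NVIDIA class.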
diff --git a/dream-server/config/gpu-database.schema.json b/dream-server/config/gpu-database.schema.json
new file mode 100644
index 000000000..87d38a8bd
--- /dev/null
+++ b/dream-server/config/gpu-database.schema.json
@@ -0,0 +1,138 @@
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "$id": "dream.hardware.v1",
+ "title": "Dream Server GPU Database",
+ "description": "GPU knowledge base for hardware classification. Known GPUs with specs, bandwidth lookup table, and heuristic fallback classes.",
+ "type": "object",
+ "required": ["schema_version", "known_gpus", "known_gpu_bandwidth", "heuristic_classes", "defaults"],
+ "properties": {
+ "schema_version": {
+ "type": "string",
+ "const": "dream.hardware.v1"
+ },
+ "_attribution": {
+ "type": "object",
+ "properties": {
+ "gpu_bandwidth_data": { "type": "string" },
+ "note": { "type": "string" }
+ }
+ },
+ "known_gpus": {
+ "type": "array",
+ "items": { "$ref": "#/$defs/known_gpu" }
+ },
+ "known_gpu_bandwidth": {
+ "type": "object",
+ "properties": {
+ "nvidia": { "$ref": "#/$defs/bandwidth_map" },
+ "amd": { "$ref": "#/$defs/bandwidth_map" },
+ "apple": { "$ref": "#/$defs/bandwidth_map" }
+ },
+ "additionalProperties": { "$ref": "#/$defs/bandwidth_map" }
+ },
+ "heuristic_classes": {
+ "type": "array",
+ "items": { "$ref": "#/$defs/heuristic_class" }
+ },
+ "defaults": {
+ "type": "object",
+ "required": ["bandwidth_gbps"],
+ "properties": {
+ "bandwidth_gbps": {
+ "type": "object",
+ "additionalProperties": { "type": "number", "minimum": 0 }
+ }
+ }
+ }
+ },
+ "$defs": {
+ "known_gpu": {
+ "type": "object",
+ "required": ["id", "match", "specs", "recommended"],
+ "properties": {
+ "id": {
+ "type": "string",
+ "pattern": "^[a-z0-9_]+$",
+ "description": "Unique identifier for this known GPU entry"
+ },
+ "match": {
+ "type": "object",
+ "properties": {
+ "device_ids": {
+ "type": "array",
+ "items": { "type": "string", "pattern": "^0x[0-9a-fA-F]{4}$" },
+ "description": "PCI device IDs to match (exact)"
+ },
+ "name_patterns": {
+ "type": "array",
+ "items": { "type": "string" },
+ "description": "Substring patterns to match against GPU name (case-insensitive)"
+ }
+ },
+ "anyOf": [
+ { "required": ["device_ids"] },
+ { "required": ["name_patterns"] }
+ ]
+ },
+ "specs": {
+ "type": "object",
+ "required": ["label", "vendor", "architecture", "memory_type", "memory_mb", "bandwidth_gbps"],
+ "properties": {
+ "label": { "type": "string" },
+ "vendor": { "enum": ["nvidia", "amd", "apple", "intel"] },
+ "architecture": { "type": "string" },
+ "memory_type": { "enum": ["discrete", "unified"] },
+ "memory_mb": { "type": "integer", "minimum": 0 },
+ "memory_source": {
+ "enum": ["vram", "ram"],
+ "description": "Where to read actual memory from. 'ram' = use system RAM (for unified memory GPUs where reported VRAM is unreliable)"
+ },
+ "bandwidth_gbps": { "type": "number", "minimum": 0 },
+ "compute_units": { "type": "integer", "minimum": 0 }
+ }
+ },
+ "recommended": {
+ "$ref": "#/$defs/recommendation"
+ }
+ }
+ },
+ "heuristic_class": {
+ "type": "object",
+ "required": ["id", "match", "recommended"],
+ "properties": {
+ "id": {
+ "type": "string",
+ "pattern": "^[a-z0-9_]+$"
+ },
+ "match": {
+ "type": "object",
+ "properties": {
+ "vendor": { "enum": ["nvidia", "amd", "apple", "intel", "none"] },
+ "memory_type": { "enum": ["discrete", "unified", "none"] },
+ "min_vram_mb": { "type": "integer", "minimum": 0 },
+ "min_ram_mb": { "type": "integer", "minimum": 0 }
+ }
+ },
+ "recommended": {
+ "$ref": "#/$defs/recommendation"
+ }
+ }
+ },
+ "recommendation": {
+ "type": "object",
+ "required": ["backend", "tier"],
+ "properties": {
+ "backend": { "enum": ["nvidia", "amd", "apple", "cpu"] },
+ "tier": {
+ "type": "string",
+ "pattern": "^(T[1-4]|SH_LARGE|SH_COMPACT|NV_ULTRA)$"
+ }
+ }
+ },
+ "bandwidth_map": {
+ "type": "object",
+ "additionalProperties": { "type": "number", "minimum": 0 },
+ "description": "Map of GPU model name to bandwidth in GB/s"
+ }
+ }
+}
diff --git a/dream-server/config/hardware-classes.json b/dream-server/config/hardware-classes.json
new file mode 100644
index 000000000..6fc3d4b83
--- /dev/null
+++ b/dream-server/config/hardware-classes.json
@@ -0,0 +1,155 @@
+{
+ "version": "1",
+ "classes": [
+ {
+ "id": "strix_unified_large",
+ "label": "Strix Halo (90GB+)",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["amd"],
+ "memory_type": ["unified"],
+ "min_vram_mb": 92160
+ },
+ "recommended": {
+ "backend": "amd",
+ "tier": "SH_LARGE",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.amd.yml"]
+ }
+ },
+ {
+ "id": "strix_unified",
+ "label": "Strix Unified",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["amd"],
+ "memory_type": ["unified"],
+ "min_vram_mb": 65536
+ },
+ "recommended": {
+ "backend": "amd",
+ "tier": "SH_COMPACT",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.amd.yml"]
+ }
+ },
+ {
+ "id": "nvidia_ultra",
+ "label": "NVIDIA Ultra (90GB+)",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["nvidia"],
+ "memory_type": ["discrete"],
+ "min_vram_mb": 92160
+ },
+ "recommended": {
+ "backend": "nvidia",
+ "tier": "NV_ULTRA",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.nvidia.yml"]
+ }
+ },
+ {
+ "id": "nvidia_enterprise",
+ "label": "NVIDIA Enterprise (40GB+)",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["nvidia"],
+ "memory_type": ["discrete"],
+ "min_vram_mb": 40960
+ },
+ "recommended": {
+ "backend": "nvidia",
+ "tier": "T4",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.nvidia.yml"]
+ }
+ },
+ {
+ "id": "nvidia_pro",
+ "label": "NVIDIA Pro (20GB+)",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["nvidia"],
+ "memory_type": ["discrete"],
+ "min_vram_mb": 20480
+ },
+ "recommended": {
+ "backend": "nvidia",
+ "tier": "T3",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.nvidia.yml"]
+ }
+ },
+ {
+ "id": "nvidia_prosumer",
+ "label": "NVIDIA Prosumer (12GB+)",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["nvidia"],
+ "memory_type": ["discrete"],
+ "min_vram_mb": 12288
+ },
+ "recommended": {
+ "backend": "nvidia",
+ "tier": "T2",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.nvidia.yml"]
+ }
+ },
+ {
+ "id": "nvidia_entry",
+ "label": "NVIDIA Entry",
+ "match": {
+ "platform_id": ["linux", "wsl"],
+ "gpu_vendor": ["nvidia"],
+ "memory_type": ["discrete"],
+ "min_vram_mb": 0
+ },
+ "recommended": {
+ "backend": "nvidia",
+ "tier": "T1",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.nvidia.yml"]
+ }
+ },
+ {
+ "id": "apple_silicon_pro",
+ "label": "Apple Silicon Pro (36GB+)",
+ "match": {
+ "platform_id": ["macos"],
+ "gpu_vendor": ["apple"],
+ "memory_type": ["unified"],
+ "min_vram_mb": 36864
+ },
+ "recommended": {
+ "backend": "apple",
+ "tier": "T3",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.apple.yml"]
+ }
+ },
+ {
+ "id": "apple_silicon",
+ "label": "Apple Silicon",
+ "match": {
+ "platform_id": ["macos"],
+ "gpu_vendor": ["apple"],
+ "memory_type": ["unified"],
+ "min_vram_mb": 8192
+ },
+ "recommended": {
+ "backend": "apple",
+ "tier": "T2",
+ "compose_overlays": ["docker-compose.base.yml", "docker-compose.apple.yml"]
+ }
+ },
+ {
+ "id": "cpu_fallback",
+ "label": "CPU Fallback",
+ "match": {
+ "platform_id": ["linux", "wsl", "macos", "windows", "unknown"],
+ "gpu_vendor": ["none", "unknown"],
+ "memory_type": ["discrete", "unified", "none", "unknown"],
+ "min_vram_mb": 0
+ },
+ "recommended": {
+ "backend": "cpu",
+ "tier": "T1",
+ "compose_overlays": ["docker-compose.base.yml"]
+ }
+ }
+ ]
+}
diff --git a/dream-server/config/installer-sim-summary.schema.json b/dream-server/config/installer-sim-summary.schema.json
new file mode 100644
index 000000000..c4b1c4dbc
--- /dev/null
+++ b/dream-server/config/installer-sim-summary.schema.json
@@ -0,0 +1,57 @@
+{
+ "$schema": "https://json-schema.org/draft/2020-12/schema",
+ "$id": "https://dream-server.dev/schema/installer-sim-summary.v1.json",
+ "title": "Installer Simulation Summary v1",
+ "type": "object",
+ "required": ["version", "generated_at", "runs"],
+ "properties": {
+ "version": { "const": "1" },
+ "generated_at": { "type": "string" },
+ "runs": {
+ "type": "object",
+ "required": ["linux_dryrun", "macos_installer_mvp", "windows_scenario_preflight", "doctor_snapshot"],
+ "properties": {
+ "linux_dryrun": {
+ "type": "object",
+ "required": ["exit_code", "signals", "log"],
+ "properties": {
+ "exit_code": { "type": "integer" },
+ "signals": { "type": "object" },
+ "log": { "type": "string" }
+ },
+ "additionalProperties": true
+ },
+ "macos_installer_mvp": {
+ "type": "object",
+ "required": ["exit_code", "log"],
+ "properties": {
+ "exit_code": { "type": "integer" },
+ "log": { "type": "string" },
+ "preflight": { "type": ["object", "null"] },
+ "doctor": { "type": ["object", "null"] }
+ },
+ "additionalProperties": true
+ },
+ "windows_scenario_preflight": {
+ "type": "object",
+ "required": ["report"],
+ "properties": {
+ "report": { "type": ["object", "null"] }
+ },
+ "additionalProperties": true
+ },
+ "doctor_snapshot": {
+ "type": "object",
+ "required": ["exit_code", "report"],
+ "properties": {
+ "exit_code": { "type": "integer" },
+ "report": { "type": ["object", "null"] }
+ },
+ "additionalProperties": true
+ }
+ },
+ "additionalProperties": true
+ }
+ },
+ "additionalProperties": true
+}
diff --git a/dream-server/config/litellm/cloud-config.yaml b/dream-server/config/litellm/cloud-config.yaml
deleted file mode 100644
index eeefacd0e..000000000
--- a/dream-server/config/litellm/cloud-config.yaml
+++ /dev/null
@@ -1,55 +0,0 @@
-# LiteLLM Cloud Mode Configuration
-# Full cloud model access
-
-model_list:
- # Claude (Anthropic)
- - model_name: claude-sonnet
- litellm_params:
- model: claude-sonnet-4-5
- api_key: os.environ/ANTHROPIC_API_KEY
- model_info:
- description: "Claude Sonnet 4.5 - Best for coding and analysis"
-
- - model_name: claude-opus
- litellm_params:
- model: claude-opus-4
- api_key: os.environ/ANTHROPIC_API_KEY
- model_info:
- description: "Claude Opus 4 - Most capable, best reasoning"
-
- # OpenAI
- - model_name: gpt-4o
- litellm_params:
- model: gpt-4o
- api_key: os.environ/OPENAI_API_KEY
- model_info:
- description: "GPT-4o - Fast and capable"
-
- - model_name: gpt-4-turbo
- litellm_params:
- model: gpt-4-turbo-preview
- api_key: os.environ/OPENAI_API_KEY
- model_info:
- description: "GPT-4 Turbo - Latest GPT-4"
-
- # Together AI (open source models)
- - model_name: llama-3.1-70b
- litellm_params:
- model: together_ai/meta-llama/Llama-3.1-70B-Instruct-Turbo
- api_key: os.environ/TOGETHER_API_KEY
- model_info:
- description: "Llama 3.1 70B - Open source powerhouse"
-
- - model_name: qwen-72b
- litellm_params:
- model: together_ai/Qwen/Qwen2.5-72B-Instruct-Turbo
- api_key: os.environ/TOGETHER_API_KEY
- model_info:
- description: "Qwen 2.5 72B - Excellent for coding"
-
-litellm_settings:
- drop_params: true
- set_verbose: false
-
-general_settings:
- master_key: os.environ/LITELLM_MASTER_KEY
diff --git a/dream-server/config/litellm/cloud.yaml b/dream-server/config/litellm/cloud.yaml
new file mode 100644
index 000000000..053386011
--- /dev/null
+++ b/dream-server/config/litellm/cloud.yaml
@@ -0,0 +1,25 @@
+model_list:
+ - model_name: default
+ litellm_params:
+ model: anthropic/claude-sonnet-4-5-20250514
+ api_key: os.environ/ANTHROPIC_API_KEY
+
+ - model_name: gpt4o
+ litellm_params:
+ model: openai/gpt-4o
+ api_key: os.environ/OPENAI_API_KEY
+
+ - model_name: fast
+ litellm_params:
+ model: anthropic/claude-haiku-4-5-20251001
+ api_key: os.environ/ANTHROPIC_API_KEY
+
+router_settings:
+ routing_strategy: simple-shuffle
+
+general_settings:
+ master_key: os.environ/LITELLM_MASTER_KEY
+
+litellm_settings:
+ drop_params: true
+ set_verbose: false
diff --git a/dream-server/config/litellm/config.yaml b/dream-server/config/litellm/config.yaml
deleted file mode 100644
index 54f535277..000000000
--- a/dream-server/config/litellm/config.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-# LiteLLM Configuration
-# Use this when running multiple models or providers
-
-model_list:
- # Local vLLM model
- - model_name: local-qwen
- litellm_params:
- model: openai/Qwen/Qwen2.5-32B-Instruct-AWQ
- api_base: http://vllm:8000/v1
- api_key: ${VLLM_API_KEY:-}
- model_info:
- max_tokens: 8192
-
- # Example: Add OpenAI for comparison
- # - model_name: gpt-4o
- # litellm_params:
- # model: gpt-4o
- # api_key: ${OPENAI_API_KEY}
-
- # Example: Add Claude
- # - model_name: claude-sonnet
- # litellm_params:
- # model: claude-3-5-sonnet-20241022
- # api_key: ${ANTHROPIC_API_KEY}
-
-# General settings
-litellm_settings:
- drop_params: true
- set_verbose: false
- num_retries: 3
-
-# Router settings (for load balancing multiple backends)
-router_settings:
- routing_strategy: simple-shuffle
- model_group_alias:
- default: local-qwen
diff --git a/dream-server/config/litellm/hybrid-config.yaml b/dream-server/config/litellm/hybrid-config.yaml
deleted file mode 100644
index d14d18a54..000000000
--- a/dream-server/config/litellm/hybrid-config.yaml
+++ /dev/null
@@ -1,49 +0,0 @@
-# LiteLLM Hybrid Config - Local Primary + Cloud Fallback
-# Mission: M1 (Fully Local OpenClaw) - M5 (Clonable Dream Setup Server)
-
-model_list:
- # Local model (primary)
- - model_name: qwen2.5-32b-instruct-awq
- litellm_params:
- model: openai/qwen2.5-32b-instruct-awq
- api_base: http://localhost:8000/v1
- api_key: dummy
- tpm: 100000
- rpm: 1000
-
- # Cloud fallback (when local fails)
- - model_name: gpt-4o
- litellm_params:
- model: gpt-4o
- api_key: ${CLOUD_API_KEY}
- api_base: ${CLOUD_BASE_URL}
- tpm: 1000000
- rpm: 10000
-
- - model_name: claude-3-5-sonnet
- litellm_params:
- model: claude-3-5-sonnet-20241022
- api_key: ${CLOUD_API_KEY}
- api_base: ${CLOUD_BASE_URL}
- tpm: 1000000
- rpm: 10000
-
-litellm_settings:
- # Retry on failure (local -> cloud fallback)
- num_retries: 3
- request_timeout: 300
-
- # Fallback configuration
- fallback_models:
- - gpt-4o
- - claude-3-5-sonnet
-
- # Circuit breaker
- circuit_breaker:
- errors: 3
- timeout: 60
-
-general_settings:
- master_key: ${LITELLM_MASTER_KEY:?LITELLM_MASTER_KEY must be set}
- logs_dir: ./logs
- database_url: ./data/litellm.db
diff --git a/dream-server/config/litellm/hybrid.yaml b/dream-server/config/litellm/hybrid.yaml
new file mode 100644
index 000000000..d26cf91e8
--- /dev/null
+++ b/dream-server/config/litellm/hybrid.yaml
@@ -0,0 +1,25 @@
+model_list:
+ - model_name: default
+ litellm_params:
+ model: openai/default
+ api_base: http://llama-server:8080/v1
+ api_key: not-needed
+
+ - model_name: default
+ litellm_params:
+ model: anthropic/claude-sonnet-4-5-20250514
+ api_key: os.environ/ANTHROPIC_API_KEY
+
+router_settings:
+ routing_strategy: simple-shuffle
+ num_retries: 2
+ fallbacks:
+ - default:
+ - default
+
+general_settings:
+ master_key: os.environ/LITELLM_MASTER_KEY
+
+litellm_settings:
+ drop_params: true
+ set_verbose: false
diff --git a/dream-server/config/litellm/local.yaml b/dream-server/config/litellm/local.yaml
new file mode 100644
index 000000000..27a8c0212
--- /dev/null
+++ b/dream-server/config/litellm/local.yaml
@@ -0,0 +1,19 @@
+model_list:
+ - model_name: default
+ litellm_params:
+ model: openai/default
+ api_base: http://llama-server:8080/v1
+ api_key: not-needed
+
+ - model_name: "*"
+ litellm_params:
+ model: openai/*
+ api_base: http://llama-server:8080/v1
+ api_key: not-needed
+
+general_settings:
+ master_key: os.environ/LITELLM_MASTER_KEY
+
+litellm_settings:
+ drop_params: true
+ set_verbose: false
diff --git a/dream-server/config/litellm/offline-config.yaml b/dream-server/config/litellm/offline-config.yaml
deleted file mode 100644
index aaad53548..000000000
--- a/dream-server/config/litellm/offline-config.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-# LiteLLM Offline Mode Configuration
-# Local models only - no cloud access
-
-model_list:
- # Local vLLM
- - model_name: qwen-32b
- litellm_params:
- model: openai/Qwen/Qwen2.5-32B-Instruct-AWQ
- api_base: http://vllm:8000/v1
- api_key: not-needed
- model_info:
- description: "Local Qwen 32B via vLLM"
-
- # Local Ollama (CPU fallback)
- - model_name: qwen-cpu
- litellm_params:
- model: ollama/qwen2.5:32b
- api_base: http://ollama:11434
- model_info:
- description: "Local Qwen 32B via Ollama (CPU)"
-
- # Default route to vLLM
- - model_name: default
- litellm_params:
- model: openai/Qwen/Qwen2.5-32B-Instruct-AWQ
- api_base: http://vllm:8000/v1
- api_key: not-needed
- model_info:
- description: "Default to local vLLM"
-
-litellm_settings:
- drop_params: true
- set_verbose: false
-
-general_settings:
- master_key: os.environ/LITELLM_MASTER_KEY
diff --git a/dream-server/config/litellm/strix-halo-config.yaml b/dream-server/config/litellm/strix-halo-config.yaml
new file mode 100644
index 000000000..27a8c0212
--- /dev/null
+++ b/dream-server/config/litellm/strix-halo-config.yaml
@@ -0,0 +1,19 @@
+model_list:
+ - model_name: default
+ litellm_params:
+ model: openai/default
+ api_base: http://llama-server:8080/v1
+ api_key: not-needed
+
+ - model_name: "*"
+ litellm_params:
+ model: openai/*
+ api_base: http://llama-server:8080/v1
+ api_key: not-needed
+
+general_settings:
+ master_key: os.environ/LITELLM_MASTER_KEY
+
+litellm_settings:
+ drop_params: true
+ set_verbose: false
diff --git a/dream-server/config/livekit/Dockerfile b/dream-server/config/livekit/Dockerfile
deleted file mode 100644
index 530f762e6..000000000
--- a/dream-server/config/livekit/Dockerfile
+++ /dev/null
@@ -1,19 +0,0 @@
-# LiveKit Server with Environment Variable Support
-# Adds envsubst for runtime config generation
-
-FROM livekit/livekit-server:v1.9.11
-
-# Install envsubst (from gettext); the livekit base image is Alpine
-USER root
-RUN apk add --no-cache gettext
-
-# Copy entrypoint script
-COPY livekit-entrypoint.sh /usr/local/bin/
-RUN chmod +x /usr/local/bin/livekit-entrypoint.sh
-
-# Use non-root user
-USER 1000:1000
-
-# Set entrypoint
-ENTRYPOINT ["/usr/local/bin/livekit-entrypoint.sh"]
-CMD ["--config", "/tmp/livekit.yaml"]
diff --git a/dream-server/config/livekit/livekit-entrypoint.sh b/dream-server/config/livekit/livekit-entrypoint.sh
deleted file mode 100755
index 2e10a8cf2..000000000
--- a/dream-server/config/livekit/livekit-entrypoint.sh
+++ /dev/null
@@ -1,34 +0,0 @@
-#!/bin/sh
-# livekit-entrypoint.sh
-# Substitutes environment variables in LiveKit config and starts server
-
-set -e
-
-CONFIG_TEMPLATE="/etc/livekit.yaml.template"
-CONFIG_OUTPUT="/tmp/livekit.yaml"
-
-# Check if template exists
-if [ -f "$CONFIG_TEMPLATE" ]; then
- echo "Generating LiveKit config from template..."
-
- # Check required env vars
- if [ -z "${LIVEKIT_API_KEY:-}" ]; then
- echo "ERROR: LIVEKIT_API_KEY environment variable is required"
- exit 1
- fi
-
- if [ -z "${LIVEKIT_API_SECRET:-}" ]; then
- echo "ERROR: LIVEKIT_API_SECRET environment variable is required"
- exit 1
- fi
-
- # Substitute environment variables
- envsubst < "$CONFIG_TEMPLATE" > "$CONFIG_OUTPUT"
- echo "LiveKit config generated successfully"
-else
- echo "ERROR: Config template not found at $CONFIG_TEMPLATE"
- exit 1
-fi
-
-# Execute the original LiveKit server command
-exec /livekit-server "$@"
diff --git a/dream-server/config/livekit/livekit.yaml b/dream-server/config/livekit/livekit.yaml
deleted file mode 100644
index 401e8498f..000000000
--- a/dream-server/config/livekit/livekit.yaml
+++ /dev/null
@@ -1,17 +0,0 @@
-port: 7880
-rtc:
- port_range_start: 50000
- port_range_end: 60000
- use_external_ip: true
- # node_ip removed - let LiveKit auto-detect
-
-keys:
- ${LIVEKIT_API_KEY}: ${LIVEKIT_API_SECRET}
-
-logging:
- level: info
- json: false
-
-room:
- empty_timeout: 300
- max_participants: 10
diff --git a/dream-server/config/livekit/offline-livekit.yaml b/dream-server/config/livekit/offline-livekit.yaml
deleted file mode 100644
index ea5e03b93..000000000
--- a/dream-server/config/livekit/offline-livekit.yaml
+++ /dev/null
@@ -1,112 +0,0 @@
-# LiveKit Offline Configuration
-# Local-only WebRTC setup for Dream Server zero-cloud mode
-# M1 Phase 2 - No external dependencies
-
-port: 7880
-
-# RTC Configuration - Local network only
-rtc:
- # Port range for WebRTC (ensure these are open on firewall)
- port_range_start: 50000
- port_range_end: 60000
-
- # OFFLINE MODE: Force local network usage
- use_external_ip: false
-
- # Use container hostname for local networking
- node_ip: "0.0.0.0"
-
- # UDP configuration for local network
- udp_port: 7882
-
- # STUN/TURN servers - DISABLED for offline mode
- # stun_servers: []
- # turn_servers: []
-
-# Authentication keys - populated from environment variables
-keys:
- ${LIVEKIT_API_KEY}: ${LIVEKIT_API_SECRET}
- # OFFLINE MODE: No webhook validation needed
- # webhooks: []
-
-# Logging configuration
-logging:
- level: info
- json: false
- # OFFLINE MODE: Log to stdout only
- sample: 100
-
-# Room configuration
-room:
- # Timeout for empty rooms (5 minutes)
- empty_timeout: 300
-
- # Max participants per room
- max_participants: 10
-
- # OFFLINE MODE: Disable external integrations
- # webhooks: []
-
- # Enable recording (local storage only)
- enabled_codecs:
- - mime: audio/opus
- - mime: video/VP8
- - mime: video/VP9
- - mime: video/H264
-
-# Node configuration
-node_selector:
- # OFFLINE MODE: Single node setup
- kind: any
-
-# Signal relay configuration
-signal_relay:
- # OFFLINE MODE: Disabled for local deployment
- enabled: false
-
-# Limits and security
-limits:
- # Max bitrate per participant (1.5 Mbps)
- max_bitrate: 1500000
-
- # Max packet size
- max_packet_size: 1200
-
- # OFFLINE MODE: No rate limiting for local use
- # rate_limit: 100
-
-# Development settings
-debug:
- # Enable detailed logging for troubleshooting
- pprof: false
-
-# Prometheus metrics (optional)
-prometheus:
- # OFFLINE MODE: Disable metrics export
- port: 0
-
-# Key provider configuration
-key_provider:
- # Use static keys from environment variables
- kind: static
- static:
- keys:
- ${LIVEKIT_API_KEY}: ${LIVEKIT_API_SECRET}
-
-# Region configuration - single region for offline
-region:
- # Local deployment
- current: "local"
- regions:
- - local
-
-# TURN configuration - DISABLED for offline mode
-turn:
- enabled: false
- # No external TURN servers
-
-# Webhooks - DISABLED for offline mode
-webhook:
- # No external webhooks
- urls: []
- api_key: ""
\ No newline at end of file
diff --git a/dream-server/config/llama-server/models.ini b/dream-server/config/llama-server/models.ini
new file mode 100644
index 000000000..1b4879f0b
--- /dev/null
+++ b/dream-server/config/llama-server/models.ini
@@ -0,0 +1,4 @@
+[qwen3-8b]
+filename = Qwen3-8B-Q4_K_M.gguf
+load-on-startup = true
+n-ctx = 32768
diff --git a/dream-server/config/openclaw/entry.json b/dream-server/config/openclaw/entry.json
deleted file mode 100644
index 0ad727623..000000000
--- a/dream-server/config/openclaw/entry.json
+++ /dev/null
@@ -1,44 +0,0 @@
-{
- "$schema": "https://raw.githubusercontent.com/openclaw/openclaw/main/schemas/openclaw.json",
- "version": "1.0",
- "agent": {
- "name": "Dream Agent",
- "model": "local-vllm/Qwen/Qwen2.5-1.5B-Instruct",
- "systemPrompt": "You are Dream Agent, a local AI assistant running on this machine's GPU. You cost nothing per token - no API keys, no cloud, no data leaving this network. Be helpful, accurate, and respect privacy. You have access to tools for reading files, writing files, and running commands. Use them proactively - don't give the user homework you can do yourself."
- },
- "providers": {
- "local-vllm": {
- "type": "openai-compatible",
- "baseUrl": "http://vllm-tool-proxy:8003/v1",
- "apiKey": "none",
- "models": {
- "Qwen/Qwen2.5-1.5B-Instruct": {
- "contextWindow": 8192,
- "supportsTools": true
- }
- }
- }
- },
- "subagent": {
- "enabled": true,
- "model": "local-vllm/Qwen/Qwen2.5-1.5B-Instruct",
- "maxConcurrent": 8,
- "timeoutSeconds": 240
- },
- "tools": {
- "exec": {
- "enabled": true,
- "allowedCommands": ["ls", "cat", "grep", "find", "head", "tail", "wc"]
- },
- "read": { "enabled": true },
- "write": { "enabled": true },
- "web_fetch": { "enabled": true }
- },
- "gateway": {
- "port": 7860,
- "host": "0.0.0.0",
- "auth": {
- "mode": "none"
- }
- }
-}
diff --git a/dream-server/config/openclaw/inject-token.js b/dream-server/config/openclaw/inject-token.js
index 62749db40..d8cd8223e 100644
--- a/dream-server/config/openclaw/inject-token.js
+++ b/dream-server/config/openclaw/inject-token.js
@@ -1,40 +1,135 @@
// Inject gateway auth token into Control UI so it auto-connects
// Runs at container startup before the gateway starts
//
+// Three tasks:
+// 1. Patch the runtime config (origins, flags, auth, model names)
+// 2. Inject auto-token.js into the Control UI HTML (CSP-compliant)
+// 3. Fix model references to match what llama-server actually serves
+//
// IMPORTANT: The gateway sets Content-Security-Policy: script-src 'self'
// which blocks inline scripts. So we must create an EXTERNAL .js file
// and reference it via ');
- fs.writeFileSync(htmlPath, html);
-
- console.log('[inject-token] Created auto-token.js and injected ');
+ fs.writeFileSync(HTML_PATH, html);
+
+ console.log('[inject-token] created auto-token.js and injected
+
+
+
+
+
+
+
Token Spy - API Monitor
+
Real-time token usage, cost tracking & session control