LLM inference, chat UI, voice agents, workflow automation, RAG, image generation, and privacy tools — all running on your hardware. No cloud. No subscriptions. No configuration.
New here? Read the Friendly Guide or listen to the audio version — a complete walkthrough of what Dream Server is, how it works, and how to make it your own. No technical background needed.
Platform Support — March 2026
| Platform | Status |
|---|---|
| Linux (NVIDIA + AMD) | Supported — install and run today |
| Windows (NVIDIA + AMD) | Supported — install and run today |
| macOS (Apple Silicon) | Supported — install and run today |

Tested Linux distros: Ubuntu 24.04/22.04, Debian 12, Fedora 41+, Arch Linux, CachyOS, openSUSE Tumbleweed. Other distros using apt, dnf, pacman, or zypper should also work — open an issue if yours doesn't.
Windows: Requires Docker Desktop with WSL2 backend. NVIDIA GPUs use Docker GPU passthrough; AMD Strix Halo runs llama-server natively with Vulkan.
macOS: Requires Apple Silicon (M1+) and Docker Desktop. llama-server runs natively with Metal GPU acceleration; all other services run in Docker.
See the Support Matrix for details.
Setting up local AI usually means stitching together a dozen projects, debugging CUDA drivers, writing Docker configs, and hoping all the pieces talk to each other. Dream Server replaces all of that with a single installer.
- Run one command — the installer detects your GPU, picks the right model for your hardware, generates secure credentials, and launches everything
- Chat in under 2 minutes — bootstrap mode starts a small model instantly while your full model downloads in the background
- 13 integrated services — chat, agents, voice, workflows, search, RAG, image generation, and more, all pre-wired and working together
- Fully moddable — drop in a folder, run `dream enable`, done. Every service is an extension
```shell
curl -fsSL https://raw.githubusercontent.com/Light-Heart-Labs/DreamServer/main/dream-server/get-dream-server.sh | bash
```

Open http://localhost:3000 and start chatting.
No GPU? Dream Server also runs in cloud mode — same full stack, powered by OpenAI/Anthropic/Together APIs instead of local inference:
```shell
./install.sh --cloud
```
Port conflicts? Every port is configurable via environment variables. See `.env.example` for the full list, or override at install time:

```shell
WEBUI_PORT=9090 ./install.sh
```
Manual install (Linux)
```shell
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh
```

Windows (PowerShell)
Requires Docker Desktop with WSL2 backend enabled. Install Docker Desktop first and make sure it is running before you start.
```powershell
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer
.\install.ps1
```

The installer detects your GPU, picks the right model, generates credentials, starts all services, and creates a Desktop shortcut to the Dashboard. Manage with `.\dream-server\installers\windows\dream.ps1 status`.
macOS (Apple Silicon)
Requires Apple Silicon (M1+) and Docker Desktop. Install Docker Desktop first and make sure it is running before you start.
```shell
git clone https://github.com/Light-Heart-Labs/DreamServer.git
cd DreamServer/dream-server
./install.sh
```

The installer detects your chip, picks the right model for your unified memory, launches llama-server natively with Metal acceleration, and starts all other services in Docker. Manage with `./dream-macos.sh status`.
See the macOS Quickstart for details.
- Open WebUI — full-featured chat interface with conversation history, web search, document upload, and 30+ languages
- llama-server — high-performance LLM inference with continuous batching, auto-selected for your GPU
- LiteLLM — API gateway supporting local/cloud/hybrid modes
- Whisper — speech-to-text
- Kokoro — text-to-speech
- OpenClaw — autonomous AI agent framework
- n8n — workflow automation with 400+ integrations (Slack, email, databases, APIs)
- Qdrant — vector database for retrieval-augmented generation (RAG)
- SearXNG — self-hosted web search (no tracking)
- Perplexica — deep research engine
- ComfyUI — node-based image generation
- Privacy Shield — PII scrubbing proxy for API calls
- Dashboard — real-time GPU metrics, service health, model management
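The Privacy Shield concept — redacting PII before a request leaves your machine — can be illustrated with a toy shell filter. This is a sketch only; `scrub` is a hypothetical helper, and the real proxy's rules are far broader than a single email regex:

```shell
# Toy PII scrub: replace email addresses before text is sent to an API.
# Illustrative only — Privacy Shield's actual redaction rules differ.
scrub() {
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[EMAIL]/g'
}

echo "reach me at alice@example.com" | scrub   # → reach me at [EMAIL]
```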
The installer detects your GPU and picks the optimal model automatically. No manual configuration.
NVIDIA (discrete GPU)

| VRAM | Model | Example GPUs |
|---|---|---|
| 8–11 GB | Qwen 2.5 7B (Q4_K_M) | RTX 4060 Ti, RTX 3060 12GB |
| 12–20 GB | Qwen 2.5 14B (Q4_K_M) | RTX 3090, RTX 4080 |
| 20–40 GB | Qwen 2.5 32B (Q4_K_M) | RTX 4090, A6000 |
| 40+ GB | Qwen 2.5 72B (Q4_K_M) | A100, multi-GPU |
| 90+ GB | Qwen3 Coder Next 80B MoE | Multi-GPU A100/H100 |
AMD Strix Halo (unified memory)

| Unified RAM | Model | Example Hardware |
|---|---|---|
| 64–89 GB | Qwen3 30B-A3B (30B MoE) | Ryzen AI MAX+ 395 (64GB) |
| 90+ GB | Qwen3 Coder Next (80B MoE) | Ryzen AI MAX+ 395 (96GB) |
macOS Apple Silicon (unified memory)

| Unified RAM | Model | Example Hardware |
|---|---|---|
| 8–24 GB | Qwen3 4B (Q4_K_M) | M1/M2 base, M4 Mac Mini (16GB) |
| 32 GB | Qwen3 8B (Q4_K_M) | M4 Pro Mac Mini, M3 Max MacBook Pro |
| 48 GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M4 Pro (48GB), M2 Max (48GB) |
| 64+ GB | Qwen3 30B-A3B (MoE, Q4_K_M) | M2 Ultra Mac Studio, M4 Max (64GB+) |
Override tier selection: ./install.sh --tier 3
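The selection logic boils down to a VRAM-to-tier lookup. Here is a minimal sketch that mirrors the NVIDIA table above — `pick_model` is a hypothetical helper, not the installer's actual detection code:

```shell
# Sketch: map detected VRAM (in GB) to a default model, per the table above.
# Hypothetical helper — the real installer does this during hardware detection.
pick_model() {
  local vram_gb=$1
  if   [ "$vram_gb" -ge 90 ]; then echo "Qwen3 Coder Next 80B MoE"
  elif [ "$vram_gb" -ge 40 ]; then echo "Qwen 2.5 72B (Q4_K_M)"
  elif [ "$vram_gb" -ge 20 ]; then echo "Qwen 2.5 32B (Q4_K_M)"
  elif [ "$vram_gb" -ge 12 ]; then echo "Qwen 2.5 14B (Q4_K_M)"
  else                            echo "Qwen 2.5 7B (Q4_K_M)"
  fi
}

pick_model 24   # RTX 4090-class card → Qwen 2.5 32B (Q4_K_M)
```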
No waiting for large downloads. Dream Server uses bootstrap mode by default:
- Downloads a tiny 1.5B model in under a minute
- You start chatting immediately
- The full model downloads in the background
- Hot-swap to the full model when it's ready — zero downtime
The installer pulls all services in parallel. Downloads are resume-capable — interrupted downloads pick up where they left off.
Skip bootstrap: ./install.sh --no-bootstrap
The installer picks a model for your hardware, but you can switch anytime:
```shell
dream model current   # What's running now?
dream model list      # Show all available tiers
dream model swap T3   # Switch to a different tier
```

If the new model isn't downloaded yet, pre-fetch it first:

```shell
./scripts/pre-download.sh --tier 3   # Download before switching
dream model swap T3                  # Then swap (restarts llama-server)
```

Already have a GGUF you want to use? Drop it in `data/models/`, update `GGUF_FILE` and `LLM_MODEL` in `.env`, and restart:
```shell
docker compose restart llama-server
```

Rollback is automatic — if a new model fails to load, Dream Server reverts to your previous model.
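For the custom-GGUF route, the two `.env` values might look like this — the filename and model name below are placeholders for your own file, not real defaults:

```shell
# .env — point llama-server at a model you dropped into data/models/
GGUF_FILE=my-model-Q4_K_M.gguf   # placeholder filename
LLM_MODEL=my-model               # placeholder display name
```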
Dream Server is designed to be modded. Every service is an extension — a folder with a manifest.yaml and a compose.yaml. The dashboard, CLI, health checks, and compose stack all discover extensions automatically.
```
extensions/services/
  my-service/
    manifest.yaml   # Metadata: name, port, health endpoint, GPU backends
    compose.yaml    # Docker Compose fragment (auto-merged into the stack)
```

```shell
dream enable my-service    # Enable it
dream disable my-service   # Disable it
dream list                 # See everything
```

The installer itself is modular — 6 libraries and 13 phases, each in its own file. Want to add a hardware tier, swap a default model, or skip a phase? Edit one file.
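A minimal extension might look like the fragments below. The field names, port, and image are illustrative guesses, not the actual schema — see the extension guide for the real manifest format:

```yaml
# extensions/services/my-service/manifest.yaml (illustrative fields)
name: my-service
port: 8123
health: /healthz                     # endpoint the dashboard polls
gpu_backends: [cuda, rocm, metal]
```

```yaml
# extensions/services/my-service/compose.yaml (merged into the stack)
services:
  my-service:
    image: ghcr.io/example/my-service:latest   # placeholder image
    ports:
      - "8123:8123"
```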
Full extension guide | Installer architecture
The dream CLI manages your entire stack:
```shell
dream status                # Health checks + GPU status
dream list                  # All services and their state
dream logs llm              # Tail logs (aliases: llm, stt, tts)
dream restart [service]     # Restart one or all services
dream start / stop          # Start or stop the stack
dream mode cloud            # Switch to cloud APIs via LiteLLM
dream mode local            # Switch back to local inference
dream mode hybrid           # Local primary, cloud fallback
dream model swap T3         # Switch to a different hardware tier
dream enable n8n            # Enable an extension
dream disable whisper       # Disable one
dream config show           # View .env (secrets masked)
dream preset save gaming    # Snapshot current config
dream preset load gaming    # Restore it
```

| Feature | Dream Server | Ollama + Open WebUI | LocalAI |
|---|---|---|---|
| One-command full-stack install | LLM + agents + workflows + RAG + voice + images | LLM + chat only | LLM only |
| Hardware auto-detect + model selection | NVIDIA + AMD Strix Halo | No | No |
| AMD APU unified memory support | ROCm + llama-server | Partial (Vulkan) | No |
| Autonomous AI agents | OpenClaw | No | No |
| Workflow automation | n8n (400+ integrations) | No | No |
| Voice (STT + TTS) | Whisper + Kokoro | No | No |
| Image generation | ComfyUI | No | No |
| RAG pipeline | Qdrant + embeddings | No | No |
| Extension system | Manifest-based, hot-pluggable | No | No |
| Multi-GPU | Yes (NVIDIA) | Partial | Partial |
| Doc | Description |
|---|---|
| Quickstart | Step-by-step install guide with troubleshooting |
| Hardware Guide | What to buy, tier recommendations |
| FAQ | Common questions and configuration |
| Extensions | How to add custom services |
| Installer Architecture | Modular installer deep dive |
| Changelog | Version history and release notes |
| Contributing | How to contribute |
Dream Server exists because of the incredible people, projects, and communities that make open-source AI possible. We are grateful to every contributor, maintainer, and tinkerer whose work powers this stack.
Thanks to kyuz0 for amd-strix-halo-toolboxes — pre-built ROCm containers for Strix Halo that spared us the pain of building our own. And to lhl for strix-halo-testing — the foundational Strix Halo AI research and rocWMMA performance work that the broader community builds on.
- llama.cpp (ggerganov) — LLM inference engine
- Qwen (Alibaba Cloud) — Default language models
- Open WebUI — Chat interface
- ComfyUI — Image generation engine
- FLUX.1 (Black Forest Labs) — Image generation model
- AMD ROCm — GPU compute platform
- AMD Strix Halo Toolboxes (kyuz0) — Pre-built ROCm containers for AMD inference
- Strix Halo Testing (lhl) — Foundational Strix Halo AI research and rocWMMA optimizations
- n8n — Workflow automation
- Qdrant — Vector database
- SearXNG — Privacy-respecting search
- Perplexica — AI-powered search
- LiteLLM — LLM API gateway
- Kokoro FastAPI (remsky) — Text-to-speech
- Speaches — Speech-to-text
- Strix Halo Home Lab — Community knowledge base
If we missed anyone, open an issue. We want to get this right.
Apache 2.0 — Use it, modify it, ship it. See LICENSE.
Built by Light Heart Labs and The Collective


