Dream Server runs local language models as GGUF files from data/models/.
The recommended path is the Dashboard Models page. Manual model swaps are also
available for headless maintenance and advanced operator workflows.
Open the Dashboard and go to Models.
From there you can:
- See the curated Dream Server model catalog.
- Check approximate model size, VRAM requirement, context length, and specialty.
- Download a catalog model into data/models/.
- Load a downloaded model.
- Load a manually copied single-file GGUF discovered in data/models/.
- Delete a downloaded catalog model.
When a catalog model is loaded, Dream Server updates the active GGUF settings and restarts the local inference service so OpenAI-compatible clients use the new model. After the switch settles, verify it from the host:
dream model current
curl http://localhost:11434/v1/models
On macOS native Metal and Windows native/Lemonade installs, use
http://localhost:8080/v1/models unless you changed the port.
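The `/v1/models` response is OpenAI-style JSON, with a `data` list of objects that each carry an `id`. If you only want the model IDs, and `jq` is not installed, you can use a small text filter. The sketch below assumes that response shape. `model_ids` is a hypothetical helper name, it is not a real JSON parser, and `jq -r '.data[].id'` is the sturdier choice when `jq` is available.

```shell
# Hypothetical helper: print each model "id" from an OpenAI-style /v1/models
# JSON response read on stdin, one per line. This is a grep/sed sketch, not
# a JSON parser; it assumes the IDs contain no escaped quotes.
model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Usage on the host (use port 8080 on macOS native Metal or
# Windows native/Lemonade installs):
#   curl -s http://localhost:11434/v1/models | model_ids
```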
Downstream apps that talk directly to llama-server or LiteLLM pick up the
active model through those services. Examples include Open WebUI, Token Spy,
OpenCode, and OpenAI-compatible SDK clients configured against Dream Server.
Perplexica also stores a persisted defaultChatModel; installer first boot and
bootstrap hot-swap update it automatically, but after a manual model change you
should verify Perplexica settings or run scripts/repair/repair-perplexica.sh.
Hermes Agent keeps its own model name in data/hermes/config.yaml. If Hermes is
enabled, check the model.default line after a model switch, then restart the container:
grep -n "default:" data/hermes/config.yaml
docker restart dream-hermes
For Lemonade/AMD backends, Hermes and LiteLLM may need the model name in the
form extra.<GGUF_FILE>.
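You can also automate the comparison between the Hermes setting and the active model. The sketch below reads `GGUF_FILE` from `.env` and compares it with the first `default:` line in the Hermes YAML. On Lemonade/AMD, it expects the `extra.` prefix instead. `check_hermes_default` is a hypothetical helper, and treating the first `default:` line as `model.default` is an assumption about the file's layout.

```shell
# Hypothetical helper: compare Hermes' model.default with GGUF_FILE in .env.
# Pass "lemonade" as the third argument on AMD/Lemonade backends, where the
# expected value is extra.<GGUF_FILE>. Assumes the first "default:" line in
# the YAML belongs to the model: block.
check_hermes_default() {
  local env_file=$1 hermes_cfg=$2 backend=${3:-llama}
  local gguf expected actual
  gguf=$(grep -E '^GGUF_FILE=' "$env_file" | tail -n1 | cut -d= -f2- | tr -d '"')
  expected=$gguf
  [ "$backend" = lemonade ] && expected="extra.$gguf"
  # Strip the key, surrounding whitespace, and optional double quotes.
  actual=$(grep -E '^[[:space:]]*default:' "$hermes_cfg" | head -n1 \
    | sed -E 's/^[[:space:]]*default:[[:space:]]*//; s/^"//; s/"$//')
  if [ "$actual" = "$expected" ]; then
    echo "ok: Hermes uses $actual"
  else
    echo "mismatch: Hermes uses '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Usage from ~/dream-server:
#   check_hermes_default .env data/hermes/config.yaml            # llama-server
#   check_hermes_default .env data/hermes/config.yaml lemonade   # AMD/Lemonade
```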
Default model directory:
~/dream-server/data/models/
On Windows installs:
$env:USERPROFILE\dream-server\data\models\
Each model is normally a single .gguf file:
ls -lh ~/dream-server/data/models/*.gguf
The active model is recorded in .env:
grep -E "^(LLM_MODEL|GGUF_FILE|CTX_SIZE|MAX_CONTEXT)=" ~/dream-server/.env
GGUF_FILE is the filename Dream Server should load from data/models/.
LLM_MODEL is the friendly logical model name used by scripts and config.
CTX_SIZE and MAX_CONTEXT control context length.
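The manual procedure changes these keys with `dream config edit`. For scripted or headless maintenance, a minimal sketch can replace one `KEY=value` line, or append it if the key is missing. `env_set` is a hypothetical helper, not part of the dream CLI. It assumes simple unquoted `KEY=value` lines. It writes back through `cat >` so the file keeps its inode and permissions, which can matter if `.env` is bind-mounted.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing existing KEY=
# lines or appending one if the key is missing. Values are written unquoted,
# and awk -v interprets backslash escapes, so avoid backslashes in VALUE.
env_set() {
  local key=$1 value=$2 file=${3:-.env} tmp status
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   env_set GGUF_FILE MyModel-Q4_K_M.gguf ~/dream-server/.env
```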
Hermes requires at least a 64K context window. Installer bootstrap mode uses
65536 for the fast-start model, then switches .env, llama-server, and
Hermes config to the full model context, usually 131072, when the background
download completes.
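If Hermes is enabled, a quick check can catch a context setting below that 64K floor after a manual edit. The sketch below reads `CTX_SIZE` and `MAX_CONTEXT` from `.env` and fails if either one is missing, non-numeric, or below 65536. `check_hermes_ctx` is a hypothetical name.

```shell
# Hypothetical check: fail when .env sets CTX_SIZE or MAX_CONTEXT below the
# 64K (65536-token) minimum that Hermes needs. Only relevant with Hermes.
check_hermes_ctx() {
  local file=${1:-.env} min=65536 key val
  for key in CTX_SIZE MAX_CONTEXT; do
    val=$(grep -E "^${key}=" "$file" | tail -n1 | cut -d= -f2- | tr -d '"')
    case $val in
      ''|*[!0-9]*) echo "$key is missing or not a number in $file" >&2; return 1 ;;
    esac
    if [ "$val" -lt "$min" ]; then
      echo "$key=$val is below the Hermes minimum of $min" >&2
      return 1
    fi
  done
  echo "ok: CTX_SIZE and MAX_CONTEXT are at least $min"
}
```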
For most users, use the Dashboard. If you are debugging a failed download or
preloading a machine, download the exact catalog GGUF URL from
config/model-library.json into data/models/.
Example:
cd ~/dream-server
mkdir -p data/models
curl -L \
-o data/models/Qwen3.5-9B-Q4_K_M.gguf \
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf
Then open Dashboard -> Models. If the filename matches a catalog entry, the model should appear as downloaded and you can load it from the Dashboard.
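When preloading several machines, a more defensive version of that download helps. It names the file after the last path segment of the URL and only renames it into place after curl succeeds, so an interrupted transfer never leaves a file ending in `.gguf`. This sketch assumes that, as in the example above, the catalog filename equals the URL's last path segment. `gguf_name_from_url` and `download_catalog_gguf` are hypothetical helpers, not part of Dream Server.

```shell
# Hypothetical helper: local filename = last URL path segment, with any
# query string or fragment removed.
gguf_name_from_url() {
  local path=${1%%[?#]*}
  printf '%s\n' "${path##*/}"
}

# Hypothetical helper: download a catalog GGUF into data/models/ under its
# URL filename. curl writes to <name>.part first and the file is renamed to
# <name> only after a successful transfer.
download_catalog_gguf() {
  local url=$1 dir=${2:-data/models} name
  name=$(gguf_name_from_url "$url")
  case $name in
    *.gguf) ;;
    *) echo "URL does not end in a .gguf filename: $url" >&2; return 1 ;;
  esac
  mkdir -p "$dir"
  curl -fL --retry 3 -o "$dir/$name.part" "$url" \
    && mv "$dir/$name.part" "$dir/$name"
}

# Usage from ~/dream-server:
#   download_catalog_gguf https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf
```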
For a single local .gguf, the normal flow is:
- Copy the file into data/models/.
- Open Dashboard -> Models.
- Load the local entry.
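Before copying, it can help to confirm that the file really is a GGUF. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal sketch can check that magic, a `.gguf` extension, and a non-zero size. `import_local_gguf` is a hypothetical helper. It does not check whether the model's architecture is supported by the active backend.

```shell
# Hypothetical helper: copy a local model into data/models/ after checking
# that it ends in .gguf, is non-empty, and starts with the "GGUF" magic.
import_local_gguf() {
  local src=$1 dir=${2:-data/models}
  case $src in
    *.gguf) ;;
    *) echo "expected a .gguf file: $src" >&2; return 1 ;;
  esac
  if [ ! -s "$src" ]; then
    echo "missing or empty file: $src" >&2
    return 1
  fi
  if [ "$(head -c 4 "$src")" != "GGUF" ]; then
    echo "not a GGUF file (bad magic bytes): $src" >&2
    return 1
  fi
  mkdir -p "$dir"
  cp "$src" "$dir/"
}

# Usage from ~/dream-server:
#   import_local_gguf /path/to/MyModel-Q4_K_M.gguf
```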
The Dashboard updates .env, config/llama-server/models.ini, and the active
runtime routing before restarting the inference service.
On Lemonade installs, loading a model directly inside the Lemonade app only
changes Lemonade's current runtime state. It does not update Dream Server's
.env or LiteLLM routing. Open WebUI talks through Dream Server/LiteLLM, so
its next chat can ask for the persisted Dream Server model and Lemonade may
unload the model you opened manually. Use Dashboard -> Models -> Load when you
want Open WebUI and other Dream Server clients to keep using the local GGUF.
Use the manual procedure below only if you cannot access the Dashboard or need to repair an install by hand.
- Copy or download the GGUF into data/models/.
cd ~/dream-server
mkdir -p data/models
cp /path/to/MyModel-Q4_K_M.gguf data/models/
- Update .env.
dream config edit
Set:
LLM_MODEL=my-model
GGUF_FILE=MyModel-Q4_K_M.gguf
CTX_SIZE=8192
MAX_CONTEXT=8192
- Update config/llama-server/models.ini.
[my-model]
filename = MyModel-Q4_K_M.gguf
load-on-startup = true
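n-ctx = 8192
- If Hermes is enabled, update data/hermes/config.yaml. Hermes requires at least a 64K context window, so also raise CTX_SIZE, MAX_CONTEXT, and n-ctx above to 65536 or more.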
model:
default: "MyModel-Q4_K_M.gguf"
context_length: 65536
For Lemonade/AMD backends, use:
model:
default: "extra.MyModel-Q4_K_M.gguf"
context_length: 65536
Also keep auxiliary.compression.context_length at the same value and use
compression.threshold: 0.50; older absolute-token thresholds can leave Hermes
waiting too long to compact.
- For AMD/Lemonade installs, verify config/litellm/lemonade.yaml.
Each local model alias should use the extra.<GGUF_FILE> form and should keep
Qwen3 thinking disabled for clients that do not pass that flag themselves:
extra_body:
chat_template_kwargs:
enable_thinking: false
- If Perplexica is enabled, reseed or verify its model setting.
LLM_MODEL="$(grep -E '^LLM_MODEL=' .env | tail -n1 | cut -d= -f2 | tr -d '"')"
PERPLEXICA_PORT="$(grep -E '^PERPLEXICA_PORT=' .env | tail -n1 | cut -d= -f2 | tr -d '"')"
scripts/repair/repair-perplexica.sh "http://127.0.0.1:${PERPLEXICA_PORT:-3004}" "$LLM_MODEL"
Bootstrap hot-swap handles this automatically. Manual GGUF edits and some operator-driven switches should still be verified because Perplexica stores its own app settings in its volume.
- Restart the affected services.
dream restart llama-server
dream restart litellm
docker restart dream-hermes 2>/dev/null || true
If your install uses direct Docker Compose commands instead of the dream CLI,
recreate llama-server rather than just restarting it, so it rereads .env.
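The restart step can be wrapped in a function that prefers the dream CLI and falls back to plain Docker Compose when the CLI is not installed. This is a minimal sketch under stated assumptions. `restart_inference` is hypothetical, and the Compose service names `llama-server` and `litellm` are assumed to match the names used above. It must run from the directory that holds Dream Server's Compose configuration. `--force-recreate` makes Compose rebuild the containers so they are created with the current `.env` values.

```shell
# Hypothetical helper: restart the inference path after a manual model change.
# Assumes Compose service names llama-server and litellm, and that it runs
# from the directory containing Dream Server's Compose configuration.
restart_inference() {
  if command -v dream >/dev/null 2>&1; then
    dream restart llama-server && dream restart litellm || return 1
  else
    # Recreate (not just restart) so containers pick up the current .env.
    docker compose up -d --force-recreate llama-server litellm || return 1
  fi
  # Hermes is optional; ignore the error if its container does not exist.
  docker restart dream-hermes >/dev/null 2>&1 || true
}
```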
Use these checks after Dashboard or manual model changes:
dream model current
curl http://localhost:11434/v1/models
For LiteLLM installs that require an API key, use the key from .env:
LITELLM_KEY="$(grep -E '^LITELLM_KEY=' .env | tail -n1 | cut -d= -f2- | tr -d '"')"
curl -H "Authorization: Bearer $LITELLM_KEY" http://localhost:4000/v1/models
From inside a Docker container, the inference endpoint is:
http://llama-server:8080/v1
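The right base URL depends on where the client runs. The small lookup below collects the endpoints mentioned in this guide in one place, for scripts that configure OpenAI-compatible clients. `dream_base_url` and its context names are hypothetical. The ports are the defaults quoted here, so adjust them if you changed them.

```shell
# Hypothetical helper: print the OpenAI-compatible base URL for the place a
# client runs. Ports are this guide's defaults; change them if yours differ.
dream_base_url() {
  case ${1:-host} in
    host)      echo "http://localhost:11434/v1" ;;    # default host endpoint
    native)    echo "http://localhost:8080/v1" ;;     # macOS Metal, Windows native/Lemonade
    container) echo "http://llama-server:8080/v1" ;;  # from inside a Docker container
    litellm)   echo "http://localhost:4000/v1" ;;     # LiteLLM on the host
    *) echo "unknown context: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   curl -s "$(dream_base_url native)/models"
```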
Check the file is present and non-empty:
ls -lh data/models/*.gguf
If it is a catalog model, confirm the filename exactly matches
config/model-library.json. The Dashboard only marks catalog models as
downloaded when the on-disk filename matches the catalog entry.
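Both checks can be combined in a short script. The sketch below confirms that the file is non-empty and that its exact filename appears somewhere in the catalog. It uses a plain text search, so it assumes only that the catalog contains the filename literally, not any particular JSON schema. `check_catalog_file` is a hypothetical name.

```shell
# Hypothetical check: the GGUF must be non-empty and its exact filename must
# appear somewhere in the catalog JSON (plain text search, not schema-aware).
check_catalog_file() {
  local name=$1 models=${2:-data/models} catalog=${3:-config/model-library.json}
  if [ ! -s "$models/$name" ]; then
    echo "missing or empty: $models/$name" >&2
    return 1
  fi
  if ! grep -qF -- "$name" "$catalog"; then
    echo "$name is not in $catalog; the Dashboard will likely not mark it as downloaded" >&2
    return 1
  fi
  echo "ok: $name"
}

# Usage from ~/dream-server:
#   check_catalog_file Qwen3.5-9B-Q4_K_M.gguf
```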
Check service logs:
dream logs llm
Common causes:
- The model needs more VRAM or unified memory than the machine has.
- Context length is too high; lower CTX_SIZE/MAX_CONTEXT.
- The GGUF is not compatible with the active backend.
- On AMD/Lemonade, a service is still asking for the raw filename instead of
extra.<GGUF_FILE>.
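For the last cause, a quick text check can confirm that `config/litellm/lemonade.yaml` mentions the `extra.<GGUF_FILE>` form of the active model and still sets `enable_thinking: false`. This is a grep-based sketch, not a YAML parser, and `check_lemonade_alias` is a hypothetical name.

```shell
# Hypothetical check for AMD/Lemonade installs: the LiteLLM config should
# reference extra.<GGUF_FILE> and keep enable_thinking: false.
check_lemonade_alias() {
  local env_file=${1:-.env} cfg=${2:-config/litellm/lemonade.yaml} gguf
  gguf=$(grep -E '^GGUF_FILE=' "$env_file" | tail -n1 | cut -d= -f2- | tr -d '"')
  if [ -z "$gguf" ]; then
    echo "GGUF_FILE is not set in $env_file" >&2
    return 1
  fi
  if ! grep -qF -- "extra.$gguf" "$cfg"; then
    echo "$cfg does not mention extra.$gguf" >&2
    return 1
  fi
  if ! grep -qE 'enable_thinking:[[:space:]]*false' "$cfg"; then
    echo "$cfg does not set enable_thinking: false" >&2
    return 1
  fi
  echo "ok: $cfg routes extra.$gguf"
}
```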
Verify the server first:
curl http://localhost:11434/v1/models
If the server reports the expected model, refresh the app. If it reports the wrong
model, restart llama-server and verify .env / models.ini.
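Verifying `.env` against `models.ini` can be scripted. The sketch below checks that the `GGUF_FILE` in `.env` matches the `filename =` value of some section in `config/llama-server/models.ini`, and that the file exists in `data/models/`. `check_env_vs_ini` is hypothetical, and the simple `key = value` parsing assumes the INI layout shown in the manual procedure.

```shell
# Hypothetical check: GGUF_FILE in .env should match a "filename =" entry in
# models.ini, and the file should exist (non-empty) in data/models/.
check_env_vs_ini() {
  local env_file=${1:-.env} ini=${2:-config/llama-server/models.ini}
  local models=${3:-data/models} gguf
  gguf=$(grep -E '^GGUF_FILE=' "$env_file" | tail -n1 | cut -d= -f2- | tr -d '"')
  [ -n "$gguf" ] || { echo "GGUF_FILE is not set in $env_file" >&2; return 1; }
  # Exact string comparison on trimmed key/value pairs (no regex surprises
  # from the dots in GGUF filenames).
  if ! awk -v f="$gguf" '
      index($0, "=") {
        key = substr($0, 1, index($0, "=") - 1)
        val = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", key)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == "filename" && val == f) found = 1
      }
      END { exit !found }' "$ini"; then
    echo "no 'filename = $gguf' entry in $ini" >&2
    return 1
  fi
  [ -s "$models/$gguf" ] || { echo "$models/$gguf is missing or empty" >&2; return 1; }
  echo "ok: .env and $ini both point at $gguf"
}
```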
Hermes has its own config:
grep -n "default:\|context_length:" data/hermes/config.yaml
docker restart dream-hermes
For AMD/Lemonade, model.default should use the extra.<GGUF_FILE> form.
- Dashboard model downloads are catalog-based; loading also covers single-file GGUFs copied into data/models/.
- Custom GGUF import from a local file or arbitrary URL is not yet a first-class Dashboard workflow.
- dream model swap switches Dream Server tiers, not arbitrary GGUF files.
- scripts/upgrade-model.sh is a legacy helper for model-directory layouts and should not be used as the primary GGUF switch path on current installs.