Skip to content

Latest commit

 

History

History
293 lines (209 loc) · 8.07 KB

File metadata and controls

293 lines (209 loc) · 8.07 KB

Model Management

Dream Server runs local language models as GGUF files from data/models/. The recommended path is the Dashboard Models page. Manual model swaps are also available for headless maintenance and advanced operator workflows.

Recommended: Dashboard Models Page

Open the Dashboard and go to Models.

From there you can:

  • See the curated Dream Server model catalog.
  • Check approximate model size, VRAM requirement, context length, and specialty.
  • Download a catalog model into data/models/.
  • Load a downloaded model.
  • Load a manually copied single-file GGUF discovered in data/models/.
  • Delete a downloaded catalog model.

When a catalog model is loaded, Dream Server updates the active GGUF settings and restarts the local inference service so OpenAI-compatible clients use the new model. After the switch settles, verify it from the host:

dream model current
curl http://localhost:11434/v1/models

On macOS native Metal and Windows native/Lemonade installs, use http://localhost:8080/v1/models unless you changed the port.

Downstream apps that talk directly to llama-server or LiteLLM pick up the active model through those services. Examples include Open WebUI, Token Spy, OpenCode, and OpenAI-compatible SDK clients configured against Dream Server. Perplexica also stores a persisted defaultChatModel; installer first boot and bootstrap hot-swap update it automatically, but after a manual model change you should verify Perplexica settings or run scripts/repair/repair-perplexica.sh.

Hermes Agent keeps its own model name in data/hermes/config.yaml. If Hermes is enabled after a model switch, verify the model.default line:

grep -n "default:" data/hermes/config.yaml
docker restart dream-hermes

For Lemonade/AMD backends, Hermes and LiteLLM may need the model name in the form extra.<GGUF_FILE>.

Where Models Live

Default model directory:

~/dream-server/data/models/

On Windows installs:

$env:USERPROFILE\dream-server\data\models\

Each model is normally a single .gguf file:

ls -lh ~/dream-server/data/models/*.gguf

The active model is recorded in .env:

grep -E "^(LLM_MODEL|GGUF_FILE|CTX_SIZE|MAX_CONTEXT)=" ~/dream-server/.env

GGUF_FILE is the filename Dream Server should load from data/models/. LLM_MODEL is the friendly logical model name used by scripts and config. CTX_SIZE and MAX_CONTEXT control context length.

Hermes requires at least a 64K context window. Installer bootstrap mode uses 65536 for the fast-start model, then switches .env, llama-server, and Hermes config to the full model context, usually 131072, when the background download completes.

Manual: Download a Catalog Model

For most users, use the Dashboard. If you are debugging a failed download or preloading a machine, download the exact catalog GGUF URL from config/model-library.json into data/models/.

Example:

cd ~/dream-server
mkdir -p data/models

curl -L \
  -o data/models/Qwen3.5-9B-Q4_K_M.gguf \
  https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q4_K_M.gguf

Then open Dashboard -> Models. If the filename matches a catalog entry, the model should appear as downloaded and you can load it from the Dashboard.

Bring Your Own GGUF

For a single local .gguf, the normal flow is:

  1. Copy the file into data/models/.
  2. Open Dashboard -> Models.
  3. Load the local entry.

The Dashboard updates .env, config/llama-server/models.ini, and the active runtime routing before restarting the inference service.

On Lemonade installs, loading a model directly inside the Lemonade app only changes Lemonade's current runtime state. It does not update Dream Server's .env or LiteLLM routing. Open WebUI talks through Dream Server/LiteLLM, so its next chat can ask for the persisted Dream Server model and Lemonade may unload the model you opened manually. Use Dashboard -> Models -> Load when you want Open WebUI and other Dream Server clients to keep using the local GGUF.

Use the manual procedure below only if you cannot access the Dashboard or need to repair an install by hand.

  1. Download the GGUF into data/models/.
cd ~/dream-server
mkdir -p data/models
cp /path/to/MyModel-Q4_K_M.gguf data/models/
  1. Update .env.
dream config edit

Set:

LLM_MODEL=my-model
GGUF_FILE=MyModel-Q4_K_M.gguf
CTX_SIZE=8192
MAX_CONTEXT=8192
  1. Update config/llama-server/models.ini.
[my-model]
filename = MyModel-Q4_K_M.gguf
load-on-startup = true
n-ctx = 8192
  1. If Hermes is enabled, update data/hermes/config.yaml.
model:
  default: "MyModel-Q4_K_M.gguf"
  context_length: 65536

For Lemonade/AMD backends, use:

model:
  default: "extra.MyModel-Q4_K_M.gguf"
  context_length: 65536

Also keep auxiliary.compression.context_length at the same value and use compression.threshold: 0.50; older absolute-token thresholds can leave Hermes waiting too long to compact.

  1. For AMD/Lemonade installs, verify config/litellm/lemonade.yaml.

Each local model alias should use the extra.<GGUF_FILE> form and should keep Qwen3 thinking disabled for clients that do not pass that flag themselves:

extra_body:
  chat_template_kwargs:
    enable_thinking: false
  1. If Perplexica is enabled, reseed or verify its model setting.
LLM_MODEL="$(grep -E '^LLM_MODEL=' .env | tail -n1 | cut -d= -f2 | tr -d '"')"
PERPLEXICA_PORT="$(grep -E '^PERPLEXICA_PORT=' .env | tail -n1 | cut -d= -f2 | tr -d '"')"
scripts/repair/repair-perplexica.sh "http://127.0.0.1:${PERPLEXICA_PORT:-3004}" "$LLM_MODEL"

Bootstrap hot-swap handles this automatically. Manual GGUF edits and some operator-driven switches should still be verified because Perplexica stores its own app settings in its volume.

  1. Restart the affected services.
dream restart llama-server
dream restart litellm
docker restart dream-hermes 2>/dev/null || true

If your install uses direct Docker Compose commands instead of the dream CLI, recreate llama-server so it rereads .env.

Verify a Switch

Use these checks after Dashboard or manual model changes:

dream model current
curl http://localhost:11434/v1/models

For LiteLLM installs that require an API key, use the key from .env:

LITELLM_KEY=$(grep '^LITELLM_KEY=' .env | cut -d= -f2-)
curl -H "Authorization: Bearer $LITELLM_KEY" http://localhost:4000/v1/models

From inside a Docker container, the inference endpoint is:

http://llama-server:8080/v1

Troubleshooting

The download finished, but the model is not visible

Check the file is present and non-empty:

ls -lh data/models/*.gguf

If it is a catalog model, confirm the filename exactly matches config/model-library.json. The Dashboard only marks catalog models as downloaded when the on-disk filename matches the catalog entry.

The model file exists, but loading fails

Check service logs:

dream logs llm

Common causes:

  • The model needs more VRAM or unified memory than the machine has.
  • Context length is too high; lower CTX_SIZE / MAX_CONTEXT.
  • The GGUF is not compatible with the active backend.
  • On AMD/Lemonade, a service is still asking for the raw filename instead of extra.<GGUF_FILE>.

Open WebUI or another app still shows the old model

Verify the server first:

curl http://localhost:11434/v1/models

If the server is correct, refresh the app. If the server is wrong, restart llama-server and verify .env / models.ini.

Hermes still asks for the old model

Hermes has its own config:

grep -n "default:\|context_length:" data/hermes/config.yaml
docker restart dream-hermes

For AMD/Lemonade, use extra.<GGUF_FILE>.

Current Limitations

  • Dashboard model download and load are catalog-based.
  • Custom GGUF import from a local file or arbitrary URL is not yet a first-class Dashboard workflow.
  • dream model swap switches Dream Server tiers, not arbitrary GGUF files.
  • scripts/upgrade-model.sh is a legacy helper for model-directory layouts and should not be used as the primary GGUF switch path on current installs.