fix(compose): drop CPU overlay AUDIO_STT_MODEL literal; make AMD memory limit env-driven #1067

Merged
Lightheartdevs merged 1 commit into Light-Heart-Labs:main from yasinBursali:fix/cpu-amd-overlay-env-driven-limits
May 2, 2026

@yasinBursali
Contributor

What

Two related compose-overlay drift fixes:

  • dream-server/docker-compose.cpu.yml: drop the open-webui: block, whose only content was a literal overriding AUDIO_STT_MODEL to turbo (removing the literal alone would leave the block empty)
  • dream-server/docker-compose.amd.yml: replace literal memory: 110G with ${LLAMA_SERVER_MEMORY_LIMIT:-110G}

Why

CPU overlay: PR #985 made AUDIO_STT_MODEL env-driven (${AUDIO_STT_MODEL:-Systran/faster-whisper-base} in docker-compose.base.yml). The cpu overlay's residual literal AUDIO_STT_MODEL: "deepdml/faster-whisper-large-v3-turbo-ct2" overrode that interpolation, so on CPU-only Linux installs:

  1. Phase 06 of the installer correctly writes AUDIO_STT_MODEL=Systran/faster-whisper-base to .env
  2. Phase 12 pre-downloads only the base model
  3. Open WebUI's container env is forced to turbo by the overlay literal
  4. Whisper 404s on every transcription

The NVIDIA overlay's identical literal is intentional — for NVIDIA, Phase 06 also picks turbo, so the literal is a redundant safety net (per #985's design). Only the cpu overlay drifted.
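
The CPU failure chain can be reproduced in miniature with plain POSIX shell, which uses the same `${VAR:-default}` syntax as the base file. This is only a simplified model of Compose's merge behavior (a later overlay's value for an environment key replaces the base file's value); the function name `effective_stt_model` is hypothetical and does not exist in the repo.

```shell
#!/bin/sh
# Simplified model (an assumption, not Compose itself): if the overlay sets a
# literal for AUDIO_STT_MODEL, that literal replaces the base file's value;
# otherwise the base file's ${AUDIO_STT_MODEL:-...} interpolation applies.
effective_stt_model() {
    overlay_literal=$1   # empty string means the overlay sets nothing
    base_value="${AUDIO_STT_MODEL:-Systran/faster-whisper-base}"
    printf '%s\n' "${overlay_literal:-$base_value}"
}

# What installer Phase 06 writes to .env on a CPU-only host.
AUDIO_STT_MODEL=Systran/faster-whisper-base

# Before this PR: the cpu overlay's literal wins, so .env is ignored.
before_fix=$(effective_stt_model "deepdml/faster-whisper-large-v3-turbo-ct2")

# After this PR: no overlay literal, so the container sees the .env value.
after_fix=$(effective_stt_model "")

echo "before: $before_fix"
echo "after:  $after_fix"
```

The "before" value names a model that Phase 12 never downloaded on CPU hosts, which is where the 404 came from.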

AMD overlay: five of the six hardware overlays already use ${LLAMA_SERVER_MEMORY_LIMIT:-N} with a per-backend fallback (NVIDIA 64G, Apple 32G, Intel/Arc 24G, CPU 6G), and .env.schema.json registers the variable as a tunable string. The amd overlay's literal memory: 110G made AMD the only platform where users couldn't tune llama-server's memory cap.

tier0.yml's literal memory: 4G is a deliberate hard cap for <8 GB RAM hosts and was left alone.

How

  • cpu.yml: removed the entire open-webui: block (3 lines plus the leading blank line). The container now inherits the base file's env-var interpolation.
  • amd.yml: 1-line substitution. The 110G default is preserved as the AMD-specific fallback, so existing AMD installs without the env var keep byte-for-byte identical behavior.
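
A minimal sketch of how the new AMD fallback resolves, assuming only that Compose's `${VAR:-default}` form behaves like the POSIX shell form (the default is used when the variable is unset or empty). The helper name `resolve_limit` is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# Mirrors the expression now used in docker-compose.amd.yml.
# resolve_limit is a hypothetical helper, not part of the repo.
resolve_limit() {
    printf '%s\n' "${LLAMA_SERVER_MEMORY_LIMIT:-110G}"
}

# Case 1: variable absent from the environment -> AMD fallback.
unset LLAMA_SERVER_MEMORY_LIMIT
unset_result=$(resolve_limit)

# Case 2: user override, as if set in .env.
LLAMA_SERVER_MEMORY_LIMIT=32G
override_result=$(resolve_limit)

# Case 3: variable present but empty -> ":-" still falls back.
LLAMA_SERVER_MEMORY_LIMIT=
empty_result=$(resolve_limit)

echo "unset=$unset_result override=$override_result empty=$empty_result"
```

The empty case matters because a blank `LLAMA_SERVER_MEMORY_LIMIT=` line in .env still yields the 110G fallback rather than an empty memory limit.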

Testing

  • YAML parse, make lint, docker compose config --quiet, pre-commit: all PASS
  • Functional substitution checks:
    • cpu env unset → resolves Systran/faster-whisper-base
    • cpu env override → propagates correctly
    • amd env unset → resolves 110G (fallback preserved)
    • amd env override LLAMA_SERVER_MEMORY_LIMIT=32G → resolves 32G
  • Cross-overlay non-regression sweep (nvidia, intel, arc, apple, multigpu, tier0): all PASS

Manual:

  • Linux CPU: docker exec dream-open-webui env | grep AUDIO_STT_MODEL should equal Systran/faster-whisper-base post-install
  • Linux AMD: set LLAMA_SERVER_MEMORY_LIMIT=32G in .env, dream restart, then docker inspect dream-llama-server --format '{{.HostConfig.Memory}}' should equal 34359738368 (32 GiB) — was 118111600640 (110 GiB) before
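
For the AMD manual check, the expected byte counts can be derived rather than memorized. The sketch below assumes the G and M suffixes are binary units (GiB, MiB), which matches the 32G and 110G figures quoted above; `to_bytes` is a hypothetical helper and handles only those two suffixes.

```shell
#!/bin/sh
# to_bytes SIZE: convert a size such as 32G or 512M to bytes, treating the
# suffix as a binary unit. Hypothetical helper for this illustration only.
to_bytes() {
    value=$1
    number=${value%[GgMm]}   # strip a single trailing G/g/M/m
    case $value in
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *)     echo "$value" ;;   # no known suffix: assume plain bytes
    esac
}

expected_override=$(to_bytes 32G)    # value after setting the env var
expected_fallback=$(to_bytes 110G)   # value when the env var is absent

echo "32G  -> $expected_override"
echo "110G -> $expected_fallback"

# On a real AMD host the comparison would look like this (not run here):
#   actual=$(docker inspect dream-llama-server --format '{{.HostConfig.Memory}}')
#   [ "$actual" = "$expected_override" ] && echo "limit honored"
```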

Review

Independent verification confirmed both bugs are real and isolated to these two overlays; cross-overlay scan found no other drift on AUDIO_STT_MODEL or LLAMA_SERVER_MEMORY_LIMIT.

Known Considerations

  • Schema default LLAMA_SERVER_MEMORY_LIMIT="64G" and AMD's compose fallback :-110G differ. This matches the codebase's existing pattern: schema default applies to installer env-generation; per-backend overlay fallback applies at compose-time when .env is silent. Same divergence already exists for nvidia/apple/intel/arc/cpu — not a regression.
  • A potential follow-up (separate PR, not this one) would have installer phases write LLAMA_SERVER_MEMORY_LIMIT to .env per-backend so the schema default becomes load-bearing. That's a design decision, not a bug.

Platform Impact

  • Linux CPU: STT now works on first install (was 404 on every transcription).
  • Linux AMD: LLAMA_SERVER_MEMORY_LIMIT env var now honored (was silently ignored).
  • Linux NVIDIA / Intel / Arc / multigpu / tier0: no behavior change.
  • macOS Apple Silicon: no behavior change (llama-server.replicas: 0, no AUDIO_STT_MODEL override).
  • Windows AMD: no behavior change (llama-server.replicas: 0, no override).

…ry limit env-driven

Two related compose-overlay drift bugs.

CPU overlay: docker-compose.cpu.yml hardcoded AUDIO_STT_MODEL=turbo,
overriding the env-driven base default
(${AUDIO_STT_MODEL:-Systran/faster-whisper-base}) that PR Light-Heart-Labs#985
established. On CPU-only Linux installs Phase 06 writes the base
model to .env and Phase 12 pre-downloads it, but open-webui ended up
requesting turbo (never cached) -> STT 404. Drop the now-empty
open-webui block in cpu.yml so the container inherits base's
interpolation. NVIDIA's identical literal is preserved per Light-Heart-Labs#985's
intentional safety-net design.

AMD overlay: docker-compose.amd.yml hardcoded memory: 110G on
llama-server while every other GPU overlay reads
${LLAMA_SERVER_MEMORY_LIMIT:-N}. AMD users tuning the documented env
var found it silently ignored. Replace with the env-driven pattern
preserving 110G as the AMD-specific fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yasinBursali force-pushed the fix/cpu-amd-overlay-env-driven-limits branch from 77651c2 to 9bf34b2 on April 30, 2026 23:28
@yasinBursali marked this pull request as ready for review on May 1, 2026 22:44
Collaborator

@Lightheartdevs left a comment

Two correct surgical fixes: (1) drop CPU overlay's NVIDIA-specific AUDIO_STT_MODEL literal — wrong overlay; (2) make AMD memory limit env-driven via ${LLAMA_SERVER_MEMORY_LIMIT:-110G}. Both compose configs validate. Ship after rebase.

Collaborator

@Lightheartdevs left a comment

Re-audited for merge. CPU overlay now inherits the base AUDIO_STT_MODEL, and AMD memory remains env-driven with the 110G fallback. Local CPU and AMD compose configs passed with required secret placeholders. Approving for squash merge.

@Lightheartdevs merged commit f1d74de into Light-Heart-Labs:main on May 2, 2026
33 of 34 checks passed