fix(compose): drop CPU overlay AUDIO_STT_MODEL literal; make AMD memory limit env-driven #1067
Merged
Lightheartdevs merged 1 commit into Light-Heart-Labs:main from May 2, 2026
…ry limit env-driven
Two related compose-overlay drift bugs.
CPU overlay: docker-compose.cpu.yml hardcoded AUDIO_STT_MODEL=turbo,
overriding the env-driven base default
(${AUDIO_STT_MODEL:-Systran/faster-whisper-base}) that PR Light-Heart-Labs#985
established. On CPU-only Linux installs Phase 06 writes the base
model to .env and Phase 12 pre-downloads it, but open-webui ended up
requesting turbo (never cached) -> STT 404. Drop the now-empty
open-webui block in cpu.yml so the container inherits base's
interpolation. NVIDIA's identical literal is preserved per Light-Heart-Labs#985's
intentional safety-net design.
AMD overlay: docker-compose.amd.yml hardcoded memory: 110G on
llama-server while every other GPU overlay reads
${LLAMA_SERVER_MEMORY_LIMIT:-N}. AMD users tuning the documented env
var found it silently ignored. Replace with the env-driven pattern
preserving 110G as the AMD-specific fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
77651c2 to
9bf34b2
Compare
Lightheartdevs
approved these changes
May 2, 2026
Collaborator
Lightheartdevs
left a comment
Two correct surgical fixes: (1) drop CPU overlay's NVIDIA-specific AUDIO_STT_MODEL literal — wrong overlay; (2) make AMD memory limit env-driven via ${LLAMA_SERVER_MEMORY_LIMIT:-110G}. Both compose configs validate. Ship after rebase.
Lightheartdevs
approved these changes
May 2, 2026
Collaborator
Lightheartdevs
left a comment
Re-audited for merge. CPU overlay now inherits the base AUDIO_STT_MODEL, and AMD memory remains env-driven with the 110G fallback. Local CPU and AMD compose configs passed with required secret placeholders. Approving for squash merge.
What
Two related compose-overlay drift fixes:
- `dream-server/docker-compose.cpu.yml`: drop the now-empty `open-webui:` block (was overriding `AUDIO_STT_MODEL` to turbo)
- `dream-server/docker-compose.amd.yml`: replace literal `memory: 110G` with `${LLAMA_SERVER_MEMORY_LIMIT:-110G}`
Why
CPU overlay: PR #985 made `AUDIO_STT_MODEL` env-driven (`${AUDIO_STT_MODEL:-Systran/faster-whisper-base}` in `docker-compose.base.yml`). The cpu overlay's residual literal `AUDIO_STT_MODEL: "deepdml/faster-whisper-large-v3-turbo-ct2"` overrode that interpolation, so on CPU-only Linux installs:
- Phase 06 writes `AUDIO_STT_MODEL=Systran/faster-whisper-base` to `.env`
- Phase 12 pre-downloads that base model
- open-webui still requests turbo, which was never cached, so STT returns 404
The NVIDIA overlay's identical literal is intentional: for NVIDIA, Phase 06 also picks turbo, so the literal is a redundant safety net (per #985's design). Only the cpu overlay drifted.
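The fallback behaviour this fix relies on can be sketched in plain shell, since Compose's `${VAR:-default}` interpolation follows the same rule as POSIX parameter expansion: the default applies when the variable is unset or empty. The function names below are hypothetical and exist only for illustration; the variable names and defaults are the ones quoted in this PR.

```shell
#!/bin/sh
# Illustration only: these helpers are hypothetical and are not part of the repo.
# They mimic how ${VAR:-default} resolves, using ordinary sh parameter expansion.

# Resolve the STT model the way the base file's interpolation would.
resolve_stt_model() {
    printf '%s\n' "${AUDIO_STT_MODEL:-Systran/faster-whisper-base}"
}

# Resolve the AMD memory limit with the 110G fallback introduced by this PR.
resolve_amd_memory_limit() {
    printf '%s\n' "${LLAMA_SERVER_MEMORY_LIMIT:-110G}"
}

# Nothing set: both fall back to their defaults.
unset AUDIO_STT_MODEL LLAMA_SERVER_MEMORY_LIMIT
resolve_stt_model
resolve_amd_memory_limit

# A value from the environment takes precedence over the fallback.
LLAMA_SERVER_MEMORY_LIMIT=32G
resolve_amd_memory_limit
```

A literal in an overlay bypasses this lookup entirely, which is why the value in `.env` had no effect while the literal was in place.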
AMD overlay: 5 of 6 GPU overlays use `${LLAMA_SERVER_MEMORY_LIMIT:-N}` (NVIDIA 64G, Apple 32G, Intel/Arc 24G, CPU 6G). `.env.schema.json` registers the variable as a tunable string. The amd overlay's literal `memory: 110G` made AMD the only platform where users couldn't tune llama-server's memory cap. `tier0.yml`'s literal `memory: 4G` is a deliberate hard cap for <8 GB RAM hosts and was left alone.
How
- `docker-compose.cpu.yml`: remove the `open-webui:` block (3 + leading-blank lines). Container inherits the base's env-var interpolation.
Testing
- `make lint`, `docker compose config --quiet`, pre-commit: all PASS
- CPU overlay: `AUDIO_STT_MODEL` resolves to `Systran/faster-whisper-base`
- AMD overlay: `LLAMA_SERVER_MEMORY_LIMIT=32G` → resolves 32G
Manual:
- `docker exec dream-open-webui env | grep AUDIO_STT_MODEL` should report `Systran/faster-whisper-base` post-install
- With `LLAMA_SERVER_MEMORY_LIMIT=32G` in `.env`, run `dream restart`, then `docker inspect dream-llama-server --format '{{.HostConfig.Memory}}'` should equal `34359738368` (32 GiB); it was `118111600640` (110 GiB) before
Review
Independent verification confirmed both bugs are real and isolated to these two overlays; cross-overlay scan found no other drift on AUDIO_STT_MODEL or LLAMA_SERVER_MEMORY_LIMIT.
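The byte values quoted in the manual check under Testing can be reproduced with a small conversion. This is a minimal sketch, assuming the `G` suffix is interpreted as GiB (1024^3 bytes), which is what the figures in this PR imply; `limit_to_bytes` is a hypothetical helper, not something from the repository.

```shell
#!/bin/sh
# Hypothetical helper: convert a "<N>G" memory limit to bytes.
# Assumption: G means GiB (1024^3), matching the values quoted in this PR.
limit_to_bytes() {
    gib="${1%G}"                           # strip the trailing G, e.g. 32G -> 32
    echo $(( gib * 1024 * 1024 * 1024 ))   # GiB -> bytes
}

limit_to_bytes 32G     # 34359738368, expected after the fix with 32G in .env
limit_to_bytes 110G    # 118111600640, the previous hardcoded AMD cap
```

Comparing the output with `docker inspect`'s `HostConfig.Memory` field is one way to confirm which limit the running container actually received.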
Known Considerations
- Schema default `LLAMA_SERVER_MEMORY_LIMIT="64G"` and AMD's compose fallback `:-110G` differ. This matches the codebase's existing pattern: the schema default applies to installer env-generation; the per-backend overlay fallback applies at compose-time when `.env` is silent. The same divergence already exists for nvidia/apple/intel/arc/cpu, so this is not a regression.
- … `LLAMA_SERVER_MEMORY_LIMIT` to `.env` per-backend so the schema default becomes load-bearing. That's a design decision, not a bug.
Platform Impact
- AMD: `LLAMA_SERVER_MEMORY_LIMIT` env var now honored (was silently ignored).
- … `llama-server.replicas: 0`, no `AUDIO_STT_MODEL` override).
- … `llama-server.replicas: 0`, no override).