fix(embeddings): raise healthcheck start_period from 120s to 600s #1028

Merged
Lightheartdevs merged 1 commit into Light-Heart-Labs:main from yasinBursali:fix/embeddings-start-period
Apr 27, 2026

@yasinBursali
Contributor

What

Raise start_period on the embeddings extension healthcheck from 120s to 600s.

Why

The Hugging Face TEI image downloads its model at first start. On slow connections, downloading the default model (BAAI/bge-base-en-v1.5, ~115 MB) can take longer than the existing grace window of 120s + 5x30s = 270s. The container then flips to unhealthy while the download is still progressing, which shows up on the dashboard as a confusing restart loop.
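The arithmetic behind that window can be sketched in a few lines of Python. This is an idealized estimate, not a measurement: the helper names `grace_window_s` and `download_time_s` are hypothetical, the link speeds are illustrative, and the calculation assumes a steady link while ignoring model load time and protocol overhead.

```python
def grace_window_s(start_period_s: float, interval_s: float, retries: int) -> float:
    """Approximate seconds before the container can be marked unhealthy."""
    return start_period_s + retries * interval_s


def download_time_s(size_mb: float, link_mbps: float) -> float:
    """Seconds to fetch size_mb megabytes over a link_mbps megabit/s link."""
    return size_mb * 8 / link_mbps


old_window = grace_window_s(120, 30, 5)  # 270 s, the window before this change
new_window = grace_window_s(600, 30, 5)  # 750 s, the window after this change

# Illustrative (assumed) sustained link speeds in Mbps.
for mbps in (5.0, 3.0, 1.5):
    t = download_time_s(115, mbps)
    print(f"{mbps} Mbps: {t:.0f}s, fits old={t <= old_window}, fits new={t <= new_window}")
```

Under these assumptions the ~115 MB download stops fitting in the old 270s window once sustained throughput drops below roughly 3.4 Mbps, and it still fits in the new 750s window at 1.5 Mbps. Real thresholds would be somewhat higher because model loading also takes time.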

How

One-line edit to dream-server/extensions/services/embeddings/compose.yaml:28.

start_period is a grace window: Docker does not count failed checks inside it toward the retry limit, and the container flips to healthy on the first successful probe. Raising it therefore does not slow down warm starts and does not mask true crash-loops (restart: unless-stopped still triggers on non-zero exit).
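A toy model makes the grace-window behavior concrete. This is a simplified sketch, not Docker's implementation: the function `simulate` is hypothetical, it assumes one probe every `interval_s` seconds starting at `interval_s`, a probe that succeeds as soon as the service is ready, and a failure that counts only when it happens strictly after `start_period_s`.

```python
def simulate(ready_at_s, start_period_s, interval_s=30, retries=5, horizon_s=1800):
    """Return (status, time_s) for the first health transition in a toy model.

    Assumptions: probes fire every interval_s seconds, a probe succeeds once
    the service is ready, failures inside start_period_s are not counted, and
    `retries` consecutive counted failures mark the container unhealthy.
    """
    failures = 0
    t = interval_s
    while t <= horizon_s:
        if t >= ready_at_s:
            return ("healthy", t)      # first successful probe wins
        if t > start_period_s:
            failures += 1              # only failures after the grace window count
            if failures >= retries:
                return ("unhealthy", t)
        t += interval_s
    return ("starting", horizon_s)


print(simulate(ready_at_s=5, start_period_s=120))       # ('healthy', 30)
print(simulate(ready_at_s=5, start_period_s=600))       # ('healthy', 30)
print(simulate(ready_at_s=400, start_period_s=120))     # ('unhealthy', 270)
print(simulate(ready_at_s=400, start_period_s=600))     # ('healthy', 420)
print(simulate(ready_at_s=10**9, start_period_s=600))   # ('unhealthy', 750)
```

In this model a warm start turns healthy at the first probe under either setting, a 400s first-start download is flagged unhealthy at 270s with the old value but turns healthy at 420s with the new one, and a service that never becomes ready is still flagged at 750s.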

Testing

  • docker compose config parses the modified file cleanly.
  • scripts/validate-compose-stack.sh passes (2 services, exit 0).
  • python3 -c "import yaml; yaml.safe_load(...)" — YAML valid.
  • pre-commit (gitleaks, private-key, large-file) passes.
  • No env vars added, .env.schema.json untouched.

Review

Critique Guardian APPROVED (no required changes).

Platform Impact

  • macOS: identical behavior (Docker Desktop); healthcheck semantics are engine-level, platform-neutral.
  • Linux: identical behavior.
  • Windows (WSL2): identical behavior.

The TEI image downloads the default embedding model
(BAAI/bge-base-en-v1.5, ~115 MB) from HuggingFace Hub at first
start. On slow residential connections (<=5 Mbps) the download
exceeds the existing 120s grace window plus 5x30s retry budget,
causing spurious "unhealthy" flips and restart loops during an
otherwise progressing first install.

600s accommodates slow-connection first-install downloads while
leaving the post-warmup 30s x 5 detection window unchanged. Warm
starts (model cached on a named volume) flip to healthy on the
first probe as before, so fast paths are unaffected.

@Lightheartdevs merged commit f77e2f7 into Light-Heart-Labs:main on Apr 27, 2026
26 of 27 checks passed