feat(compose): add docker-compose.jetson.yml + resolver branch by matedev01 · Pull Request #1482 · Light-Heart-Labs/DreamServer

matedev01 · 2026-05-27T03:08:07Z

Draft — opening for visibility while Phase 4 on-hardware install evidence is captured. Marking ready-for-review only after a Jetson Orin Nano install completes end-to-end against this overlay.

Summary

Phase 3 of issue #195 milestone 1. Stacks on #1481 (Phase 2 tier-map). Adds the runtime path for NVIDIA Jetson: a new compose overlay, a resolver branch, and the manifest-schema + core-extension updates so the resolver doesn't drop core services on Jetson hosts.

What lands here

| Layer | Change |
| --- | --- |
| `docker-compose.jetson.yml` (new) | Jetson-tuned overlay. Default image `dustynv/llama_cpp:r36.4.0` (targets sm_87 for Orin Nano; stock `ghcr.io/ggml-org/llama.cpp:server-cuda-*` images won't load). `runtime: nvidia` (Tegra container runtime), not `deploy.resources.reservations.devices`. Memory limit defaults sized for 8 GB unified. Override hooks: `LLAMA_SERVER_IMAGE`, `JETSON_RUNTIME`, `LLAMA_SERVER_MEMORY_LIMIT` |
| `scripts/resolve-compose-stack.sh` | New branch for `gpu_backend == "jetson"` or `tier == "JETSON_ORIN_NANO"`, placed before the intel/sycl branch so it wins over the nvidia fallthrough |
| `scripts/build-capability-profile.sh` | Jetson hosts now resolve to `[base.yml, jetson.yml]` instead of the Phase 2 `[base.yml, cpu.yml]` placeholder |
| `scripts/validate-manifest-schema.sh` | `gpu_backends` enum extended to include `jetson` (also adds `none`, which was used by `resolve-compose-stack.sh` and `audit-extensions.py` but missing from the validator) |
| Core manifests | `llama-server`, `dashboard`, `dashboard-api`, `open-webui` declare `jetson` in `gpu_backends` so the resolver doesn't exclude them. ComfyUI deliberately stays `[amd, nvidia]`: no validated arm64+sm_87 path |
| `open-webui` env | `ENABLE_IMAGE_GENERATION=false` by default on Jetson since ComfyUI is unavailable |
| `tests/test-jetson-compose-resolver.sh` (new) | 11 assertions: jetson backend selects jetson overlay; `JETSON_ORIN_NANO` tier alone selects it; ComfyUI is excluded; nvidia/amd/cpu paths are regression-free |

Why these specific runtime choices

- Image: `ghcr.io/ggml-org/llama.cpp:server-cuda-*` doesn't include compute capability 8.7. Jetson Orin Nano needs an image built with `CMAKE_CUDA_ARCHITECTURES=87` on a JetPack base. `dustynv/llama_cpp:r36.4.0` is the community-maintained option matching JetPack 6.x. If the maintainer prefers a vendored Dockerfile building from `nvcr.io/nvidia/l4t-jetpack:r36.4.0`, happy to swap; that variant is more reproducible but adds ~20 min compile time on Orin Nano.
- `runtime: nvidia`: the Tegra container runtime is configured by the JetPack installer in `/etc/docker/daemon.json` and is the only reliable GPU passthrough mechanism on L4T. `deploy.resources.reservations.devices` (the discrete-GPU pattern) silently fails on Jetson in some Docker daemon configurations.
- Memory limit 6 GB: Orin Nano has ~7.6 GB usable unified RAM. A 6 GB cap for llama-server leaves ~1.5 GB for OS + open-webui + dashboard-api + LiteLLM. Validated by the model footprint chosen in feat(tier-map): add JETSON_ORIN_NANO tier and jetson backend #1481 (Qwen3.5-2B ~1.5 GB / Gemma E2B ~2.81 GB).
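The memory budget above can be sanity-checked with a few lines of arithmetic. This is only a sketch built from the figures quoted in this PR; the constant and function names are illustrative and do not exist in the repository, and the check covers model weights only (KV cache and runtime overhead are not modelled).

```python
# Figures quoted in the PR description, in GB. All names here are
# illustrative assumptions, not identifiers from DreamServer.
USABLE_UNIFIED_RAM_GB = 7.6   # Orin Nano 8 GB after kernel reservation
LLAMA_SERVER_LIMIT_GB = 6.0   # default cap for llama-server
MODEL_FOOTPRINT_GB = {"qwen3.5-2b": 1.5, "gemma-4-e2b-it": 2.81}


def headroom_gb(usable: float, limit: float) -> float:
    """RAM left for the OS and the other services once llama-server is capped."""
    return round(usable - limit, 2)


def fits_under_limit(model: str, limit: float = LLAMA_SERVER_LIMIT_GB) -> bool:
    """True when the model weights alone fit inside the llama-server cap."""
    return MODEL_FOOTPRINT_GB[model] <= limit


if __name__ == "__main__":
    print("headroom:", headroom_gb(USABLE_UNIFIED_RAM_GB, LLAMA_SERVER_LIMIT_GB))
    for name in MODEL_FOOTPRINT_GB:
        print(name, "fits:", fits_under_limit(name))
```

With the quoted figures the headroom comes out at about 1.6 GB, in line with the ~1.5 GB estimate, and both tier models sit well under the cap.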

Test plan

Static checks

- `bash -n` clean on all changed files
- `python3 -c "import yaml; yaml.safe_load(...)"` validates the new compose YAML and all four edited manifests
- `bash scripts/validate-manifest-schema.sh`: 22/24 valid; 2 pre-existing errors on opencode (type host-systemd) and tailscale (missing service.health), untouched by this PR

Test suites

bash tests/test-jetson-compose-resolver.sh   # NEW: 11/11 PASS
bash tests/test-jetson-detection.sh          # 12/12 PASS (Phase 1, unchanged)
bash tests/test-tier-map.sh                  # 135/135 PASS (Phase 2, unchanged)
bash tests/test-resolve-compose-resilient.sh # 31/31 PASS (existing, unchanged)

Resolver sanity

bash scripts/resolve-compose-stack.sh --gpu-backend jetson --env | grep COMPOSE_PRIMARY_FILE
# COMPOSE_PRIMARY_FILE="docker-compose.jetson.yml"

bash scripts/resolve-compose-stack.sh --tier JETSON_ORIN_NANO --env | grep COMPOSE_PRIMARY_FILE
# COMPOSE_PRIMARY_FILE="docker-compose.jetson.yml"

bash scripts/resolve-compose-stack.sh --gpu-backend nvidia --env | grep COMPOSE_PRIMARY_FILE
# COMPOSE_PRIMARY_FILE="docker-compose.nvidia.yml"   ← regression-free

What's deliberately NOT included (separate follow-ups)

- On-hardware install validation: Phase 4. PR moves to ready-for-review after a full `./install.sh --tier JETSON_ORIN_NANO --bootstrap` run on Orin Nano 8GB completes with a valid (non-?-flood) inference response, with logs + tegrastats + Open WebUI screenshot attached as a comment
- Orin AGX/NX, Xavier, legacy Nano: different SoCs / CUDA caps
- ComfyUI / Whisper GPU acceleration on Jetson: no validated path
- `docs/JETSON-QUICKSTART.md` + `SUPPORT-MATRIX.md` entry: Phase 5, after Phase 4 lands

Stack note

Depends on #1479 (Phase 1) and #1481 (Phase 2). Like #1481, this targets main because GitHub can't accept a base branch that doesn't exist in upstream. As a result the diff shows +194/-12 in total, which includes the prior phases; only what remains after subtracting them is new code in this PR.

Phase 2 of issue Light-Heart-Labs#195 milestone 1. Stacks on feat/jetson-detection.

Adds the JETSON_ORIN_NANO tier for both qwen and gemma4 profiles in installers/lib/tier-map.sh, with conservative model selection sized for the Orin Nano 8GB unified-memory budget:

- qwen → qwen3.5-2b (~1.5 GB, 8K context)
- gemma4 → gemma-4-e2b-it (~2.81 GB, 8K context)

Both set N_GPU_LAYERS=99 since the Tegra iGPU shares system RAM with the CPU; there is no benefit to partial offload on unified memory.

Also adds config/backends/jetson.json mirroring nvidia.json (same llama-server contract on port 8080); the runtime difference lives in docker-compose.jetson.yml, which is a follow-up PR.

Tier validation error lists and tier_to_model() switches updated for both qwen and gemma4 paths so `dream model swap` resolves correctly.

Tests: tier-map suite goes from 122 → 135 PASS (13 new Jetson assertions covering both profiles, plus the GGUF_URL coverage loop extension).

Out of scope (separate follow-ups):

- docker-compose.jetson.yml + resolver branch
- Auto-tier selection on Jetson hosts (--tier required for now)
- Orin AGX/NX, Xavier, legacy Nano
- docs/JETSON-QUICKSTART.md, SUPPORT-MATRIX entry

Phase 3 of issue Light-Heart-Labs#195 milestone 1. Stacks on feat/jetson-tier-map.

Adds the runtime path for NVIDIA Jetson:

* docker-compose.jetson.yml: new overlay derived from the nvidia one with three Jetson-specific changes:
  - Default image dustynv/llama_cpp:r36.4.0 (Jetson-tuned llama.cpp, targets sm_87 for Orin Nano); the stock ggml-org image is built for sm_75/80/86/89/90 and won't load on Jetson. Override via LLAMA_SERVER_IMAGE if a different JetPack release is needed.
  - runtime: nvidia (Tegra container runtime), not the discrete-GPU deploy.resources.reservations.devices pattern, which is unreliable on L4T.
  - Memory limit defaults sized for 8 GB unified memory (Orin Nano); overridable via LLAMA_SERVER_MEMORY_LIMIT.
* scripts/resolve-compose-stack.sh: new branch for gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO", placed before the intel/sycl branch so it wins over the nvidia fallthrough.
* scripts/build-capability-profile.sh: Jetson hosts now resolve to [base.yml, jetson.yml] instead of the Phase 2 [base.yml, cpu.yml] placeholder.
* scripts/validate-manifest-schema.sh: gpu_backends enum extended to include "jetson" (and "none", which was already used by other scripts but missing from the validator).
* Core service manifests (llama-server, dashboard, dashboard-api, open-webui) declare jetson in their gpu_backends so the resolver doesn't drop them on Jetson hosts. ComfyUI deliberately stays as [amd, nvidia]: no validated arm64+sm_87 path for image gen yet.
* open-webui sets ENABLE_IMAGE_GENERATION=false by default on Jetson since ComfyUI is unavailable.
* tests/test-jetson-compose-resolver.sh: new fixture-based test covering: jetson backend selects jetson overlay, JETSON_ORIN_NANO tier alone also selects it, ComfyUI is excluded, and nvidia/amd/cpu backends are unchanged. 11/11 PASS.

Existing test regressions: tier-map 135/135, jetson-detection 12/12, resolve-compose-resilient 31/31, all unchanged.

Out of scope (separate follow-ups):

- On-hardware install validation (Phase 4 of Light-Heart-Labs#195 milestone 1)
- Orin AGX/NX, Xavier, legacy Nano
- ComfyUI / Whisper GPU acceleration on Jetson
- docs/JETSON-QUICKSTART.md + SUPPORT-MATRIX entry (Phase 5)

preflight-engine.sh has hard-coded tier→requirement maps for min_disk_gb and min_ram_gb. Tiers missing from these maps fall back to the generic 50 GB / 16 GB defaults, which are sized for typical NVIDIA tier-2 installs and would have wrongly blocked any Jetson Orin Nano install with a "Disk 42GB is below required minimum for tier JETSON_ORIN_NANO (50GB)" error, even though the actual model footprint is ~17 GB.

Adds JETSON_ORIN_NANO entries:

- min_disk_map: 15 GB. Qwen3.5-2B (~1.5 GB) + dustynv/llama_cpp image (~5 GB) + dashboard stack + working space. Same envelope as tier 0 since model size dominates.
- min_ram_map: 6 GB. Orin Nano ships with 8 GB unified memory; usable ~7.6 GB after kernel reservation. Requiring 16 GB would warn against the only memory configuration this SKU has.
- tier_rank_map: 0. Aligns with tier 0 for ordering purposes.
- gpu-backend check: explicit jetson branch with an experimental warning (referencing Light-Heart-Labs#195) so the preflight doesn't fall through to the generic "Unknown backend" warn.

Result for the Jetson Orin Nano scenario (7 GB RAM, 42 GB disk):

- blockers: 1 → 0
- can_proceed: false → true
- disk check: pass with "42GB meets tier JETSON_ORIN_NANO recommendation (15GB)"

Regression: existing jetson detection (12/12), tier-map (135/135), compose-resolver (11/11) tests unchanged.
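The map-with-fallback behaviour described in this commit can be illustrated in Python. This is a hypothetical rendering, not the code in preflight-engine.sh (which is a shell script): `disk_check` and the dictionary layout are assumptions, while the 50 GB default, the 15 GB entry and the two messages are the ones quoted above.

```python
from typing import Dict, Tuple

# Generic fallback quoted in the commit message.
DEFAULT_MIN_DISK_GB = 50

# Hypothetical stand-in for min_disk_map after this commit.
MIN_DISK_GB: Dict[str, int] = {"JETSON_ORIN_NANO": 15}


def disk_check(tier: str, free_disk_gb: int,
               min_disk_map: Dict[str, int] = MIN_DISK_GB) -> Tuple[bool, str]:
    """Return (passed, message); tiers missing from the map use the default."""
    required = min_disk_map.get(tier, DEFAULT_MIN_DISK_GB)
    if free_disk_gb < required:
        return False, (f"Disk {free_disk_gb}GB is below required minimum "
                       f"for tier {tier} ({required}GB)")
    return True, f"{free_disk_gb}GB meets tier {tier} recommendation ({required}GB)"


if __name__ == "__main__":
    # Before the fix: the tier is absent, so the 50 GB default blocks the install.
    print(disk_check("JETSON_ORIN_NANO", 42, min_disk_map={}))
    # After the fix: the tier-specific 15 GB entry applies.
    print(disk_check("JETSON_ORIN_NANO", 42))
```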

On a Jetson Orin Nano install, the capability-profile pipeline can overwrite GPU_BACKEND from "jetson" to "cpu" when the hardware classifier lacks a Jetson entry (gpu-database.json does not yet know about Tegra SoCs, so classify-hardware.sh returns class_id=unknown and llm_backend falls back to cpu via build-capability-profile.sh).

When that override fired in 02-detection.sh, GPU_BACKEND="cpu" met the resolver's `gpu_backend == "cpu"` branch FIRST in resolve-compose-stack.sh, short-circuiting before the `gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO"` branch could catch it via tier. Result: the installer picked docker-compose.cpu.yml and started downloading the CPU llama-server image instead of the dustynv Jetson image, even with --tier JETSON_ORIN_NANO explicitly requested.

Fix: move the jetson branch above the cpu branch. Tier alone is now the authoritative signal: even if the capability profile pipeline mishandles the backend, an explicit --tier JETSON_ORIN_NANO still selects docker-compose.jetson.yml.

Regression test added: tier=JETSON_ORIN_NANO + gpu-backend=cpu must still resolve to docker-compose.jetson.yml. Compose resolver test suite now 12/12 PASS.

Note: this fixes the SYMPTOM (wrong compose overlay selected). The ROOT CAUSE (capability profile reporting gpu=unknown / backend=cpu on real Jetson hardware despite the new detect-hardware.sh branch) is still under investigation. Tier-based fallback is the safety net.
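The ordering fix is easier to see in a small model of the branch chain. The following is a minimal Python sketch, not the logic of resolve-compose-stack.sh itself: `resolve_primary_compose` is a hypothetical name, and the amd, intel/sycl and apple branches are left out because their overlay file names are not given in this PR.

```python
def resolve_primary_compose(gpu_backend: str, tier: str = "") -> str:
    """Pick the primary compose file; earlier branches win over later ones."""
    # Jetson is checked first, so an explicit tier still wins when the
    # capability profile has mis-reported the backend as "cpu".
    if gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO":
        return "docker-compose.jetson.yml"
    if gpu_backend == "cpu":
        return "docker-compose.cpu.yml"
    # Remaining backends are omitted in this sketch; nvidia is the fallthrough.
    return "docker-compose.nvidia.yml"


if __name__ == "__main__":
    # The regression case from this commit: tier set, backend clobbered to cpu.
    print(resolve_primary_compose("cpu", tier="JETSON_ORIN_NANO"))
    print(resolve_primary_compose("cpu"))
    print(resolve_primary_compose("nvidia"))
```

Swapping the first two `if` blocks reproduces the original bug: the cpu branch returns before the tier is ever inspected.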

… backends

build-capability-profile.sh runs in two passes:

1. A gpu_type → (llm_backend, overlays) branch picks the right values for known vendors (amd, nvidia, apple, jetson).
2. An override block applies hw_rec_backend / hw_rec_overlays from classify-hardware.sh's output.

Pass 2 was overriding pass 1 unconditionally. classify-hardware.sh is a data-driven lookup against gpu-database.json, which does not yet have Jetson entries; it returns the default `cpu` backend for any unrecognized vendor. On a real Orin Nano this produced:

    pass 1: gpu_type=jetson → llm_backend=jetson, overlays=[base, jetson]
    pass 2: hw_rec_backend=cpu → CLOBBERED to llm_backend=cpu, overlays=[base, cpu]

Result during install on Jetson hardware: capability profile shipped backend=cpu + overlays=[base, cpu], the installer copied that into GPU_BACKEND, and the resolver picked docker-compose.cpu.yml even with --tier JETSON_ORIN_NANO. Confirmed in p3-1.log line 50: "Capabilities override detection: backend=cpu, memory=unified, tier=T1"

Fix: gate the override block on gpu_type. If gpu_type is one of the explicitly-handled vendors (amd / nvidia / apple / jetson), the pass-1 assignment is authoritative and the classifier's "I don't know → cpu" default is ignored. Override still applies when gpu_type isn't known to us (the safety net case it was designed for).

Also fixes two secondary issues in the same block:

- vendor whitelist on line 147 lacked "jetson", so the cap profile JSON stored gpu.vendor="unknown" even when type was correctly detected as jetson
- tier whitelist on line 126 lacked JETSON_ORIN_NANO, so explicit tier values fell back to T1

Verified via simulated Python invocation: with gpu_type=jetson and hw_rec_backend=cpu (the classify-hardware.sh fallback), the resulting profile now correctly contains llm_backend=jetson and overlays=[base.yml, jetson.yml].

Regression: detection 12/12, tier-map 135/135, compose-resolver 12/12, resolve-compose-resilient 31/31 all pass.
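The gating rule can be sketched as a two-pass function. This is an illustration under stated assumptions, not the code in build-capability-profile.sh: `build_profile` is a hypothetical name, and pass 1 here simply derives the overlay from gpu_type, which is confirmed by this PR only for jetson (the real mapping for amd, nvidia and apple may differ).

```python
from typing import List, Optional, Tuple

# Vendors whose pass-1 assignment is treated as authoritative.
AUTHORITATIVE_GPU_TYPES = {"amd", "nvidia", "apple", "jetson"}


def build_profile(gpu_type: str,
                  hw_rec_backend: Optional[str] = None,
                  hw_rec_overlays: Optional[List[str]] = None
                  ) -> Tuple[str, List[str]]:
    """Return (llm_backend, overlays) after both passes."""
    # Pass 1: vendor-specific defaults.
    # ASSUMPTION: backend and overlay names follow gpu_type for known vendors.
    if gpu_type in AUTHORITATIVE_GPU_TYPES:
        backend, overlays = gpu_type, ["base.yml", f"{gpu_type}.yml"]
    else:
        backend, overlays = "cpu", ["base.yml", "cpu.yml"]

    # Pass 2: classifier override, now gated on gpu_type being unknown.
    if gpu_type not in AUTHORITATIVE_GPU_TYPES and hw_rec_backend:
        backend = hw_rec_backend
        if hw_rec_overlays:
            overlays = list(hw_rec_overlays)
    return backend, overlays


if __name__ == "__main__":
    # Orin Nano case: the classifier's cpu fallback no longer clobbers pass 1.
    print(build_profile("jetson", "cpu", ["base.yml", "cpu.yml"]))
    # Unknown vendor: the classifier recommendation still applies.
    print(build_profile("unknown", "nvidia", ["base.yml", "nvidia.yml"]))
```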

Lightheartdevs · 2026-05-27T18:50:18Z

Thanks for keeping this draft until the end-to-end Jetson install evidence lands. I checked out feat/jetson-compose-overlay locally and ran two lightweight audits; both currently fail:

python dream-server/tests/test-dependency-pins.py

docker-compose.jetson.yml:24: variable image ref is not documented: ${LLAMA_SERVER_IMAGE:-dustynv/llama_cpp:r36.4.0}
docker-compose.jetson.yml:24: image ref is not recorded in dependency-lock.json: dustynv/llama_cpp:r36.4.0

Please add the Jetson image/default to config/dependency-lock.json, or adjust the pin policy if this image should be handled differently.

python dream-server/scripts/audit-extensions.py --project-dir dream-server

This fails because the core manifests now declare jetson in gpu_backends, but scripts/audit-extensions.py still has VALID_GPU_BACKENDS = {"amd", "nvidia", "apple", "all", "none"}. Adding jetson there should make the extension audit accept these manifest changes.
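For illustration, the kind of check that would reject the new manifests might look like the sketch below. Only the `VALID_GPU_BACKENDS` name and its current members come from the comment above; `unknown_backends` and the sample manifest list are hypothetical and do not reflect how audit-extensions.py is actually structured.

```python
# Current set, as quoted from scripts/audit-extensions.py.
VALID_GPU_BACKENDS_BEFORE = {"amd", "nvidia", "apple", "all", "none"}

# Proposed change: accept "jetson" as well.
VALID_GPU_BACKENDS = VALID_GPU_BACKENDS_BEFORE | {"jetson"}


def unknown_backends(declared, valid):
    """Return the declared gpu_backends that the audit would reject, sorted."""
    return sorted(set(declared) - set(valid))


if __name__ == "__main__":
    # Hypothetical gpu_backends list for a core manifest after this PR.
    declared = ["amd", "nvidia", "jetson"]
    print(unknown_backends(declared, VALID_GPU_BACKENDS_BEFORE))
    print(unknown_backends(declared, VALID_GPU_BACKENDS))
```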

I also saw no GitHub checks reported for feat/jetson-compose-overlay. These look like straightforward draft blockers before ready-for-review, in addition to the Phase 4 on-hardware install proof already called out in the PR body.

…DA image

installers/phases/08-images.sh built the image-pull list using a two-tier conditional:

    if   GPU_BACKEND == amd → lemonade image
    elif GPU_BACKEND == cpu → ghcr.io/ggml-org/llama.cpp:server-b8248
    else                    → ghcr.io/ggml-org/llama.cpp:server-cuda-b9014

The `else` branch caught GPU_BACKEND=jetson and pulled the discrete-CUDA ggml image, which is compiled for sm_75/80/86/89/90, NOT sm_87 (Orin Nano Ampere). On a real Jetson install the image pulled cleanly but would fail to load CUDA kernels at runtime, hanging container init and stalling the entire compose stack.

This contradicted the docker-compose.jetson.yml default (dustynv/llama_cpp:r36.4.0) because the pre-pull step sets LLAMA_SERVER_IMAGE in .env, and that env value then shadows the compose file's `${LLAMA_SERVER_IMAGE:-dustynv/...}` default; the overlay default never got a chance to fire.

Fix: add an explicit jetson branch that pulls dustynv/llama_cpp:r36.4.0 (the Jetson-tuned community image with sm_87 in its CUDA arch list, matching JetPack 6.x). Aligns the pre-pull image with what docker-compose.jetson.yml expects.

Also extends the LLAMA_SERVER_IMAGE_FALLBACK validation gate (line ~63) to include jetson, so an invalid pin gets caught by the same path as nvidia/cpu/intel/sycl instead of silently passing through.

Verified on real Orin Nano (p3-2.log): the prior install pulled ggml-org/llama.cpp:server-cuda-b9014 (visible in `docker image prune` deletion list); the container would hang at init and never become healthy. With this patch, the pre-pull step requests dustynv directly, matching the compose overlay's runtime image.

Regression: jetson-detection 12/12, tier-map 135/135, compose-resolver 12/12, all unchanged.
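The patched selection logic, modelled in Python for clarity. This is a sketch and not the shell code in installers/phases/08-images.sh: `llama_server_image` is a hypothetical name, and the AMD entry is a placeholder because the commit message only says "lemonade image" without giving a reference.

```python
# Placeholder: the actual lemonade image reference is not given in the commit.
LEMONADE_IMAGE_PLACEHOLDER = "<lemonade image>"


def llama_server_image(gpu_backend: str) -> str:
    """Choose the image to pre-pull for a given GPU_BACKEND value."""
    if gpu_backend == "amd":
        return LEMONADE_IMAGE_PLACEHOLDER
    if gpu_backend == "cpu":
        return "ghcr.io/ggml-org/llama.cpp:server-b8248"
    if gpu_backend == "jetson":
        # New explicit branch: keeps jetson out of the discrete-CUDA fallthrough.
        return "dustynv/llama_cpp:r36.4.0"
    # Fallthrough: discrete-CUDA image, which lacks sm_87.
    return "ghcr.io/ggml-org/llama.cpp:server-cuda-b9014"


if __name__ == "__main__":
    for backend in ("amd", "cpu", "jetson", "nvidia"):
        print(backend, "->", llama_server_image(backend))
```

Removing the jetson branch reproduces the earlier behaviour, where GPU_BACKEND=jetson fell through to the server-cuda image.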

matedev01 added 6 commits May 27, 2026 02:52

feat: detect aarch64 orin nano

3321bd6

matedev01 wants to merge 7 commits into Light-Heart-Labs:main from matedev01:feat/jetson-compose-overlay
