feat(compose): add docker-compose.jetson.yml + resolver branch #1482
matedev01 wants to merge 7 commits into
Phase 2 of issue Light-Heart-Labs#195 milestone 1. Stacks on
feat/jetson-detection.

Adds the JETSON_ORIN_NANO tier for both qwen and gemma4 profiles in
installers/lib/tier-map.sh, with conservative model selection sized for
the Orin Nano 8GB unified-memory budget:
  qwen   → qwen3.5-2b (~1.5 GB, 8K context)
  gemma4 → gemma-4-e2b-it (~2.81 GB, 8K context)
Both set N_GPU_LAYERS=99 since the Tegra iGPU shares system RAM with the
CPU, so there is no benefit to partial offload on unified memory.

Also adds config/backends/jetson.json mirroring nvidia.json (same
llama-server contract on port 8080); the runtime difference lives in
docker-compose.jetson.yml, which is a follow-up PR.

Tier validation error lists and tier_to_model() switches updated for
both qwen and gemma4 paths so `dream model swap` resolves correctly.

Tests: tier-map suite goes from 122 → 135 PASS (13 new Jetson assertions
covering both profiles, plus the GGUF_URL coverage loop extension).

Out of scope (separate follow-ups):
- docker-compose.jetson.yml + resolver branch
- Auto-tier selection on Jetson hosts (--tier required for now)
- Orin AGX/NX, Xavier, legacy Nano
- docs/JETSON-QUICKSTART.md, SUPPORT-MATRIX entry
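The tier mapping described in this commit can be pictured with a minimal sketch. The function name `jetson_tier_to_model` and its output format are hypothetical (the real logic lives in `tier_to_model()` in installers/lib/tier-map.sh, whose signature is not shown here), and 8192 is an assumed reading of "8K context"; the model names and N_GPU_LAYERS=99 are taken from the commit message.

```bash
# Hypothetical sketch of the JETSON_ORIN_NANO mapping; not the real tier-map.sh.
# Prints "<model> <context> <n_gpu_layers>" for the given profile.
jetson_tier_to_model() {
    profile="$1"
    case "$profile" in
        qwen)   echo "qwen3.5-2b 8192 99" ;;       # 8192 assumed for "8K context"
        gemma4) echo "gemma-4-e2b-it 8192 99" ;;
        *)      echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

jetson_tier_to_model qwen
```

Both profiles carry the same offload setting because, as the commit notes, partial offload brings no benefit on unified memory.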
Phase 3 of issue Light-Heart-Labs#195 milestone 1. Stacks on
feat/jetson-tier-map.

Adds the runtime path for NVIDIA Jetson:
* docker-compose.jetson.yml: new overlay derived from the nvidia one
  with three Jetson-specific changes:
  - Default image dustynv/llama_cpp:r36.4.0 (Jetson-tuned llama.cpp,
    targets sm_87 for Orin Nano); the stock ggml-org image is built for
    sm_75/80/86/89/90 and won't load on Jetson. Override via
    LLAMA_SERVER_IMAGE if a different JetPack release is needed.
  - runtime: nvidia (Tegra container runtime), not the discrete-GPU
    deploy.resources.reservations.devices pattern, which is unreliable
    on L4T.
  - Memory limit defaults sized for 8 GB unified memory (Orin Nano);
    overridable via LLAMA_SERVER_MEMORY_LIMIT.
* scripts/resolve-compose-stack.sh: new branch for
  gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO", placed before
  the intel/sycl branch so it wins over the nvidia fallthrough.
* scripts/build-capability-profile.sh: Jetson hosts now resolve to
  [base.yml, jetson.yml] instead of the Phase 2 [base.yml, cpu.yml]
  placeholder.
* scripts/validate-manifest-schema.sh: gpu_backends enum extended to
  include "jetson" (and "none", which was already used by other scripts
  but missing from the validator).
* Core service manifests (llama-server, dashboard, dashboard-api,
  open-webui) declare jetson in their gpu_backends so the resolver
  doesn't drop them on Jetson hosts. ComfyUI deliberately stays as
  [amd, nvidia]: no validated arm64+sm_87 path for image gen yet.
* open-webui sets ENABLE_IMAGE_GENERATION=false by default on Jetson
  since ComfyUI is unavailable.
* tests/test-jetson-compose-resolver.sh: new fixture-based test
  covering: jetson backend selects jetson overlay, JETSON_ORIN_NANO tier
  alone also selects it, ComfyUI is excluded, and nvidia/amd/cpu
  backends are unchanged. 11/11 PASS.

Existing test regressions: tier-map 135/135, jetson-detection 12/12,
resolve-compose-resilient 31/31, all unchanged.

Out of scope (separate follow-ups):
- On-hardware install validation (Phase 4 of Light-Heart-Labs#195
  milestone 1)
- Orin AGX/NX, Xavier, legacy Nano
- ComfyUI / Whisper GPU acceleration on Jetson
- docs/JETSON-QUICKSTART.md + SUPPORT-MATRIX entry (Phase 5)
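To illustrate why the core manifests have to declare jetson, here is a rough sketch of a backend filter. The helper `service_supports_backend` is hypothetical and takes the gpu_backends list as a plain string; the real resolver reads it from the service manifests. The `[amd, nvidia]` list for ComfyUI comes from the commit message, while the core-service list shown is an assumption for illustration.

```bash
# Hypothetical helper: succeeds when the space-separated gpu_backends list
# in $2 contains the active backend in $1.
service_supports_backend() {
    backend="$1"
    supported="$2"
    for b in $supported; do
        [ "$b" = "$backend" ] && return 0
    done
    return 1
}

# Assumed core-service list (with jetson added) versus ComfyUI's [amd, nvidia].
service_supports_backend jetson "amd nvidia jetson" && echo "llama-server: kept"
service_supports_backend jetson "amd nvidia" || echo "comfyui: excluded"
```

Under this kind of filter, a service that omits jetson from its list is silently dropped on Jetson hosts, which is the behavior the manifest edits avoid for core services and keep on purpose for ComfyUI.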
preflight-engine.sh has hard-coded tier→requirement maps for min_disk_gb
and min_ram_gb. Tiers missing from these maps fall back to the generic
50 GB / 16 GB defaults — which are sized for typical NVIDIA tier-2
installs and would have wrongly blocked any Jetson Orin Nano install
with a "Disk 42GB is below required minimum for tier JETSON_ORIN_NANO
(50GB)" error, even though the model (~1.5 GB) and image (~5 GB) need far less.
Adds JETSON_ORIN_NANO entries:
  min_disk_map: 15 GB
    Qwen3.5-2B (~1.5 GB) + dustynv/llama_cpp image (~5 GB) + dashboard
    stack + working space. Same envelope as tier 0 since model size
    dominates.
  min_ram_map: 6 GB
    Orin Nano ships with 8 GB unified memory; usable ~7.6 GB after
    kernel reservation. Requiring 16 GB would warn against the only
    memory configuration this SKU has.
  tier_rank_map: 0
    Aligns with tier 0 for ordering purposes.
  gpu-backend check: explicit jetson branch with an experimental
    warning (referencing Light-Heart-Labs#195) so the preflight doesn't
    fall through to the generic "Unknown backend" warn.
Result for the Jetson Orin Nano scenario (7 GB RAM, 42 GB disk):
  blockers: 1 → 0
  can_proceed: false → true
  disk check: pass with "42GB meets tier JETSON_ORIN_NANO recommendation
    (15GB)"
Regression: existing jetson detection (12/12), tier-map (135/135),
compose-resolver (11/11) tests unchanged.
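The fallback behavior described in this commit can be sketched as a lookup with a generic default. The function names below are hypothetical, and every tier other than JETSON_ORIN_NANO is collapsed into the 50 GB / 16 GB defaults for brevity (the real preflight-engine.sh maps list other tiers too). Treating disk as a blocker and RAM as a warning is an assumption based on the wording of the message.

```bash
# Hypothetical lookups mirroring min_disk_map / min_ram_map.
min_disk_gb_for_tier() {
    case "$1" in
        JETSON_ORIN_NANO) echo 15 ;;
        *)                echo 50 ;;   # generic fallback for unlisted tiers
    esac
}

min_ram_gb_for_tier() {
    case "$1" in
        JETSON_ORIN_NANO) echo 6 ;;
        *)                echo 16 ;;   # generic fallback for unlisted tiers
    esac
}

# Prints "pass" or "block" for a measured disk size in GB.
disk_check() {
    if [ "$2" -ge "$(min_disk_gb_for_tier "$1")" ]; then echo "pass"; else echo "block"; fi
}

# Prints "pass" or "warn" for a measured RAM size in GB.
ram_check() {
    if [ "$2" -ge "$(min_ram_gb_for_tier "$1")" ]; then echo "pass"; else echo "warn"; fi
}

echo "disk: $(disk_check JETSON_ORIN_NANO 42), ram: $(ram_check JETSON_ORIN_NANO 7)"
```

With the tier missing from the maps, the same 42 GB / 7 GB host would hit the generic defaults and be blocked on disk, which is the failure the new entries remove.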
On a Jetson Orin Nano install, the capability-profile pipeline can
overwrite GPU_BACKEND from "jetson" to "cpu" when the hardware
classifier lacks a Jetson entry (gpu-database.json does not yet know
about Tegra SoCs, so classify-hardware.sh returns class_id=unknown and
llm_backend falls back to cpu via build-capability-profile.sh).

When that override fired in 02-detection.sh, GPU_BACKEND="cpu" met the
resolver's `gpu_backend == "cpu"` branch FIRST in
resolve-compose-stack.sh, short-circuiting before the
`gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO"` branch could
catch it via tier. Result: the installer picked docker-compose.cpu.yml
and started downloading the CPU llama-server image instead of the
dustynv Jetson image, even with --tier JETSON_ORIN_NANO explicitly
requested.

Fix: move the jetson branch above the cpu branch. Tier alone is now the
authoritative signal: even if the capability profile pipeline mishandles
the backend, an explicit --tier JETSON_ORIN_NANO still selects
docker-compose.jetson.yml.

Regression test added: tier=JETSON_ORIN_NANO + gpu-backend=cpu must
still resolve to docker-compose.jetson.yml. Compose resolver test suite
now 12/12 PASS.

Note: this fixes the SYMPTOM (wrong compose overlay selected). The ROOT
CAUSE (capability profile reporting gpu=unknown / backend=cpu on real
Jetson hardware despite the new detect-hardware.sh branch) is still
under investigation. Tier-based fallback is the safety net.
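The ordering fix can be shown with a reduced sketch of the branch chain. The function `resolve_overlay` is hypothetical, the real resolve-compose-stack.sh handles more backends and options, and the `docker-compose.nvidia.yml` filename in the fallthrough is an assumption; only the jetson and cpu overlay names appear in the commit messages.

```bash
# Hypothetical reduction of the resolver's branch order after the fix.
resolve_overlay() {
    gpu_backend="$1"
    tier="$2"
    if [ "$gpu_backend" = "jetson" ] || [ "$tier" = "JETSON_ORIN_NANO" ]; then
        echo "docker-compose.jetson.yml"   # checked first, so tier alone wins
    elif [ "$gpu_backend" = "cpu" ]; then
        echo "docker-compose.cpu.yml"
    else
        echo "docker-compose.nvidia.yml"   # assumed name for the fallthrough
    fi
}

# The regression case: backend clobbered to cpu, tier still explicit.
resolve_overlay cpu JETSON_ORIN_NANO
```

With the two branches swapped back, the same call would stop at the cpu branch and never inspect the tier, which is the short-circuit described above.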
… backends
build-capability-profile.sh runs in two passes:
1. A gpu_type → (llm_backend, overlays) branch picks the right values
for known vendors (amd, nvidia, apple, jetson).
2. An override block applies hw_rec_backend / hw_rec_overlays from
classify-hardware.sh's output.
Pass 2 was overriding pass 1 unconditionally. classify-hardware.sh is a
data-driven lookup against gpu-database.json which does not yet have
Jetson entries — it returns the default `cpu` backend for any
unrecognized vendor. On a real Orin Nano this produced:
pass 1: gpu_type=jetson → llm_backend=jetson, overlays=[base, jetson]
pass 2: hw_rec_backend=cpu → CLOBBERED to llm_backend=cpu, overlays=[base, cpu]
Result during install on Jetson hardware: capability profile shipped
backend=cpu + overlays=[base, cpu], the installer copied that into
GPU_BACKEND, and the resolver picked docker-compose.cpu.yml even with
--tier JETSON_ORIN_NANO. Confirmed in p3-1.log line 50:
"Capabilities override detection: backend=cpu, memory=unified, tier=T1"
Fix: gate the override block on gpu_type. If gpu_type is one of the
explicitly-handled vendors (amd / nvidia / apple / jetson), the pass-1
assignment is authoritative and the classifier's "I don't know → cpu"
default is ignored. Override still applies when gpu_type isn't known
to us (the safety net case it was designed for).
Also fixes two secondary issues in the same block:
- vendor whitelist on line 147 lacked "jetson", so the cap profile
JSON stored gpu.vendor="unknown" even when type was correctly
detected as jetson
- tier whitelist on line 126 lacked JETSON_ORIN_NANO, so explicit
tier values fell back to T1
Verified via simulated Python invocation: with gpu_type=jetson and
hw_rec_backend=cpu (the classify-hardware.sh fallback), the resulting
profile now correctly contains llm_backend=jetson and
overlays=[base.yml, jetson.yml].
Regression: detection 12/12, tier-map 135/135, compose-resolver 12/12,
resolve-compose-resilient 31/31 all pass.
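A minimal sketch of the gated override follows. The function `final_llm_backend` and its argument order are hypothetical; the variable names echo the commit message, but the real logic sits inside build-capability-profile.sh and also handles overlays.

```bash
# Hypothetical sketch: pass 1 is authoritative for explicitly handled
# vendors; the classifier recommendation only applies otherwise.
final_llm_backend() {
    gpu_type="$1"         # detected GPU type
    pass1_backend="$2"    # backend picked by the gpu_type branch (pass 1)
    hw_rec_backend="$3"   # classifier recommendation (pass 2), may be empty
    case "$gpu_type" in
        amd|nvidia|apple|jetson)
            echo "$pass1_backend" ;;
        *)
            # Safety-net case: use the recommendation when one exists.
            echo "${hw_rec_backend:-$pass1_backend}" ;;
    esac
}

# The Orin Nano case: classifier falls back to cpu, pass 1 said jetson.
final_llm_backend jetson jetson cpu
```

Before the fix, the equivalent of the `*)` branch ran for every gpu_type, which is how the classifier's cpu default replaced the jetson assignment.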
Thanks for keeping this draft until the end-to-end Jetson install evidence lands. I checked out
I also saw no GitHub checks reported for
…DA image
installers/phases/08-images.sh built the image-pull list using a
three-way conditional:
if GPU_BACKEND == amd → lemonade image
elif GPU_BACKEND == cpu → ghcr.io/ggml-org/llama.cpp:server-b8248
else → ghcr.io/ggml-org/llama.cpp:server-cuda-b9014
The `else` branch caught GPU_BACKEND=jetson and pulled the discrete-CUDA
ggml image, which is compiled for sm_75/80/86/89/90 — NOT sm_87 (Orin
Nano Ampere). On a real Jetson install the image pulled cleanly but
would fail to load CUDA kernels at runtime, hanging container init and
stalling the entire compose stack.
This contradicted the docker-compose.jetson.yml default
(dustynv/llama_cpp:r36.4.0) because the pre-pull step sets
LLAMA_SERVER_IMAGE in .env, and that env value then shadows the compose
file's `${LLAMA_SERVER_IMAGE:-dustynv/...}` default — the overlay
default never got a chance to fire.
Fix: add an explicit jetson branch that pulls
dustynv/llama_cpp:r36.4.0 (the Jetson-tuned community image with
sm_87 in its CUDA arch list, matching JetPack 6.x). Aligns the pre-pull
image with what docker-compose.jetson.yml expects.
Also extends the LLAMA_SERVER_IMAGE_FALLBACK validation gate (line ~63)
to include jetson, so an invalid pin gets caught by the same path as
nvidia/cpu/intel/sycl instead of silently passing through.
Verified on real Orin Nano (p3-2.log): the prior install pulled
ggml-org/llama.cpp:server-cuda-b9014 (visible in `docker image prune`
deletion list); the container would hang at init and never become
healthy. With this patch, the pre-pull step requests dustynv directly,
matching the compose overlay's runtime image.
Regression: jetson-detection 12/12, tier-map 135/135,
compose-resolver 12/12, all unchanged.
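The image selection after this fix, and the shadowing that made it necessary, can be sketched as below. The function `llama_image_for_backend` is hypothetical, and the amd entry is a placeholder string because the commit message only says "lemonade image" without giving a reference. The last two lines rely on standard `${VAR:-default}` semantics, which Docker Compose interpolation shares with the shell.

```bash
# Hypothetical reduction of the image choice in 08-images.sh.
llama_image_for_backend() {
    case "$1" in
        amd)    echo "lemonade-image-placeholder" ;;  # exact reference not given
        cpu)    echo "ghcr.io/ggml-org/llama.cpp:server-b8248" ;;
        jetson) echo "dustynv/llama_cpp:r36.4.0" ;;   # the new explicit branch
        *)      echo "ghcr.io/ggml-org/llama.cpp:server-cuda-b9014" ;;
    esac
}

# A value written by the pre-pull step wins over the ":-" default, so the
# overlay default is only used when the variable is unset or empty.
LLAMA_SERVER_IMAGE="$(llama_image_for_backend jetson)"
echo "${LLAMA_SERVER_IMAGE:-overlay-default-unused}"
```

This is why aligning the pre-pull branch matters: once the variable is set, whatever the overlay declares as its default is never consulted.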
Draft — opening for visibility while Phase 4 on-hardware install evidence is captured. Marking ready-for-review only after a Jetson Orin Nano install completes end-to-end against this overlay.
Summary
Phase 3 of issue #195 milestone 1. Stacks on #1481 (Phase 2 tier-map). Adds the runtime path for NVIDIA Jetson: a new compose overlay, a resolver branch, and the manifest-schema + core-extension updates so the resolver doesn't drop core services on Jetson hosts.
What lands here
- `docker-compose.jetson.yml` (new): `dustynv/llama_cpp:r36.4.0` (targets sm_87 for Orin Nano; stock `ghcr.io/ggml-org/llama.cpp:server-cuda-*` images won't load). `runtime: nvidia` (Tegra container runtime), not `deploy.resources.reservations.devices`. Memory limit defaults sized for 8 GB unified. Override hooks: `LLAMA_SERVER_IMAGE`, `JETSON_RUNTIME`, `LLAMA_SERVER_MEMORY_LIMIT`
- `scripts/resolve-compose-stack.sh`: new branch for `gpu_backend == "jetson"` or `tier == "JETSON_ORIN_NANO"`, placed before the intel/sycl branch so it wins over the nvidia fallthrough
- `scripts/build-capability-profile.sh`: Jetson hosts resolve to `[base.yml, jetson.yml]` instead of the Phase 2 `[base.yml, cpu.yml]` placeholder
- `scripts/validate-manifest-schema.sh`: `gpu_backends` enum extended to include `jetson` (also adds `none`, which was used by `resolve-compose-stack.sh` and `audit-extensions.py` but missing from the validator)
- `llama-server`, `dashboard`, `dashboard-api`, `open-webui` declare `jetson` in `gpu_backends` so the resolver doesn't exclude them. ComfyUI deliberately stays `[amd, nvidia]`: no validated arm64+sm_87 path
- `open-webui` env: `ENABLE_IMAGE_GENERATION=false` by default on Jetson since ComfyUI is unavailable
- `tests/test-jetson-compose-resolver.sh` (new): `JETSON_ORIN_NANO` tier alone selects the jetson overlay; ComfyUI is excluded; nvidia/amd/cpu paths are regression-free
Why these specific runtime choices
- `ghcr.io/ggml-org/llama.cpp:server-cuda-*` doesn't include compute capability 8.7. Jetson Orin Nano needs an image built with `CMAKE_CUDA_ARCHITECTURES=87` on a JetPack base. `dustynv/llama_cpp:r36.4.0` is the community-maintained option matching JetPack 6.x. If the maintainer prefers a vendored `Dockerfile` building from `nvcr.io/nvidia/l4t-jetpack:r36.4.0`, happy to swap; that variant is more reproducible but adds ~20 min compile time on Orin Nano.
- `runtime: nvidia`: the Tegra container runtime is configured by the JetPack installer in `/etc/docker/daemon.json` and is the only reliable GPU passthrough mechanism on L4T. `deploy.resources.reservations.devices` (the discrete-GPU pattern) silently fails on Jetson in some Docker daemon configurations.
Test plan
Static checks
- `bash -n` clean on all changed files
- `python3 -c "import yaml; yaml.safe_load(...)"` validates the new compose YAML and all four edited manifests
- `bash scripts/validate-manifest-schema.sh`: 22/24 valid; 2 pre-existing errors on `opencode` (type `host-systemd`) and `tailscale` (missing `service.health`) untouched by this PR
Test suites
Resolver sanity
What's deliberately NOT included (separate follow-ups)
- `./install.sh --tier JETSON_ORIN_NANO --bootstrap` run on Orin Nano 8GB completes with a valid (non-`?`-flood) inference response, with logs + `tegrastats` + Open WebUI screenshot attached as a comment
- `docs/JETSON-QUICKSTART.md` + `SUPPORT-MATRIX.md` entry: Phase 5, after Phase 4 lands
Stack note
Depends on #1479 (Phase 1) and #1481 (Phase 2). Like #1481, this targets `main` because GitHub can't accept a base branch that doesn't exist in upstream. The diff shows +194/-12 in total, but only the portion left after subtracting the prior phases is new code in this PR.