
feat(compose): add docker-compose.jetson.yml + resolver branch #1482

Draft
matedev01 wants to merge 7 commits into Light-Heart-Labs:main from matedev01:feat/jetson-compose-overlay

@matedev01

Contributor

Draft — opening for visibility while Phase 4 on-hardware install evidence is captured. Marking ready-for-review only after a Jetson Orin Nano install completes end-to-end against this overlay.

Summary

Phase 3 of issue #195 milestone 1. Stacks on #1481 (Phase 2 tier-map). Adds the runtime path for NVIDIA Jetson: a new compose overlay, a resolver branch, and the manifest-schema + core-extension updates so the resolver doesn't drop core services on Jetson hosts.

What lands here

| Layer | Change |
|---|---|
| docker-compose.jetson.yml (new) | Jetson-tuned overlay. Default image dustynv/llama_cpp:r36.4.0 (targets sm_87 for Orin Nano; stock ghcr.io/ggml-org/llama.cpp:server-cuda-* images won't load). Uses runtime: nvidia (Tegra container runtime), not deploy.resources.reservations.devices. Memory limit defaults sized for 8 GB unified memory. Override hooks: LLAMA_SERVER_IMAGE, JETSON_RUNTIME, LLAMA_SERVER_MEMORY_LIMIT |
| scripts/resolve-compose-stack.sh | New branch for gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO", placed before the intel/sycl branch so it wins over the nvidia fallthrough |
| scripts/build-capability-profile.sh | Jetson hosts now resolve to [base.yml, jetson.yml] instead of the Phase 2 [base.yml, cpu.yml] placeholder |
| scripts/validate-manifest-schema.sh | gpu_backends enum extended to include jetson (also adds none, which was used by resolve-compose-stack.sh and audit-extensions.py but missing from the validator) |
| Core manifests | llama-server, dashboard, dashboard-api and open-webui declare jetson in gpu_backends so the resolver doesn't exclude them. ComfyUI deliberately stays [amd, nvidia]: there is no validated arm64+sm_87 path |
| open-webui env | ENABLE_IMAGE_GENERATION=false by default on Jetson, since ComfyUI is unavailable |
| tests/test-jetson-compose-resolver.sh (new) | 11 assertions: jetson backend selects the jetson overlay; JETSON_ORIN_NANO tier alone selects it; ComfyUI is excluded; nvidia/amd/cpu paths are regression-free |

Why these specific runtime choices

  • Image: ghcr.io/ggml-org/llama.cpp:server-cuda-* doesn't include compute capability 8.7. Jetson Orin Nano needs an image built with CMAKE_CUDA_ARCHITECTURES=87 on a JetPack base. dustynv/llama_cpp:r36.4.0 is the community-maintained option matching JetPack 6.x. If the maintainer prefers a vendored Dockerfile building from nvcr.io/nvidia/l4t-jetpack:r36.4.0, happy to swap — that variant is more reproducible but adds ~20 min compile time on Orin Nano.
  • runtime: nvidia: the Tegra container runtime is configured by the JetPack installer in /etc/docker/daemon.json and is the only reliable GPU passthrough mechanism on L4T. deploy.resources.reservations.devices (the discrete-GPU pattern) silently fails on Jetson in some Docker daemon configurations.
  • Memory limit 6 GB: Orin Nano has ~7.6 GB usable unified RAM. 6 GB cap for llama-server leaves ~1.5 GB for OS + open-webui + dashboard-api + LiteLLM. Validated by the model footprint chosen in feat(tier-map): add JETSON_ORIN_NANO tier and jetson backend #1481 (Qwen3.5-2B ~1.5 GB / Gemma E2B ~2.81 GB).
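The budget arithmetic above, as a minimal Python sketch. The 7.6 GB usable figure, the 6 GB cap and the per-model weight sizes are taken from this PR; the helper names are hypothetical, and the fit check counts weights only (no KV cache or runtime buffers), so treat it as a lower bound rather than a guarantee.

```python
# Back-of-the-envelope check of the 6 GB llama-server cap on an 8 GB Orin Nano.
# Figures come from the PR description; the helpers are illustrative only.
USABLE_UNIFIED_GB = 7.6      # approx. usable after kernel reservation
LLAMA_SERVER_CAP_GB = 6.0    # default LLAMA_SERVER_MEMORY_LIMIT in the overlay

MODEL_WEIGHTS_GB = {
    "qwen3.5-2b": 1.5,
    "gemma-4-e2b-it": 2.81,
}


def headroom_outside_cap(usable_gb: float, cap_gb: float) -> float:
    """Memory left for the OS and other services once llama-server hits its cap."""
    return round(usable_gb - cap_gb, 2)


def weights_fit(model: str, cap_gb: float = LLAMA_SERVER_CAP_GB) -> bool:
    """True if the model's weights alone fit under the container cap.

    Ignores KV cache and runtime buffers, so this is a lower bound only.
    """
    return MODEL_WEIGHTS_GB[model] < cap_gb


if __name__ == "__main__":
    print(headroom_outside_cap(USABLE_UNIFIED_GB, LLAMA_SERVER_CAP_GB))  # 1.6
    for name in MODEL_WEIGHTS_GB:
        print(name, weights_fit(name))
```

The computed headroom of 1.6 GB is in line with the ~1.5 GB estimate above.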

Test plan

Static checks

  • bash -n clean on all changed files
  • python3 -c "import yaml; yaml.safe_load(...)" validates the new compose YAML and all four edited manifests
  • bash scripts/validate-manifest-schema.sh — 22/24 valid; 2 pre-existing errors on opencode (type host-systemd) and tailscale (missing service.health) untouched by this PR

Test suites

bash tests/test-jetson-compose-resolver.sh   # NEW: 11/11 PASS
bash tests/test-jetson-detection.sh          # 12/12 PASS (Phase 1, unchanged)
bash tests/test-tier-map.sh                  # 135/135 PASS (Phase 2, unchanged)
bash tests/test-resolve-compose-resilient.sh # 31/31 PASS (existing, unchanged)

Resolver sanity

bash scripts/resolve-compose-stack.sh --gpu-backend jetson --env | grep COMPOSE_PRIMARY_FILE
# COMPOSE_PRIMARY_FILE="docker-compose.jetson.yml"

bash scripts/resolve-compose-stack.sh --tier JETSON_ORIN_NANO --env | grep COMPOSE_PRIMARY_FILE
# COMPOSE_PRIMARY_FILE="docker-compose.jetson.yml"

bash scripts/resolve-compose-stack.sh --gpu-backend nvidia --env | grep COMPOSE_PRIMARY_FILE
# COMPOSE_PRIMARY_FILE="docker-compose.nvidia.yml"   ← regression-free

What's deliberately NOT included (separate follow-ups)

  • On-hardware install validation (Phase 4). The PR moves to ready-for-review after a full ./install.sh --tier JETSON_ORIN_NANO --bootstrap run on an Orin Nano 8GB completes with a valid inference response (not a flood of "?" characters), with logs, tegrastats output and an Open WebUI screenshot attached as a comment
  • Orin AGX/NX, Xavier, legacy Nano — different SoCs / CUDA caps
  • ComfyUI / Whisper GPU acceleration on Jetson — no validated path
  • docs/JETSON-QUICKSTART.md + SUPPORT-MATRIX.md entry — Phase 5, after Phase 4 lands

Stack note

Depends on #1479 (Phase 1) and #1481 (Phase 2). Like #1481, this targets main because GitHub can't accept a base branch that doesn't exist in upstream. The diff therefore shows +194/-12 in total, but only the portion not already in the prior phases is new code in this PR.

matedev01 added 6 commits May 27, 2026 02:52
Phase 2 of issue Light-Heart-Labs#195 milestone 1. Stacks on feat/jetson-detection.

Adds the JETSON_ORIN_NANO tier for both qwen and gemma4 profiles in
installers/lib/tier-map.sh, with conservative model selection sized for
the Orin Nano 8GB unified-memory budget:

  qwen   → qwen3.5-2b      (~1.5 GB,  8K context)
  gemma4 → gemma-4-e2b-it  (~2.81 GB, 8K context)

Both set N_GPU_LAYERS=99 since the Tegra iGPU shares system RAM with
the CPU — there is no benefit to partial offload on unified memory.
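For readers who don't want to dig through tier-map.sh, here is a hypothetical Python mirror of the two new rows. The model IDs, sizes, 8K context and N_GPU_LAYERS=99 come from the commit message; the dict layout, the `tier_to_model` signature and reading "8K" as 8192 tokens are assumptions for illustration, not the shell implementation.

```python
# Hypothetical Python mirror of the JETSON_ORIN_NANO rows in installers/lib/tier-map.sh.
# The real implementation is shell; this layout is illustrative only.
JETSON_TIER = "JETSON_ORIN_NANO"

TIER_MAP = {
    ("qwen", JETSON_TIER): {
        "model": "qwen3.5-2b",
        "approx_size_gb": 1.5,
        "ctx_size": 8192,      # "8K context" read as 8192 tokens (assumption)
        "n_gpu_layers": 99,    # full offload: the iGPU shares RAM with the CPU
    },
    ("gemma4", JETSON_TIER): {
        "model": "gemma-4-e2b-it",
        "approx_size_gb": 2.81,
        "ctx_size": 8192,
        "n_gpu_layers": 99,
    },
}


def tier_to_model(profile: str, tier: str) -> str:
    """Return the model ID for a (profile, tier) pair, or raise ValueError."""
    entry = TIER_MAP.get((profile, tier))
    if entry is None:
        raise ValueError(f"unknown tier {tier!r} for profile {profile!r}")
    return entry["model"]
```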

Also adds config/backends/jetson.json mirroring nvidia.json (same
llama-server contract on port 8080); the runtime difference lives in
docker-compose.jetson.yml which is a follow-up PR.

Tier validation error lists and tier_to_model() switches updated for
both qwen and gemma4 paths so `dream model swap` resolves correctly.

Tests: tier-map suite goes from 122 → 135 PASS (13 new Jetson
assertions covering both profiles, plus the GGUF_URL coverage loop
extension).

Out of scope (separate follow-ups):
  - docker-compose.jetson.yml + resolver branch
  - Auto-tier selection on Jetson hosts (--tier required for now)
  - Orin AGX/NX, Xavier, legacy Nano
  - docs/JETSON-QUICKSTART.md, SUPPORT-MATRIX entry

Phase 3 of issue Light-Heart-Labs#195 milestone 1. Stacks on feat/jetson-tier-map.

Adds the runtime path for NVIDIA Jetson:

  * docker-compose.jetson.yml — new overlay derived from the nvidia
    one with three Jetson-specific changes:
      - Default image dustynv/llama_cpp:r36.4.0 (Jetson-tuned llama.cpp,
        targets sm_87 for Orin Nano); the stock ggml-org image is built
        for sm_75/80/86/89/90 and won't load on Jetson. Override via
        LLAMA_SERVER_IMAGE if a different JetPack release is needed.
      - runtime: nvidia (Tegra container runtime), not the discrete-GPU
        deploy.resources.reservations.devices pattern, which is
        unreliable on L4T.
      - Memory limit defaults sized for 8 GB unified memory (Orin Nano);
        overridable via LLAMA_SERVER_MEMORY_LIMIT.

  * scripts/resolve-compose-stack.sh — new branch for
    gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO", placed
    before the intel/sycl branch so it wins over the nvidia fallthrough.

  * scripts/build-capability-profile.sh — Jetson hosts now resolve to
    [base.yml, jetson.yml] instead of the Phase 2 [base.yml, cpu.yml]
    placeholder.

  * scripts/validate-manifest-schema.sh — gpu_backends enum extended
    to include "jetson" (and "none", which was already used by other
    scripts but missing from the validator).

  * Core service manifests (llama-server, dashboard, dashboard-api,
    open-webui) declare jetson in their gpu_backends so the resolver
    doesn't drop them on Jetson hosts. ComfyUI deliberately stays as
    [amd, nvidia] — no validated arm64+sm_87 path for image gen yet.

  * open-webui sets ENABLE_IMAGE_GENERATION=false by default on Jetson
    since ComfyUI is unavailable.

  * tests/test-jetson-compose-resolver.sh — new fixture-based test
    covering: jetson backend selects jetson overlay, JETSON_ORIN_NANO
    tier alone also selects it, ComfyUI is excluded, and nvidia/amd/cpu
    backends are unchanged. 11/11 PASS.
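The "resolver drops services whose gpu_backends doesn't name the host backend" behaviour can be pictured as a simple filter. This is a hypothetical Python sketch, not the resolver's code: the core-service backend lists are illustrative (the real manifests may declare more entries), the `comfyui` key and treating "all" as a wildcard are assumptions, and only ComfyUI's [amd, nvidia] is taken from the PR.

```python
# Hypothetical sketch of gpu_backends-based service selection.
MANIFEST_GPU_BACKENDS = {
    # Illustrative lists; the real manifests may declare more backends.
    "llama-server": ["amd", "nvidia", "jetson"],
    "dashboard": ["amd", "nvidia", "jetson"],
    "dashboard-api": ["amd", "nvidia", "jetson"],
    "open-webui": ["amd", "nvidia", "jetson"],
    # From the PR: ComfyUI stays amd/nvidia only.
    "comfyui": ["amd", "nvidia"],
}


def services_for_backend(manifests: dict[str, list[str]], backend: str) -> list[str]:
    """Keep services whose gpu_backends names the backend (or the assumed "all" wildcard)."""
    return sorted(
        name
        for name, backends in manifests.items()
        if backend in backends or "all" in backends
    )
```

With jetson declared in the four core manifests, a Jetson host keeps them and drops ComfyUI, which is what the new resolver test asserts.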

Existing test regressions: tier-map 135/135, jetson-detection 12/12,
resolve-compose-resilient 31/31 — all unchanged.

Out of scope (separate follow-ups):
  - On-hardware install validation (Phase 4 of Light-Heart-Labs#195 milestone 1)
  - Orin AGX/NX, Xavier, legacy Nano
  - ComfyUI / Whisper GPU acceleration on Jetson
  - docs/JETSON-QUICKSTART.md + SUPPORT-MATRIX entry (Phase 5)

preflight-engine.sh has hard-coded tier→requirement maps for min_disk_gb
and min_ram_gb. Tiers missing from these maps fall back to the generic
50 GB / 16 GB defaults — which are sized for typical NVIDIA tier-2
installs and would have wrongly blocked any Jetson Orin Nano install
with a "Disk 42GB is below required minimum for tier JETSON_ORIN_NANO
(50GB)" error, even though the actual model footprint is ~17 GB.

Adds JETSON_ORIN_NANO entries:

  min_disk_map: 15 GB
    Qwen3.5-2B (~1.5 GB) + dustynv/llama_cpp image (~5 GB) + dashboard
    stack + working space. Same envelope as tier 0 since model size
    dominates.

  min_ram_map:  6 GB
    Orin Nano ships with 8 GB unified memory; usable ~7.6 GB after
    kernel reservation. Requiring 16 GB would warn against the only
    memory configuration this SKU has.

  tier_rank_map: 0
    Aligns with tier 0 for ordering purposes.

  gpu-backend check: explicit jetson branch with an experimental
    warning (referencing Light-Heart-Labs#195) so the preflight doesn't fall through
    to the generic "Unknown backend" warn.

Result for the Jetson Orin Nano scenario (7 GB RAM, 42 GB disk):
  blockers: 1 → 0
  can_proceed: false → true
  disk check: pass with "42GB meets tier JETSON_ORIN_NANO recommendation
  (15GB)"
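A hypothetical Python rendering of the lookup-with-fallback behaviour described above. Only the JETSON_ORIN_NANO values, the 50 GB / 16 GB fallbacks and the disk error wording come from the commit message; the function shape and the RAM warning text are assumptions, and other tiers are omitted.

```python
# Hypothetical rendering of the preflight tier lookups in preflight-engine.sh.
DEFAULT_MIN_DISK_GB = 50
DEFAULT_MIN_RAM_GB = 16

MIN_DISK_MAP = {"JETSON_ORIN_NANO": 15}
MIN_RAM_MAP = {"JETSON_ORIN_NANO": 6}


def preflight(tier: str, ram_gb: int, disk_gb: int,
              min_disk_map: dict = MIN_DISK_MAP,
              min_ram_map: dict = MIN_RAM_MAP) -> dict:
    # Tiers missing from a map fall back to the generic defaults.
    min_disk = min_disk_map.get(tier, DEFAULT_MIN_DISK_GB)
    min_ram = min_ram_map.get(tier, DEFAULT_MIN_RAM_GB)
    blockers, warnings = [], []
    if disk_gb < min_disk:
        # Disk shortfalls block the install.
        blockers.append(
            f"Disk {disk_gb}GB is below required minimum for tier {tier} ({min_disk}GB)"
        )
    if ram_gb < min_ram:
        # RAM shortfalls only warn (message wording is illustrative).
        warnings.append(f"RAM {ram_gb}GB is below recommended {min_ram}GB for tier {tier}")
    return {"blockers": blockers, "warnings": warnings, "can_proceed": not blockers}
```

Calling it with empty maps reproduces the pre-fix error for the 7 GB / 42 GB scenario; with the new entries the same host has no blockers.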

Regression: existing jetson detection (12/12), tier-map (135/135),
compose-resolver (11/11) tests unchanged.

On a Jetson Orin Nano install, the capability-profile pipeline can
overwrite GPU_BACKEND from "jetson" to "cpu" when the hardware
classifier lacks a Jetson entry (gpu-database.json does not yet know
about Tegra SoCs, so classify-hardware.sh returns class_id=unknown
and llm_backend falls back to cpu via build-capability-profile.sh).

When that override fired in 02-detection.sh, GPU_BACKEND="cpu" met
the resolver's `gpu_backend == "cpu"` branch FIRST in
resolve-compose-stack.sh, short-circuiting before the
`gpu_backend == "jetson" or tier == "JETSON_ORIN_NANO"` branch could
catch it via tier. Result: the installer picked
docker-compose.cpu.yml and started downloading the CPU llama-server
image instead of the dustynv Jetson image, even with
--tier JETSON_ORIN_NANO explicitly requested.

Fix: move the jetson branch above the cpu branch. Tier alone is now
the authoritative signal — even if the capability profile pipeline
mishandles the backend, an explicit --tier JETSON_ORIN_NANO still
selects docker-compose.jetson.yml.

Regression test added: tier=JETSON_ORIN_NANO + gpu-backend=cpu must
still resolve to docker-compose.jetson.yml. Compose resolver test
suite now 12/12 PASS.
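The branch order after this fix, as a hypothetical Python sketch of the resolver's decision. Only the jetson, cpu and nvidia overlay filenames appear in the PR; the amd and intel/sycl filenames in `OTHER_VENDOR_FILES` are placeholders, and the real resolver has more inputs than these two arguments.

```python
# Hypothetical sketch of branch order in resolve-compose-stack.sh after the fix.
JETSON_TIER = "JETSON_ORIN_NANO"

# Placeholder filenames: the PR does not show these overlays' names.
OTHER_VENDOR_FILES = {
    "amd": "docker-compose.amd.yml",
    "intel": "docker-compose.intel.yml",
    "sycl": "docker-compose.intel.yml",
}


def select_primary_file(gpu_backend: str, tier: str = "") -> str:
    # Jetson first: an explicit tier wins even if the capability profile
    # rewrote GPU_BACKEND to "cpu".
    if gpu_backend == "jetson" or tier == JETSON_TIER:
        return "docker-compose.jetson.yml"
    if gpu_backend == "cpu":
        return "docker-compose.cpu.yml"
    if gpu_backend in OTHER_VENDOR_FILES:
        return OTHER_VENDOR_FILES[gpu_backend]
    # Everything else falls through to the NVIDIA overlay.
    return "docker-compose.nvidia.yml"
```

The first call in the test below is the new regression case: tier=JETSON_ORIN_NANO with gpu-backend=cpu.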

Note: this fixes the SYMPTOM (wrong compose overlay selected). The
ROOT CAUSE — capability profile reporting gpu=unknown / backend=cpu
on real Jetson hardware despite the new detect-hardware.sh branch —
is still under investigation. Tier-based fallback is the safety net.

… backends

build-capability-profile.sh runs in two passes:
  1. A gpu_type → (llm_backend, overlays) branch picks the right values
     for known vendors (amd, nvidia, apple, jetson).
  2. An override block applies hw_rec_backend / hw_rec_overlays from
     classify-hardware.sh's output.

Pass 2 was overriding pass 1 unconditionally. classify-hardware.sh is a
data-driven lookup against gpu-database.json which does not yet have
Jetson entries — it returns the default `cpu` backend for any
unrecognized vendor. On a real Orin Nano this produced:

  pass 1: gpu_type=jetson → llm_backend=jetson, overlays=[base, jetson]
  pass 2: hw_rec_backend=cpu → CLOBBERED to llm_backend=cpu, overlays=[base, cpu]

Result during install on Jetson hardware: capability profile shipped
backend=cpu + overlays=[base, cpu], the installer copied that into
GPU_BACKEND, and the resolver picked docker-compose.cpu.yml even with
--tier JETSON_ORIN_NANO. Confirmed in p3-1.log line 50:
  "Capabilities override detection: backend=cpu, memory=unified, tier=T1"

Fix: gate the override block on gpu_type. If gpu_type is one of the
explicitly-handled vendors (amd / nvidia / apple / jetson), the pass-1
assignment is authoritative and the classifier's "I don't know → cpu"
default is ignored. Override still applies when gpu_type isn't known
to us (the safety net case it was designed for).

Also fixes two secondary issues in the same block:
  - vendor whitelist on line 147 lacked "jetson", so the cap profile
    JSON stored gpu.vendor="unknown" even when type was correctly
    detected as jetson
  - tier whitelist on line 126 lacked JETSON_ORIN_NANO, so explicit
    tier values fell back to T1

Verified via simulated Python invocation: with gpu_type=jetson and
hw_rec_backend=cpu (the classify-hardware.sh fallback), the resulting
profile now correctly contains llm_backend=jetson and
overlays=[base.yml, jetson.yml].
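The gated override can be sketched in a few lines. This is a hypothetical Python sketch, not the script's embedded code: the function name and signature are assumptions, and the final "keep pass 1 when the classifier has no recommendation" branch is an assumed detail. The known-vendor set and the jetson/cpu values are from the commit message.

```python
# Hypothetical sketch of the gated override in build-capability-profile.sh.
KNOWN_GPU_TYPES = frozenset({"amd", "nvidia", "apple", "jetson"})


def final_backend(gpu_type: str,
                  pass1_backend: str, pass1_overlays: list[str],
                  hw_rec_backend: str, hw_rec_overlays: list[str]) -> tuple[str, list[str]]:
    """Pick the profile's llm_backend and overlays after both passes."""
    if gpu_type in KNOWN_GPU_TYPES:
        # Pass 1 is authoritative for explicitly handled vendors.
        return pass1_backend, list(pass1_overlays)
    if hw_rec_backend:
        # Unknown vendor: the classifier's recommendation is the safety net.
        return hw_rec_backend, list(hw_rec_overlays)
    return pass1_backend, list(pass1_overlays)
```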

Regression: detection 12/12, tier-map 135/135, compose-resolver 12/12,
resolve-compose-resilient 31/31 all pass.
@Lightheartdevs

Collaborator

Thanks for keeping this draft until the end-to-end Jetson install evidence lands. I checked out feat/jetson-compose-overlay locally and ran two lightweight audits; both currently fail:

  1. python dream-server/tests/test-dependency-pins.py

    docker-compose.jetson.yml:24: variable image ref is not documented: ${LLAMA_SERVER_IMAGE:-dustynv/llama_cpp:r36.4.0}
    docker-compose.jetson.yml:24: image ref is not recorded in dependency-lock.json: dustynv/llama_cpp:r36.4.0
    

    Please add the Jetson image/default to config/dependency-lock.json, or adjust the pin policy if this image should be handled differently.

  2. python dream-server/scripts/audit-extensions.py --project-dir dream-server

    This fails because the core manifests now declare jetson in gpu_backends, but scripts/audit-extensions.py still has VALID_GPU_BACKENDS = {"amd", "nvidia", "apple", "all", "none"}. Adding jetson there should make the extension audit accept these manifest changes.
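Both audit failures reduce to membership checks. The hypothetical Python sketch below reproduces the two reported messages; how test-dependency-pins.py actually reads dependency-lock.json and what it counts as a "documented" variable are not shown here, so both are passed in as plain sets (an assumption). `OLD_VALID_GPU_BACKENDS` copies the set quoted above.

```python
import re

# A variable image ref with a default, e.g. ${LLAMA_SERVER_IMAGE:-repo:tag}.
_VAR_REF = re.compile(r"^\$\{(?P<var>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]+)\}$")

OLD_VALID_GPU_BACKENDS = frozenset({"amd", "nvidia", "apple", "all", "none"})
VALID_GPU_BACKENDS = OLD_VALID_GPU_BACKENDS | {"jetson"}


def image_pin_problems(image_ref: str, locked_images: set, documented_vars: set) -> list:
    """Report an undocumented variable and/or an unlocked default image."""
    problems = []
    m = _VAR_REF.match(image_ref)
    if m:
        if m["var"] not in documented_vars:
            problems.append(f"variable image ref is not documented: {image_ref}")
        image_ref = m["default"]  # the default is what must be pinned
    if image_ref not in locked_images:
        problems.append(f"image ref is not recorded in dependency-lock.json: {image_ref}")
    return problems


def invalid_gpu_backends(declared, valid=VALID_GPU_BACKENDS) -> list:
    """Backends a manifest declares that the validator would reject."""
    return sorted(set(declared) - set(valid))
```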

I also saw no GitHub checks reported for feat/jetson-compose-overlay. These look like straightforward draft blockers before ready-for-review, in addition to the Phase 4 on-hardware install proof already called out in the PR body.

…DA image

installers/phases/08-images.sh built the image-pull list using a
two-tier conditional:

  if GPU_BACKEND == amd  → lemonade image
  elif GPU_BACKEND == cpu → ghcr.io/ggml-org/llama.cpp:server-b8248
  else                   → ghcr.io/ggml-org/llama.cpp:server-cuda-b9014

The `else` branch caught GPU_BACKEND=jetson and pulled the discrete-CUDA
ggml image, which is compiled for sm_75/80/86/89/90 — NOT sm_87 (Orin
Nano Ampere). On a real Jetson install the image pulled cleanly but
would fail to load CUDA kernels at runtime, hanging container init and
stalling the entire compose stack.

This contradicted the docker-compose.jetson.yml default
(dustynv/llama_cpp:r36.4.0) because the pre-pull step sets
LLAMA_SERVER_IMAGE in .env, and that env value then shadows the compose
file's `${LLAMA_SERVER_IMAGE:-dustynv/...}` default — the overlay
default never got a chance to fire.
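The shadowing follows from how Compose interpolates `${VAR:-default}`: the default is used only when the variable is unset or empty. A minimal Python sketch of that rule; it handles only the `:-` form, not the full Compose interpolation grammar, and `interpolate` is a hypothetical helper.

```python
import re

# Matches only the ${VAR:-default} form.
_VAR_WITH_DEFAULT = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def interpolate(value: str, env: dict[str, str]) -> str:
    """Use env[VAR] if set and non-empty, else the inline default."""
    return _VAR_WITH_DEFAULT.sub(lambda m: env.get(m.group(1)) or m.group(2), value)


COMPOSE_IMAGE = "${LLAMA_SERVER_IMAGE:-dustynv/llama_cpp:r36.4.0}"

if __name__ == "__main__":
    # No LLAMA_SERVER_IMAGE in .env: the overlay's Jetson default applies.
    print(interpolate(COMPOSE_IMAGE, {}))
    # The pre-pull step wrote the CUDA image into .env, so it shadows the default.
    print(interpolate(COMPOSE_IMAGE,
                      {"LLAMA_SERVER_IMAGE": "ghcr.io/ggml-org/llama.cpp:server-cuda-b9014"}))
```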

Fix: add an explicit jetson branch that pulls
dustynv/llama_cpp:r36.4.0 (the Jetson-tuned community image with
sm_87 in its CUDA arch list, matching JetPack 6.x). Aligns the pre-pull
image with what docker-compose.jetson.yml expects.

Also extends the LLAMA_SERVER_IMAGE_FALLBACK validation gate (line ~63)
to include jetson, so an invalid pin gets caught by the same path as
nvidia/cpu/intel/sycl instead of silently passing through.
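The patched pre-pull selection, as a hypothetical Python rendering of the shell conditional. The jetson, cpu and CUDA image tags and the backends covered by the fallback gate come from this commit message; `LEMONADE_IMAGE` is a placeholder because the lemonade tag isn't given here, and the function names are illustrative.

```python
# Hypothetical rendering of the image-selection branch in installers/phases/08-images.sh.
LEMONADE_IMAGE = "<lemonade image>"  # placeholder: tag not given in this commit
JETSON_IMAGE = "dustynv/llama_cpp:r36.4.0"
CPU_IMAGE = "ghcr.io/ggml-org/llama.cpp:server-b8248"
CUDA_IMAGE = "ghcr.io/ggml-org/llama.cpp:server-cuda-b9014"

# Backends whose LLAMA_SERVER_IMAGE_FALLBACK pin is validated (jetson newly added).
FALLBACK_VALIDATED_BACKENDS = frozenset({"nvidia", "cpu", "intel", "sycl", "jetson"})


def prepull_image(gpu_backend: str) -> str:
    if gpu_backend == "amd":
        return LEMONADE_IMAGE
    if gpu_backend == "jetson":
        # Must be handled before the catch-all, which pulls the discrete-CUDA
        # image built without sm_87.
        return JETSON_IMAGE
    if gpu_backend == "cpu":
        return CPU_IMAGE
    return CUDA_IMAGE


def fallback_pin_is_validated(gpu_backend: str) -> bool:
    return gpu_backend in FALLBACK_VALIDATED_BACKENDS
```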

Verified on real Orin Nano (p3-2.log): the prior install pulled
ggml-org/llama.cpp:server-cuda-b9014 (visible in `docker image prune`
deletion list); the container would hang at init and never become
healthy. With this patch, the pre-pull step requests dustynv directly,
matching the compose overlay's runtime image.

Regression: jetson-detection 12/12, tier-map 135/135,
compose-resolver 12/12, all unchanged.