feat(tier-map): add JETSON_ORIN_NANO tier and jetson backend #1481

Open
matedev01 wants to merge 2 commits into Light-Heart-Labs:main from matedev01:feat/jetson-tier-map
@matedev01

Contributor

Summary

Phase 2 of issue #195 milestone 1. Stacks on #1479 (Phase 1 detection). Adds the JETSON_ORIN_NANO tier and a jetson backend contract — pure data, no compose changes, no runtime impact.

Why

PR #1479 plumbs GPU_BACKEND=jetson through detection, but the tier-map has no matching entry, so a user running --tier JETSON_ORIN_NANO today gets an "Invalid tier" error. This PR closes that gap and lets the installer resolve a sane model and context size for Orin Nano hardware.

Tier choices

Sized for the 8 GB unified-memory budget on Orin Nano (Ampere sm_87). Leaves ~5 GB free for KV cache + open-webui + dashboard-api + LiteLLM after the model loads.

| Profile | Model | Quant | Size on disk | Context | Rationale |
|---|---|---|---|---|---|
| qwen | qwen3.5-2b | Q4_K_M | ~1.5 GB | 8K | Same model as Tier 0: proven small-footprint default; comfortable headroom |
| gemma4 | gemma-4-e2b-it | Q4_K_M | ~2.81 GB | 8K | Matches the gemma4 Tier 1 model; reduced context vs Tier 1 to keep KV cache room |
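
As a rough check on the headroom figure, the arithmetic behind the table can be sketched as below. This is a back-of-the-envelope estimate only: it assumes the nominal 8 GB total and the approximate on-disk sizes from the table, and it ignores what the OS and other processes already use. The helper name `headroom_gb` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough headroom estimate: nominal unified memory minus model file size.
# Assumption: 8 GB is the nominal total; real available memory is lower.
TOTAL_GB=8

headroom_gb() {
  # $1 = approximate model size in GB (taken from the table above)
  awk -v total="$TOTAL_GB" -v model="$1" 'BEGIN { printf "%.2f\n", total - model }'
}

echo "qwen3.5-2b:     $(headroom_gb 1.5) GB left"
echo "gemma-4-e2b-it: $(headroom_gb 2.81) GB left"
```

The larger gemma4 model is the tighter case, which is where the "~5 GB free" figure comes from.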

Both set N_GPU_LAYERS=99 — the Tegra iGPU shares system RAM with the CPU, so partial offload has no upside on unified memory.

Larger Orin Nano variants and Orin NX/AGX are deliberately excluded; this milestone is "Orin Nano end-to-end" per the maintainer's 2026-05-10 triage on #195.
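
For readers unfamiliar with the tier-map, the shape of the new qwen entry might look roughly like the sketch below. It is not the actual installers/lib/tier-map.sh source: the function name `set_qwen_tier_config_sketch` and the simplified fallthrough branch are assumptions, while the variable names and values are the ones reported in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a tier case branch; not the real tier-map.sh code.
set_qwen_tier_config_sketch() {
  case "$1" in
    JETSON_ORIN_NANO)
      TIER_NAME="Jetson Orin Nano"
      LLM_MODEL="qwen3.5-2b"
      GGUF_FILE="Qwen3.5-2B-Q4_K_M.gguf"
      MAX_CONTEXT=8192
      # Full offload: the iGPU shares RAM with the CPU, so partial offload gains nothing.
      N_GPU_LAYERS=99
      ;;
    *)
      echo "ERROR: Invalid tier: $1" >&2
      return 1
      ;;
  esac
}

set_qwen_tier_config_sketch JETSON_ORIN_NANO &&
  echo "$LLM_MODEL $GGUF_FILE $MAX_CONTEXT $N_GPU_LAYERS"
# prints: qwen3.5-2b Qwen3.5-2B-Q4_K_M.gguf 8192 99
```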

Files changed

| File | Change |
|---|---|
| installers/lib/tier-map.sh | JETSON_ORIN_NANO case added to both set_qwen_tier_config() and set_gemma4_tier_config() switches. Validation error lists at lines 194 + 307 updated. tier_to_model() extended in both profile branches so dream model swap resolves correctly |
| config/backends/jetson.json (new) | Mirrors nvidia.json: same llama-server contract on port 8080, same provider URL. The runtime difference (JetPack-pinned image, Tegra container runtime) lives in docker-compose.jetson.yml, follow-up PR |
| tests/test-tier-map.sh | Added 13 assertions covering both qwen and gemma4 paths for the new tier; extended the GGUF_URL coverage loop to include JETSON_ORIN_NANO |
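
The kind of assertion added to the test suite can be illustrated with a minimal sketch. The helper `assert_eq` and the stand-in values are hypothetical; the real tests/test-tier-map.sh may be structured differently. Only the `PASS:` and `Results:` output format is modelled on the test output quoted in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical assertion helper; the real test suite may differ.
PASSED=0
FAILED=0

assert_eq() {
  # $1 = label, $2 = expected value, $3 = actual value
  if [ "$2" = "$3" ]; then
    echo "  PASS: $1"
    PASSED=$((PASSED + 1))
  else
    echo "  FAIL: $1 (expected '$2', got '$3')"
    FAILED=$((FAILED + 1))
  fi
}

# Stand-in values; a real test would source tier-map.sh and resolve the tier first.
LLM_MODEL="qwen3.5-2b"
MAX_CONTEXT=8192

assert_eq "JETSON_ORIN_NANO LLM_MODEL" "qwen3.5-2b" "$LLM_MODEL"
assert_eq "JETSON_ORIN_NANO MAX_CONTEXT" "8192" "$MAX_CONTEXT"
echo "Results: $PASSED passed, $FAILED failed"
```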

Test plan

bash tests/test-tier-map.sh
# Results: 135 passed, 0 failed   (was 122 before; +13 new Jetson assertions)

Plus the Phase 1 detection test still passes unchanged:

bash tests/test-jetson-detection.sh
# Passed: 12, Failed: 0

End-to-end resolution check on the qwen path:

TIER=JETSON_ORIN_NANO bash -c 'source installers/lib/tier-map.sh && resolve_tier_config && echo "$LLM_MODEL $GGUF_FILE $MAX_CONTEXT $N_GPU_LAYERS"'
# Expected: qwen3.5-2b Qwen3.5-2B-Q4_K_M.gguf 8192 99

And gemma4:

MODEL_PROFILE=gemma4 TIER=JETSON_ORIN_NANO bash -c 'source installers/lib/tier-map.sh && resolve_tier_config && echo "$LLM_MODEL $GGUF_FILE $MAX_CONTEXT $N_GPU_LAYERS"'
# Expected: gemma-4-e2b-it gemma-4-E2B-it-Q4_K_M.gguf 8192 99

Explicitly out of scope (separate follow-ups)

  • docker-compose.jetson.yml and resolver branch in scripts/resolve-compose-stack.sh
  • Auto-tier selection on Jetson hosts (--tier JETSON_ORIN_NANO required for now)
  • Orin AGX/NX, Xavier, legacy Nano (different SoCs / CUDA caps)
  • docs/JETSON-QUICKSTART.md and SUPPORT-MATRIX.md entry
  • CI smoke (no Jetson hardware available in GH Actions)

Stack note

Stacks on #1479 (feat/jetson-detection). If #1479 needs rework during review (e.g. renaming the backend value from jetson to tegra), this branch rebases cleanly — the only cross-PR dependency is the string jetson used as the backend identifier.

matedev01 added 2 commits May 27, 2026 02:52
Phase 2 of issue Light-Heart-Labs#195 milestone 1. Stacks on feat/jetson-detection.

Adds the JETSON_ORIN_NANO tier for both qwen and gemma4 profiles in
installers/lib/tier-map.sh, with conservative model selection sized for
the Orin Nano 8GB unified-memory budget:

  qwen   → qwen3.5-2b      (~1.5 GB,  8K context)
  gemma4 → gemma-4-e2b-it  (~2.81 GB, 8K context)

Both set N_GPU_LAYERS=99 since the Tegra iGPU shares system RAM with
the CPU — there is no benefit to partial offload on unified memory.

Also adds config/backends/jetson.json mirroring nvidia.json (same
llama-server contract on port 8080); the runtime difference lives in
docker-compose.jetson.yml which is a follow-up PR.

Tier validation error lists and tier_to_model() switches updated for
both qwen and gemma4 paths so `dream model swap` resolves correctly.

Tests: tier-map suite goes from 122 → 135 PASS (13 new Jetson
assertions covering both profiles, plus the GGUF_URL coverage loop
extension).

Out of scope (separate follow-ups):
  - docker-compose.jetson.yml + resolver branch
  - Auto-tier selection on Jetson hosts (--tier required for now)
  - Orin AGX/NX, Xavier, legacy Nano
  - docs/JETSON-QUICKSTART.md, SUPPORT-MATRIX entry
@matedev01

Contributor Author

On-hardware verification log

Same Orin Nano 8GB Super (JetPack R36.4.7) as #1479. All five Phase 2 checks pass cleanly: 135/135 tier-map tests, both profile resolutions return the expected models, the new jetson backend contract loads with the right fields, and invalid tiers are rejected with JETSON_ORIN_NANO listed as a valid choice in the error.

Full P2 log
aihpc@ubuntu:~/orin/DreamServer/dream-server$ bash tests/test-tier-map.sh 2>&1 | tail -8
  PASS: SELECTOR_CONTEXT
  PASS: SELECTOR_RUNTIME_PROFILE
  PASS: SELECTOR_N_CPU_MOE
  PASS: SELECTOR_CACHE_V

===============================
Results: 135 passed, 0 failed
===============================

aihpc@ubuntu:~/orin/DreamServer/dream-server$ TIER=JETSON_ORIN_NANO bash -c '
  error() { echo "ERROR: $*" >&2; return 1; }
  source lib/safe-env.sh
  source installers/lib/tier-map.sh
  resolve_tier_config
  echo "TIER_NAME    = $TIER_NAME"
  echo "LLM_MODEL    = $LLM_MODEL"
  echo "GGUF_FILE    = $GGUF_FILE"
  echo "MAX_CONTEXT  = $MAX_CONTEXT"
  echo "N_GPU_LAYERS = $N_GPU_LAYERS"
  echo "MODEL_PROFILE_EFFECTIVE = $MODEL_PROFILE_EFFECTIVE"
'
TIER_NAME    = Jetson Orin Nano
LLM_MODEL    = qwen3.5-2b
GGUF_FILE    = Qwen3.5-2B-Q4_K_M.gguf
MAX_CONTEXT  = 8192
N_GPU_LAYERS = 99
MODEL_PROFILE_EFFECTIVE = qwen

aihpc@ubuntu:~/orin/DreamServer/dream-server$ MODEL_PROFILE=gemma4 TIER=JETSON_ORIN_NANO bash -c '
  error() { echo "ERROR: $*" >&2; return 1; }
  source lib/safe-env.sh
  source installers/lib/tier-map.sh
  resolve_tier_config
  echo "TIER_NAME   = $TIER_NAME"
  echo "LLM_MODEL   = $LLM_MODEL"
  echo "GGUF_FILE   = $GGUF_FILE"
  echo "MAX_CONTEXT = $MAX_CONTEXT"
  echo "MODEL_PROFILE_EFFECTIVE = $MODEL_PROFILE_EFFECTIVE"
'
TIER_NAME   = Jetson Orin Nano
LLM_MODEL   = gemma-4-e2b-it
GGUF_FILE   = gemma-4-E2B-it-Q4_K_M.gguf
MAX_CONTEXT = 8192
MODEL_PROFILE_EFFECTIVE = gemma4

aihpc@ubuntu:~/orin/DreamServer/dream-server$ python3 -m json.tool < config/backends/jetson.json
{
    "id": "jetson",
    "llm_engine": "llama-server",
    "service_name": "llama-server",
    "public_api_port": 8080,
    "public_health_url": "http://localhost:8080/health",
    "provider_name": "local-llama",
    "provider_url": "http://llama-server:8080/v1",
    "notes": "NVIDIA Jetson (Tegra) backend. Same llama-server contract as nvidia.json; the runtime difference (JetPack-pinned llama.cpp image, sm_87 for Orin Nano, Tegra container runtime) lives in docker-compose.jetson.yml, which is added in a follow-up to issue #195."
}

aihpc@ubuntu:~/orin/DreamServer/dream-server$ bash scripts/load-backend-contract.sh --backend jetson --env
BACKEND_CONTRACT_ID="jetson"
BACKEND_LLM_ENGINE="llama-server"
BACKEND_SERVICE_NAME="llama-server"
BACKEND_PUBLIC_API_PORT="8080"
BACKEND_PUBLIC_HEALTH_URL="http://localhost:8080/health"
BACKEND_PROVIDER_NAME="local-llama"
BACKEND_PROVIDER_URL="http://llama-server:8080/v1"
BACKEND_CONTRACT_FILE="/home/aihpc/orin/DreamServer/dream-server/config/backends/jetson.json"
BACKEND_LEMONADE_CONTAINER_IMAGE=""
BACKEND_LEMONADE_WINDOWS_VERSION=""
BACKEND_LEMONADE_WINDOWS_MSI_FILE=""
BACKEND_LEMONADE_WINDOWS_EXECUTABLE=""
BACKEND_LEMONADE_API_PORT=""
BACKEND_LEMONADE_HEALTH_PATH=""
BACKEND_LEMONADE_LINUX_BACKEND=""
BACKEND_LEMONADE_WINDOWS_BACKEND=""

aihpc@ubuntu:~/orin/DreamServer/dream-server$ TIER=DEFINITELY_NOT_A_TIER bash -c '
  error() { echo "ERROR: $*" >&2; return 1; }
  source lib/safe-env.sh
  source installers/lib/tier-map.sh
  resolve_tier_config || echo "(correctly rejected — see ERROR line above)"
'
ERROR: Invalid tier: DEFINITELY_NOT_A_TIER. Valid tiers: 0, 1, 2, 3, 4, CLOUD, NV_ULTRA, SH_LARGE, SH_COMPACT, ARC, ARC_LITE, JETSON_ORIN_NANO
(correctly rejected — see ERROR line above)

Reviewer notes

  • Tier-map suite goes from 122 → 135 PASS on real aarch64 hardware — the 13 new assertions covering both qwen and gemma4 paths and the extended GGUF_URL coverage loop all hold.
  • Both profiles resolve the right model. Qwen path: qwen3.5-2b Q4_K_M (~1.5 GB) at 8K context. Gemma4 path: gemma-4-e2b-it Q4_K_M (~2.81 GB) at 8K context. Both set N_GPU_LAYERS=99 for full offload to the Tegra iGPU since unified memory makes partial offload pointless.
  • Backend contract loads correctly. load-backend-contract.sh --backend jetson --env emits BACKEND_CONTRACT_ID=jetson, BACKEND_PUBLIC_API_PORT=8080, BACKEND_PROVIDER_URL=http://llama-server:8080/v1 — same llama-server contract as nvidia.json, which is intentional; the runtime delta lives in the follow-up docker-compose.jetson.yml.
  • Invalid tier still fails cleanly. The error message now lists JETSON_ORIN_NANO as a valid choice, confirming both the new tier in set_qwen_tier_config() / set_gemma4_tier_config() and the updated validation error string at tier-map.sh:194 / tier-map.sh:307.
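
The rejection behaviour described in the last note can be sketched as a membership check against a list of valid tiers. This is an illustration under assumptions, not the code in tier-map.sh: `VALID_TIERS` and `validate_tier` are hypothetical names, and the tier list is copied from the error message in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical validation sketch; tier names copied from the logged error message.
VALID_TIERS=(0 1 2 3 4 CLOUD NV_ULTRA SH_LARGE SH_COMPACT ARC ARC_LITE JETSON_ORIN_NANO)

validate_tier() {
  local tier="$1" t list
  for t in "${VALID_TIERS[@]}"; do
    [ "$t" = "$tier" ] && return 0
  done
  # Join the list with ", " and trim the trailing separator.
  list=$(printf '%s, ' "${VALID_TIERS[@]}")
  echo "ERROR: Invalid tier: $tier. Valid tiers: ${list%, }" >&2
  return 1
}

validate_tier JETSON_ORIN_NANO && echo "accepted"
validate_tier DEFINITELY_NOT_A_TIER || echo "(rejected)"
```

Keeping the valid tiers in one list means the error message cannot drift from the set of accepted values, which is the class of mismatch this PR had to fix by hand in two places.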

@Lightheartdevs

Collaborator

Nice narrow follow-up. The tier and backend contract choices look coherent for the 8 GB unified-memory Orin Nano target, and keeping compose/runtime out of this PR helps the review.

Two process notes:

No functional blocker from my pass on this PR by itself.
