feat(tier-map): add JETSON_ORIN_NANO tier and jetson backend by matedev01 · Pull Request #1481 · Light-Heart-Labs/DreamServer

matedev01 · 2026-05-27T02:49:07Z

Summary

Phase 2 of issue #195 milestone 1. Stacks on #1479 (Phase 1 detection). Adds the JETSON_ORIN_NANO tier and a jetson backend contract — pure data, no compose changes, no runtime impact.

Why

PR #1479 plumbs GPU_BACKEND=jetson through detection, but the tier map has no entry for it, so a user running --tier JETSON_ORIN_NANO today gets an "Invalid tier" error. This PR closes that gap and lets the installer resolve a sane model and context size for Orin Nano hardware.

Tier choices

Sized for the 8 GB unified-memory budget on Orin Nano (Ampere sm_87). Leaves ~5 GB free for KV cache + open-webui + dashboard-api + LiteLLM after the model loads.

| Profile | Model | Quant | Size on disk | Context | Rationale |
| --- | --- | --- | --- | --- | --- |
| qwen | qwen3.5-2b | Q4_K_M | ~1.5 GB | 8K | Same model as Tier 0: proven small-footprint default; comfortable headroom |
| gemma4 | gemma-4-e2b-it | Q4_K_M | ~2.81 GB | 8K | Matches the gemma4 Tier 1 model; reduced context vs Tier 1 to keep KV cache room |

Both set N_GPU_LAYERS=99 — the Tegra iGPU shares system RAM with the CPU, so partial offload has no upside on unified memory.
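As a rough check on that budget, the headroom left after loading each model can be computed from the sizes in the table. This is a back-of-the-envelope sketch, not part of the PR: `headroom_gb` is a hypothetical helper, and it treats the on-disk GGUF size as the loaded footprint, which ignores OS and runtime overhead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total memory (GB) minus model size (GB), two decimals.
# Assumption: loaded footprint is approximated by the on-disk GGUF size.
headroom_gb() {
  awk -v total="$1" -v model="$2" 'BEGIN { printf "%.2f\n", total - model }'
}

headroom_gb 8 1.5    # qwen3.5-2b Q4_K_M      -> 6.50
headroom_gb 8 2.81   # gemma-4-e2b-it Q4_K_M  -> 5.19
```

The gemma4 figure is the one that matches the "~5 GB free" estimate; the qwen profile leaves more room.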

Larger Orin Nano variants and Orin NX/AGX are deliberately excluded; this milestone is "Orin Nano end-to-end" per the maintainer's 2026-05-10 triage on #195.

Files changed

| File | Change |
| --- | --- |
| `installers/lib/tier-map.sh` | `JETSON_ORIN_NANO` case added to both `set_qwen_tier_config()` and `set_gemma4_tier_config()` switches. Validation error lists at lines 194 + 307 updated. `tier_to_model()` extended in both profile branches so `dream model swap` resolves correctly |
| `config/backends/jetson.json` (new) | Mirrors `nvidia.json`: same `llama-server` contract on port 8080, same provider URL. The runtime difference (JetPack-pinned image, Tegra container runtime) lives in `docker-compose.jetson.yml`, follow-up PR |
| `tests/test-tier-map.sh` | Added 13 assertions covering both qwen and gemma4 paths for the new tier; extended the GGUF_URL coverage loop to include `JETSON_ORIN_NANO` |
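For readers unfamiliar with the tier map, the shape of the change can be sketched as a `case` branch plus an updated error message. This is an illustrative stand-in, not the code in `installers/lib/tier-map.sh`: `sketch_qwen_tier_config` is a hypothetical name, only the Jetson branch is filled in, the error text is abbreviated, and the values are the ones this PR reports for the qwen profile.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified stand-in for the qwen tier switch.
# Only the new branch is shown; the real function covers every tier.
sketch_qwen_tier_config() {
  case "${1:-}" in
    JETSON_ORIN_NANO)
      TIER_NAME="Jetson Orin Nano"
      LLM_MODEL="qwen3.5-2b"
      GGUF_FILE="Qwen3.5-2B-Q4_K_M.gguf"
      MAX_CONTEXT=8192
      N_GPU_LAYERS=99   # full offload: unified memory gives partial offload no upside
      ;;
    *)
      # The real error string lists every valid tier; abbreviated here.
      echo "ERROR: Invalid tier: ${1:-}. Valid tiers include JETSON_ORIN_NANO" >&2
      return 1
      ;;
  esac
}

sketch_qwen_tier_config JETSON_ORIN_NANO &&
  echo "$LLM_MODEL $GGUF_FILE $MAX_CONTEXT $N_GPU_LAYERS"
# prints: qwen3.5-2b Qwen3.5-2B-Q4_K_M.gguf 8192 99
```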

Test plan

bash tests/test-tier-map.sh
# Results: 135 passed, 0 failed   (was 122 before; +13 new Jetson assertions)

Plus the Phase 1 detection test still passes unchanged:

bash tests/test-jetson-detection.sh
# Passed: 12, Failed: 0

End-to-end resolution check on the qwen path:

TIER=JETSON_ORIN_NANO bash -c 'source installers/lib/tier-map.sh && resolve_tier_config && echo "$LLM_MODEL $GGUF_FILE $MAX_CONTEXT $N_GPU_LAYERS"'
# Expected: qwen3.5-2b Qwen3.5-2B-Q4_K_M.gguf 8192 99

And gemma4:

MODEL_PROFILE=gemma4 TIER=JETSON_ORIN_NANO bash -c 'source installers/lib/tier-map.sh && resolve_tier_config && echo "$LLM_MODEL $GGUF_FILE $MAX_CONTEXT $N_GPU_LAYERS"'
# Expected: gemma-4-e2b-it gemma-4-E2B-it-Q4_K_M.gguf 8192 99

Explicitly out of scope (separate follow-ups)

- docker-compose.jetson.yml and resolver branch in scripts/resolve-compose-stack.sh
- Auto-tier selection on Jetson hosts (--tier JETSON_ORIN_NANO required for now)
- Orin AGX/NX, Xavier, legacy Nano (different SoCs / CUDA caps)
- docs/JETSON-QUICKSTART.md and SUPPORT-MATRIX.md entry
- CI smoke (no Jetson hardware available in GH Actions)

Stack note

Stacks on #1479 (feat/jetson-detection). If #1479 needs rework during review (e.g. renaming the backend value from jetson to tegra), this branch rebases cleanly — the only cross-PR dependency is the string jetson used as the backend identifier.

Phase 2 of issue Light-Heart-Labs#195 milestone 1. Stacks on feat/jetson-detection.

Adds the JETSON_ORIN_NANO tier for both qwen and gemma4 profiles in installers/lib/tier-map.sh, with conservative model selection sized for the Orin Nano 8GB unified-memory budget:

- qwen → qwen3.5-2b (~1.5 GB, 8K context)
- gemma4 → gemma-4-e2b-it (~2.81 GB, 8K context)

Both set N_GPU_LAYERS=99 since the Tegra iGPU shares system RAM with the CPU; there is no benefit to partial offload on unified memory.

Also adds config/backends/jetson.json mirroring nvidia.json (same llama-server contract on port 8080); the runtime difference lives in docker-compose.jetson.yml, which is a follow-up PR.

Tier validation error lists and tier_to_model() switches updated for both qwen and gemma4 paths so `dream model swap` resolves correctly.

Tests: tier-map suite goes from 122 → 135 PASS (13 new Jetson assertions covering both profiles, plus the GGUF_URL coverage loop extension).

Out of scope (separate follow-ups):

- docker-compose.jetson.yml + resolver branch
- Auto-tier selection on Jetson hosts (--tier required for now)
- Orin AGX/NX, Xavier, legacy Nano
- docs/JETSON-QUICKSTART.md, SUPPORT-MATRIX entry

matedev01 · 2026-05-27T02:57:40Z

On-hardware verification log

Same Orin Nano 8GB Super (JetPack R36.4.7) as #1479. All five Phase 2 checks pass cleanly: 135/135 tier-map tests, both profile resolutions return the expected models, the new jetson backend contract loads with the right fields, and invalid tiers are rejected with JETSON_ORIN_NANO listed as a valid choice in the error.

Full P2 log:

aihpc@ubuntu:~/orin/DreamServer/dream-server$ bash tests/test-tier-map.sh 2>&1 | tail -8
  PASS: SELECTOR_CONTEXT
  PASS: SELECTOR_RUNTIME_PROFILE
  PASS: SELECTOR_N_CPU_MOE
  PASS: SELECTOR_CACHE_V

===============================
Results: 135 passed, 0 failed
===============================

aihpc@ubuntu:~/orin/DreamServer/dream-server$ TIER=JETSON_ORIN_NANO bash -c '
  error() { echo "ERROR: $*" >&2; return 1; }
  source lib/safe-env.sh
  source installers/lib/tier-map.sh
  resolve_tier_config
  echo "TIER_NAME    = $TIER_NAME"
  echo "LLM_MODEL    = $LLM_MODEL"
  echo "GGUF_FILE    = $GGUF_FILE"
  echo "MAX_CONTEXT  = $MAX_CONTEXT"
  echo "N_GPU_LAYERS = $N_GPU_LAYERS"
  echo "MODEL_PROFILE_EFFECTIVE = $MODEL_PROFILE_EFFECTIVE"
'
TIER_NAME    = Jetson Orin Nano
LLM_MODEL    = qwen3.5-2b
GGUF_FILE    = Qwen3.5-2B-Q4_K_M.gguf
MAX_CONTEXT  = 8192
N_GPU_LAYERS = 99
MODEL_PROFILE_EFFECTIVE = qwen

aihpc@ubuntu:~/orin/DreamServer/dream-server$ MODEL_PROFILE=gemma4 TIER=JETSON_ORIN_NANO bash -c '
  error() { echo "ERROR: $*" >&2; return 1; }
  source lib/safe-env.sh
  source installers/lib/tier-map.sh
  resolve_tier_config
  echo "TIER_NAME   = $TIER_NAME"
  echo "LLM_MODEL   = $LLM_MODEL"
  echo "GGUF_FILE   = $GGUF_FILE"
  echo "MAX_CONTEXT = $MAX_CONTEXT"
  echo "MODEL_PROFILE_EFFECTIVE = $MODEL_PROFILE_EFFECTIVE"
'
TIER_NAME   = Jetson Orin Nano
LLM_MODEL   = gemma-4-e2b-it
GGUF_FILE   = gemma-4-E2B-it-Q4_K_M.gguf
MAX_CONTEXT = 8192
MODEL_PROFILE_EFFECTIVE = gemma4

aihpc@ubuntu:~/orin/DreamServer/dream-server$ python3 -m json.tool < config/backends/jetson.json
{
    "id": "jetson",
    "llm_engine": "llama-server",
    "service_name": "llama-server",
    "public_api_port": 8080,
    "public_health_url": "http://localhost:8080/health",
    "provider_name": "local-llama",
    "provider_url": "http://llama-server:8080/v1",
    "notes": "NVIDIA Jetson (Tegra) backend. Same llama-server contract as nvidia.json; the runtime difference (JetPack-pinned llama.cpp image, sm_87 for Orin Nano, Tegra container runtime) lives in docker-compose.jetson.yml, which is added in a follow-up to issue #195."
}

aihpc@ubuntu:~/orin/DreamServer/dream-server$ bash scripts/load-backend-contract.sh --backend jetson --env
BACKEND_CONTRACT_ID="jetson"
BACKEND_LLM_ENGINE="llama-server"
BACKEND_SERVICE_NAME="llama-server"
BACKEND_PUBLIC_API_PORT="8080"
BACKEND_PUBLIC_HEALTH_URL="http://localhost:8080/health"
BACKEND_PROVIDER_NAME="local-llama"
BACKEND_PROVIDER_URL="http://llama-server:8080/v1"
BACKEND_CONTRACT_FILE="/home/aihpc/orin/DreamServer/dream-server/config/backends/jetson.json"
BACKEND_LEMONADE_CONTAINER_IMAGE=""
BACKEND_LEMONADE_WINDOWS_VERSION=""
BACKEND_LEMONADE_WINDOWS_MSI_FILE=""
BACKEND_LEMONADE_WINDOWS_EXECUTABLE=""
BACKEND_LEMONADE_API_PORT=""
BACKEND_LEMONADE_HEALTH_PATH=""
BACKEND_LEMONADE_LINUX_BACKEND=""
BACKEND_LEMONADE_WINDOWS_BACKEND=""

aihpc@ubuntu:~/orin/DreamServer/dream-server$ TIER=DEFINITELY_NOT_A_TIER bash -c '
  error() { echo "ERROR: $*" >&2; return 1; }
  source lib/safe-env.sh
  source installers/lib/tier-map.sh
  resolve_tier_config || echo "(correctly rejected — see ERROR line above)"
'
ERROR: Invalid tier: DEFINITELY_NOT_A_TIER. Valid tiers: 0, 1, 2, 3, 4, CLOUD, NV_ULTRA, SH_LARGE, SH_COMPACT, ARC, ARC_LITE, JETSON_ORIN_NANO
(correctly rejected — see ERROR line above)

Reviewer notes

Tier-map suite goes from 122 → 135 PASS on real aarch64 hardware — the 13 new assertions covering both qwen and gemma4 paths and the extended GGUF_URL coverage loop all hold.
Both profiles resolve the right model. Qwen path: qwen3.5-2b Q4_K_M (~1.5 GB) at 8K context. Gemma4 path: gemma-4-e2b-it Q4_K_M (~2.81 GB) at 8K context. Both set N_GPU_LAYERS=99 for full offload to the Tegra iGPU since unified memory makes partial offload pointless.
Backend contract loads correctly. load-backend-contract.sh --backend jetson --env emits BACKEND_CONTRACT_ID=jetson, BACKEND_PUBLIC_API_PORT=8080, BACKEND_PROVIDER_URL=http://llama-server:8080/v1 — same llama-server contract as nvidia.json, which is intentional; the runtime delta lives in the follow-up docker-compose.jetson.yml.
Invalid tier still fails cleanly. The error message now lists JETSON_ORIN_NANO as a valid choice, confirming both the new tier in set_qwen_tier_config() / set_gemma4_tier_config() and the updated validation error string at tier-map.sh:194 / tier-map.sh:307.
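The backend contract check in these notes can be made repeatable with a small filter over the `--env` output. A minimal sketch, assuming the `KEY="value"` line format shown in the log above; `check_jetson_contract_env` is a hypothetical helper, not something in the repo, and it only checks the three fields called out in the notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads KEY="value" lines on stdin and verifies the
# three contract fields named in the reviewer notes are present verbatim.
check_jetson_contract_env() {
  local env_dump expected
  env_dump="$(cat)"
  for expected in \
    'BACKEND_CONTRACT_ID="jetson"' \
    'BACKEND_PUBLIC_API_PORT="8080"' \
    'BACKEND_PROVIDER_URL="http://llama-server:8080/v1"'; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    if ! printf '%s\n' "$env_dump" | grep -qxF -- "$expected"; then
      echo "missing: $expected" >&2
      return 1
    fi
  done
  echo "contract OK"
}

# Intended use from the dream-server directory:
# bash scripts/load-backend-contract.sh --backend jetson --env | check_jetson_contract_env
```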

Lightheartdevs · 2026-05-27T18:50:18Z

Nice narrow follow-up. The tier and backend contract choices look coherent for the 8 GB unified-memory Orin Nano target, and keeping compose/runtime out of this PR helps the review.

Two process notes:

This stacks on #1479 (feat(detection): add NVIDIA Jetson (Tegra) hardware detection) but targets main, so the GitHub diff includes the detection changes too. Easiest path may be to merge/review #1479 first, then review this PR against updated main.
GitHub currently reports no checks for feat/jetson-tier-map, so I would want tests/test-tier-map.sh and the backend contract load path run in a maintainer/Linux CI environment before merge.

No functional blocker from my pass on this PR by itself.
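The second process note could be covered by a small wrapper that a maintainer or CI job runs from the dream-server directory. A minimal sketch, assuming the three commands already shown on this PR; `run_checks` is a hypothetical helper, not an existing script in the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each command string, report PASS/FAIL per command,
# and return non-zero if any of them failed.
run_checks() {
  local failed=0 cmd
  for cmd in "$@"; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "PASS: $cmd"
    else
      echo "FAIL: $cmd"
      failed=1
    fi
  done
  return "$failed"
}

# Intended use (commands taken from this PR):
# run_checks \
#   "bash tests/test-tier-map.sh" \
#   "bash tests/test-jetson-detection.sh" \
#   "bash scripts/load-backend-contract.sh --backend jetson --env"
```

Running every command before reporting, rather than stopping at the first failure, gives the reviewer the full picture in one pass.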

matedev01 added 2 commits May 27, 2026 02:52

feat: detect aarch64 orin nano

3321bd6

matedev01 mentioned this pull request May 27, 2026

feat(compose): add docker-compose.jetson.yml + resolver branch #1482

Draft

matedev01 wants to merge 2 commits into Light-Heart-Labs:main from matedev01:feat/jetson-tier-map
