
GR00T Deployment & Inference Guide

Run inference with PyTorch or TensorRT acceleration for the GR00T N1.7 policy.


Prerequisites

  • Model checkpoint: nvidia/GR00T-N1.7-3B
  • Dataset in LeRobot format (e.g., demo_data/libero_demo)
  • CUDA-enabled GPU
  • Set up the uv environment following README.md

| Platform | Installation |
| --- | --- |
| dGPU (H100, A100, RTX 4090/5090, L20, RTX Pro 5000/6000, etc.) | uv sync — GPU deps (flash-attn, onnx, tensorrt) included |
| Jetson Thor | Jetson Thor Setup (Docker or bare metal) |
| DGX Spark | DGX Spark Setup (Docker or bare metal) |
| Jetson Orin | Jetson Orin Setup (Docker or bare metal) |
  • dGPU local environment: run the installation command below, then use the PyTorch or TensorRT commands in this guide
  • Thor Docker or bare metal: skip to Jetson Thor Setup
  • Spark Docker or bare metal: skip to DGX Spark Setup
  • Orin Docker or bare metal: skip to Jetson Orin Setup

dGPU Installation

uv sync

GPU dependencies (flash-attn, onnx, tensorrt) are included in the default install.

Download Model and Dataset

Download the finetuned model to a local directory (HuggingFace does not support nested repo paths directly):

uv run hf download nvidia/GR00T-N1.7-LIBERO \
  --include "libero_10/config.json" "libero_10/embodiment_id.json" \
  "libero_10/model-*.safetensors" "libero_10/model.safetensors.index.json" \
  "libero_10/processor_config.json" "libero_10/statistics.json" \
  --local-dir checkpoints/GR00T-N1.7-LIBERO

For demo dataset setup, see the Data Format section in the main README.


Quick Start: PyTorch Inference

Run inference on demo trajectories using PyTorch (no TRT setup needed):

uv run python scripts/deployment/standalone_inference_script.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA \
  --traj-ids 0 1 2 3 4 \
  --inference-mode pytorch \
  --action-horizon 8

TensorRT Acceleration

The trt_full_pipeline mode (passed via --inference-mode trt_full_pipeline in standalone_inference_script.py) accelerates all model components with TRT engines. Speedup varies by platform — see benchmark tables below for measured results on each device. The same pipeline is referred to as n17_full_pipeline inside the engine-loading and build scripts (trt_model_forward.py, build_trt_pipeline.py); the two names describe the same set of engines.

| Component | Engine | Notes |
| --- | --- | --- |
| ViT | TRT | Qwen3-VL Vision (24 blocks, FP32 for accuracy) |
| LLM | TRT | Qwen3-VL Text Model (16 layers, with deepstack injection) |
| VL Self-Attention | TRT | SelfAttentionTransformer (4 layers, if present) |
| State Encoder | TRT | CategorySpecificMLP |
| Action Encoder | TRT | MultiEmbodimentActionEncoder |
| DiT | TRT | AlternateVLDiT (32 layers) |
| Action Decoder | TRT | CategorySpecificMLP |

Lightweight ops remain in PyTorch: embed_tokens, masked_scatter, get_rope_index, VLLN.
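
For reference when wiring TRT into your own code, the load/bind/execute pattern that the repo's trt_torch.py wrapper implements looks roughly like the sketch below, using the TensorRT 10 Python API. The engine path and buffer handling here are illustrative assumptions, not the repo's exact file layout:

```python
import tensorrt as trt
import torch

# Hypothetical engine path; the build step writes engines under <output-dir>/engines/.
ENGINE_PATH = "gr00t_trt_deployment/engines/dit.engine"

TRT_TO_TORCH = {
    trt.DataType.FLOAT: torch.float32,
    trt.DataType.HALF: torch.float16,
    trt.DataType.BF16: torch.bfloat16,
    trt.DataType.INT32: torch.int32,
    trt.DataType.INT64: torch.int64,
}

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one GPU buffer per I/O tensor and register its address with TRT.
stream = torch.cuda.Stream()
buffers = {}
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = tuple(engine.get_tensor_shape(name))  # static shapes in these engines
    dtype = TRT_TO_TORCH[engine.get_tensor_dtype(name)]
    buffers[name] = torch.empty(shape, dtype=dtype, device="cuda")
    context.set_tensor_address(name, buffers[name].data_ptr())

# Copy real inputs into the input buffers (omitted), then launch asynchronously.
context.execute_async_v3(stream_handle=stream.cuda_stream)
stream.synchronize()
```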

DiT-only mode (legacy from N1.6)

The dit_only export mode (--export-mode dit_only) optimizes only the action head DiT, leaving the backbone in PyTorch. This was the default in N1.6. For N1.7, full_pipeline is recommended because it also accelerates the backbone (ViT + LLM), which dominates inference time.

Build TRT Engines

The unified build_trt_pipeline.py script runs all steps (export ONNX → build engines → verify accuracy → benchmark) in a single command:

uv run python scripts/deployment/build_trt_pipeline.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA

Finetuned models: Replace --model-path with your checkpoint path. The pipeline is identical for base and finetuned models.

Note: Engine build takes ~2-5 minutes depending on GPU. Engines are GPU-architecture-specific and must be rebuilt for different GPUs.
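
To keep engines from different machines apart, a quick check of the device's compute capability helps (a minimal sketch, not a repo utility; for example, H100 reports sm_90 and DGX Spark sm_121):

```python
import torch

# Engines only run on the SM architecture they were built for.
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")
```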

Batch size: The --batch-size value is baked as a static dimension into the ONNX and TRT models. Engines built with one batch size cannot be used with a different batch size at runtime. If you need a different batch size, re-run the full pipeline (--steps export,build,verify) with the new --batch-size value.

You can also run a subset of steps:

# Export + build only (skip verify and benchmark)
uv run python scripts/deployment/build_trt_pipeline.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA \
  --steps export,build

What each step does

The pipeline runs 4 steps in sequence:

  1. Export to ONNX (export) — Exports all model components (LLM, VL Self-Attention, State Encoder, Action Encoder, DiT, Action Decoder) to ONNX format under <output-dir>/onnx/.
  2. Build TensorRT Engines (build) — Compiles each ONNX model into a GPU-specific TensorRT engine under <output-dir>/engines/.
  3. Verify Accuracy (verify) — Runs PyTorch vs TRT output comparison. Expected: Cosine Similarity: 0.999+ (PASS).
  4. Benchmark (benchmark) — Measures E2E latency for PyTorch Eager, torch.compile, and TRT modes.

Each step can be run individually via --steps <step>. Verbose logs are written to <output-dir>/pipeline.log.


Performance

Benchmark Results

GR00T N1.7 Inference Timing (4 denoising steps, 1 camera):

| Device | Mode | Data Processing | Backbone | Action Head | E2E | Frequency | E2E Speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| dGPU |  |  |  |  |  |  |  |
| H100 80GB HBM3 | PyTorch Eager | 6.2 ms | 31.3 ms | 48.2 ms | 85.8 ms | 11.7 Hz | 1.00x |
|  | torch.compile | 6.2 ms | 30.4 ms | 12.0 ms | 48.6 ms | 20.6 Hz | 1.77x |
|  | TensorRT (Full Pipeline) | 6.2 ms | 8.8 ms | 12.3 ms | 27.9 ms | 35.9 Hz | 3.08x |
| H20 96GB HBM3 | PyTorch Eager | 5.33 ms | 30.8 ms | 47.3 ms | 83.4 ms | 12.0 Hz | 1.00x |
|  | torch.compile | 5.33 ms | 31.1 ms | 13.3 ms | 49.7 ms | 20.1 Hz | 1.68x |
|  | TensorRT (Full Pipeline) | 5.33 ms | 14.2 ms | 14.5 ms | 34.0 ms | 29.4 Hz | 2.45x |
| RTX Pro 6000 Blackwell | PyTorch Eager | 4.8 ms | 29.3 ms | 44.0 ms | 78.4 ms | 12.8 Hz | 1.00x |
|  | torch.compile | 4.8 ms | 29.4 ms | 16.5 ms | 50.7 ms | 19.7 Hz | 1.55x |
|  | TensorRT (Full Pipeline) | 4.8 ms | 9.9 ms | 13.2 ms | 27.9 ms | 35.9 Hz | 2.81x |
| RTX Pro 5000 72GB | PyTorch Eager | 8.85 ms | 54.01 ms | 63.19 ms | 126.4 ms | 7.9 Hz | 1.00x |
|  | torch.compile | 8.85 ms | 55.74 ms | 20.38 ms | 84.9 ms | 11.8 Hz | 1.49x |
|  | TensorRT (Full Pipeline) | 8.85 ms | 14.37 ms | 17.33 ms | 40.5 ms | 24.7 Hz | 3.13x |
| L40 | PyTorch Eager | 6.6 ms | 42.8 ms | 78.9 ms | 128.3 ms | 7.8 Hz | 1.00x |
|  | torch.compile | 6.6 ms | 42.7 ms | 19.8 ms | 69.0 ms | 14.5 Hz | 1.86x |
|  | TensorRT (Full Pipeline) | 6.6 ms | 13.1 ms | 18.8 ms | 38.4 ms | 26.0 Hz | 3.34x |
| L20 | PyTorch Eager | 5.7 ms | 47.58 ms | 86.92 ms | 140.3 ms | 7.1 Hz | 1.00x |
|  | torch.compile | 5.7 ms | 47.2 ms | 20.18 ms | 73.1 ms | 13.7 Hz | 1.92x |
|  | TensorRT (Full Pipeline) | 5.7 ms | 17.27 ms | 19.79 ms | 42.8 ms | 23.3 Hz | 3.28x |
| Jetson / Spark |  |  |  |  |  |  |  |
| DGX Spark | PyTorch Eager | 13.14 ms | 38.22 ms | 74.94 ms | 126.4 ms | 7.9 Hz | 1.00x |
|  | torch.compile | 13.14 ms | 39.23 ms | 56.49 ms | 108.8 ms | 9.2 Hz | 1.16x |
|  | TensorRT (Full Pipeline) | 13.14 ms | 33.43 ms | 52.37 ms | 98.6 ms | 10.1 Hz | 1.28x |
| AGX Thor | PyTorch Eager | 8.21 ms | 55.26 ms | 81.65 ms | 144.9 ms | 6.9 Hz | 1.00x |
|  | torch.compile | 8.21 ms | 55.59 ms | 64.66 ms | 128.4 ms | 7.8 Hz | 1.13x |
|  | TensorRT (Full Pipeline) | 8.21 ms | 28.89 ms | 56.64 ms | 93.8 ms | 10.7 Hz | 1.54x |
| Orin | PyTorch Eager | 9.45 ms | 127.6 ms | 205.39 ms | 342.8 ms | 2.9 Hz | 1.00x |
|  | torch.compile | 9.45 ms | 128.59 ms | 78.94 ms | 217.0 ms | 4.6 Hz | 1.58x |
|  | TensorRT (DiT-only) | 9.45 ms | 128.38 ms | 78.6 ms | 216.5 ms | 4.6 Hz | 1.58x |

Note: Orin uses DiT-only TensorRT (--inference-mode tensorrt) because TRT 10.3 does not support the backbone engine. All other platforms use the full pipeline (--inference-mode trt_full_pipeline).

Raw benchmark output (H100 80GB HBM3)
Hardware: NVIDIA H100 80GB HBM3
Model: checkpoints/GR00T-N1.7-LIBERO/libero_10
1 camera, Denoising Steps: 4

PyTorch Eager:
  E2E:             85.8 ms (11.7 Hz)
  Data Processing: 6.2 ms | Backbone: 31.3 ms | Action Head: 48.2 ms

torch.compile:
  E2E:             48.6 ms (20.6 Hz), 1.77x speedup
  Data Processing: 6.2 ms | Backbone: 30.4 ms | Action Head: 12.0 ms

TensorRT (Full Pipeline):
  E2E:             27.9 ms (35.9 Hz), 3.08x speedup
  Data Processing: 6.2 ms | Backbone: 8.8 ms  | Action Head: 12.3 ms

Standalone Inference with TRT

The standalone inference script serves both as an accuracy check and as a reference for deploying TRT inference in your own code. It runs per-step inference on real trajectories and compares action predictions:

uv run python scripts/deployment/standalone_inference_script.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA \
  --traj-ids 0 1 2 3 4 \
  --inference-mode trt_full_pipeline \
  --trt-engine-path ./gr00t_trt_deployment/engines \
  --save-plot-path ./output/trt_inference.png

Expected accuracy: MSE/MAE match PyTorch within noise. TRT produces identical action quality. Speedup varies by platform — run build_trt_pipeline.py --steps benchmark on your hardware for exact numbers.
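
If you dump the predicted actions from a PyTorch run and a TRT run yourself (the .npy file names below are hypothetical, saved from your own wrapper), the reported comparison reduces to a few lines:

```python
import numpy as np

# Hypothetical dumps of predicted actions over the same trajectory,
# e.g. arrays of shape (num_steps, action_horizon, action_dim).
pt_actions = np.load("actions_pytorch.npy")
trt_actions = np.load("actions_trt.npy")

print(f"MSE={np.mean((pt_actions - trt_actions) ** 2):.3e}")
print(f"MAE={np.mean(np.abs(pt_actions - trt_actions)):.3e}")
```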

Optional: LIBERO Closed-Loop Sim Evaluation

To validate TRT accuracy in end-to-end robotic tasks, run the LIBERO closed-loop evaluation. This requires a separate environment setup (~10-30 min, MuJoCo simulator + dependencies).

Setup, commands, and results (H100, 20 episodes)

Task: KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it, 20 episodes:

| Mode | Success Rate |
| --- | --- |
| PyTorch | 100% (20/20) |
| TRT (n17_full_pipeline) | 95% (19/20) |

Difference is within simulation noise (p >> 0.05).
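
As a quick check on that claim (a sketch, not part of the repo), a two-sided Fisher's exact test on 20/20 vs 19/20 gives a p-value of 1.0:

```python
from scipy.stats import fisher_exact

# Success/failure counts over 20 episodes each: PyTorch 20/0, TRT 19/1.
_, p_value = fisher_exact([[20, 0], [19, 1]])
print(p_value)  # 1.0 -> no detectable difference at this sample size
```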

Note: Use --n-envs 1 for TRT evaluation (ViT engine has static shapes for single-observation inference).

# One-time LIBERO setup (~10 min)
bash gr00t/eval/sim/LIBERO/setup_libero.sh

# Activate LIBERO venv and install additional deps
source gr00t/eval/sim/LIBERO/libero_uv/.venv/bin/activate
uv pip install diffusers transformers accelerate safetensors torchcodec

# TRT full pipeline evaluation
python gr00t/eval/rollout_policy.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --env-name "libero_sim/KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it" \
  --n-episodes 20 --n-envs 1 --max-episode-steps 504 \
  --trt-engine-path ./gr00t_trt_deployment/engines \
  --trt-mode n17_full_pipeline

Run python scripts/deployment/build_trt_pipeline.py --steps benchmark to generate benchmarks for your hardware.


Platform-Specific Setup

Jetson and Spark platforms use different dependency stacks than dGPU. Thor and Spark use CUDA 13 with PyTorch 2.10.0 from the Jetson AI Lab cu130 index. Orin uses CUDA 12.6 with PyTorch 2.10.0 from the Jetson AI Lab cu126 index.
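
A quick way to confirm which stack is active in a given environment (a minimal check, not a repo script):

```python
import torch

# PyTorch version and the CUDA version the wheel was built against.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_name(0))
```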

Jetson Thor Setup

Thor uses CUDA 13 and Python 3.12, which require a different dependency stack than x86 or Orin. Tested with JetPack 7.1. There are two ways to run on Thor: Docker (recommended) or bare metal.

Docker (Recommended)

Build the Thor container from the repo root:

cd docker && bash build.sh --profile=thor && cd ..

Download the finetuned model (run once, on the host):

uv run hf download nvidia/GR00T-N1.7-LIBERO --include "libero_10/config.json" "libero_10/embodiment_id.json" "libero_10/model-*.safetensors" "libero_10/model.safetensors.index.json" "libero_10/processor_config.json" "libero_10/statistics.json" --local-dir checkpoints/GR00T-N1.7-LIBERO

Start an interactive Docker session (recommended for multi-step TRT work):

docker run -it --rm --runtime nvidia --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --network host \
  -v "$(pwd)":/workspace/repo \
  -v "${HF_HOME:-${HOME}/.cache/huggingface}":/root/.cache/huggingface \
  -w /workspace/repo \
  -e HF_TOKEN="${HF_TOKEN:-}" \
  gr00t-thor \
  bash

Then inside the container, run the full TRT pipeline (export, build, verify, benchmark):

python scripts/deployment/build_trt_pipeline.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA

Bare Metal

# One-time install (temporarily copies the Thor pyproject.toml and uv.lock to repo root,
# installs NVPL libs, uv, Python deps, and builds torchcodec from source against the
# system FFmpeg runtime)
bash scripts/deployment/thor/install_deps.sh

# In each new shell
source .venv/bin/activate
source scripts/activate_thor.sh

Then run the TRT pipeline or PyTorch inference as shown in the TensorRT Acceleration and Quick Start sections above. The activation script exports the PyTorch and CUDA library/include paths that torchcodec and torch.compile need on Thor.


DGX Spark Setup

Spark uses CUDA 13 and Python 3.12 like Thor, but requires a dedicated dependency stack and source-built flash-attn for sm121. There are two ways to run on Spark: Docker (recommended) or bare metal.

Docker (Recommended)

Build the Spark container from the repo root:

cd docker && bash build.sh --profile=spark && cd ..

Download the finetuned model (run once, on the host):

uv run hf download nvidia/GR00T-N1.7-LIBERO --include "libero_10/config.json" "libero_10/embodiment_id.json" "libero_10/model-*.safetensors" "libero_10/model.safetensors.index.json" "libero_10/processor_config.json" "libero_10/statistics.json" --local-dir checkpoints/GR00T-N1.7-LIBERO

Start an interactive Docker session (recommended for multi-step TRT work):

docker run -it --rm --runtime nvidia --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --network host \
  -v "$(pwd)":/workspace/repo \
  -v "${HF_HOME:-${HOME}/.cache/huggingface}":/root/.cache/huggingface \
  -w /workspace/repo \
  -e HF_TOKEN="${HF_TOKEN:-}" \
  gr00t-spark \
  bash

Then inside the container, run the full TRT pipeline (export, build, verify, benchmark):

python scripts/deployment/build_trt_pipeline.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA

Bare Metal

# One-time install (temporarily copies the Spark pyproject.toml and uv.lock to repo root,
# installs NVPL libs, uv, Python deps, source-builds flash-attn for sm121, and builds
# torchcodec from source against the system FFmpeg runtime)
bash scripts/deployment/spark/install_deps.sh

# In each new shell
source .venv/bin/activate
source scripts/activate_spark.sh

Then run the TRT pipeline or PyTorch inference as shown in the TensorRT Acceleration and Quick Start sections above. If you later rerun uv sync, rerun bash scripts/deployment/spark/install_deps.sh so the Spark-specific flash-attn build is restored and revalidated.


Jetson Orin Setup

Note: On Orin, only the DiT (action head) TRT export is currently supported. Use --export-mode dit_only instead of full_pipeline. Full pipeline support is in progress.

Orin uses CUDA 12.6 and Python 3.10, which require a different dependency stack than x86 or Thor. Tested with JetPack 6.2. There are two ways to run on Orin: Docker (recommended) or bare metal.

Docker (Recommended)

Build the Orin container from the repo root:

cd docker && bash build.sh --profile=orin && cd ..

Download the finetuned model (run once, on the host):

uv run hf download nvidia/GR00T-N1.7-LIBERO --include "libero_10/config.json" "libero_10/embodiment_id.json" "libero_10/model-*.safetensors" "libero_10/model.safetensors.index.json" "libero_10/processor_config.json" "libero_10/statistics.json" --local-dir checkpoints/GR00T-N1.7-LIBERO

Start an interactive Docker session (recommended for multi-step TRT work):

docker run -it --rm --runtime nvidia --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --network host \
  -v "$(pwd)":/workspace/repo \
  -v "${HF_HOME:-${HOME}/.cache/huggingface}":/root/.cache/huggingface \
  -w /workspace/repo \
  -e HF_TOKEN="${HF_TOKEN:-}" \
  gr00t-orin \
  bash

Then inside the container, run the TRT pipeline (DiT-only on Orin):

python scripts/deployment/build_trt_pipeline.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA \
  --export-mode dit_only

Bare Metal

# One-time install (temporarily copies the Orin pyproject.toml and uv.lock to repo root,
# installs uv, Python deps, and builds torchcodec from source against JetPack's FFmpeg
# runtime)
bash scripts/deployment/orin/install_deps.sh

# In each new shell
source .venv/bin/activate
source scripts/activate_orin.sh

Then run the TRT pipeline (with --export-mode dit_only) or PyTorch inference as shown in the TensorRT Acceleration and Quick Start sections above. The activation script exports the PyTorch and CUDA library/include paths that torchcodec and torch.compile need on Orin.

Orin storage tip: If your eMMC root is low on space, redirect the HuggingFace cache to an NVMe SSD with export HF_HOME=/path/to/ssd/.cache/huggingface before downloading models.

Orin TRT limitations: TRT 10.3 on Orin does not support the backbone (LLM) engine — the build step will report a failure for llm_bf16.engine and that is expected. The remaining 6 engines build successfully. Use --export-mode action_head for verification and --inference-mode tensorrt (DiT-only TRT, backbone runs in PyTorch) for inference:

python scripts/deployment/build_trt_pipeline.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --export-mode action_head \
  --steps verify

python scripts/deployment/standalone_inference_script.py \
  --model-path checkpoints/GR00T-N1.7-LIBERO/libero_10 \
  --dataset-path demo_data/libero_demo \
  --embodiment-tag LIBERO_PANDA \
  --traj-ids 0 \
  --inference-mode tensorrt \
  --trt-engine-path ./gr00t_n1d7_engines

Command-Line Arguments

build_trt_pipeline.py

| Argument | Default | Description |
| --- | --- | --- |
| --model-path | (required) | Path to model checkpoint |
| --dataset-path | demo_data/libero_demo | Path to dataset (LeRobot format) |
| --embodiment-tag | Auto-detected | Embodiment tag (auto-detected from processor_config.json if single embodiment) |
| --output-dir | ./gr00t_trt_deployment | Root output directory. ONNX → <output-dir>/onnx/, engines → <output-dir>/engines/ |
| --precision | bf16 | Precision for ONNX export and TRT engine build (bf16, fp16, fp32) |
| --batch-size | 1 | Batch size baked into exported ONNX/TRT models (static; see the batch size note in Build TRT Engines) |
| --export-mode | full_pipeline | Export mode: dit_only, action_head, or full_pipeline |
| --video-backend | torchcodec | Video backend for dataset loading |
| --workspace | 8192 | TRT builder workspace size in MB |
| --num-iterations | 20 | Number of benchmark iterations |
| --warmup | 5 | Number of warmup iterations |
| --skip-compile | false | Skip the torch.compile benchmark |
| --steps | all | Steps to run: all or a comma-separated subset of export,build,verify,benchmark |
| --log-file | <output-dir>/pipeline.log | Log file path |

standalone_inference_script.py

| Argument | Default | Description |
| --- | --- | --- |
| --model-path | (required) | Path to model checkpoint |
| --dataset-path | demo_data/droid_sample | Path to dataset (LeRobot format) |
| --embodiment-tag | Auto-detected | Robot embodiment tag |
| --traj-ids | [0] | Episode indices to evaluate (space-separated) |
| --steps | 200 | Max steps per trajectory (capped by actual length) |
| --action-horizon | 16 | Action prediction horizon |
| --inference-mode | pytorch | pytorch, tensorrt (DiT-only TRT), or trt_full_pipeline (all engines) |
| --trt-engine-path | ./gr00t_n1d7_engines | Directory containing pre-built TRT engines |
| --denoising-steps | 4 | Diffusion denoising iterations |
| --save-plot-path | None | Save per-trajectory GT-vs-predicted comparison plots |
| --video-backend | torchcodec | Video decoder: torchcodec, decord, or torchvision_av |
| --skip-timing-steps | 1 | Initial steps excluded from timing stats (warmup) |
| --host / --port | 127.0.0.1 / 5555 | Server address (when using client mode without --model-path) |
| --seed | 42 | Random seed for reproducibility |

Files

| File | Description |
| --- | --- |
| build_trt_pipeline.py | Unified pipeline: export ONNX, build engines, verify, benchmark |
| standalone_inference_script.py | Main inference script (PyTorch, DiT-only TRT, and full-pipeline TRT) |
| trt_torch.py | TRT Engine wrapper class (load, bind, execute) |
| trt_model_forward.py | TRT forward functions and setup (backbone + action head) |

Troubleshooting

Engine Build Fails

  • Ensure you have enough GPU memory (16GB+ recommended for the full pipeline); a memory check sketch follows this list
  • Try reducing workspace size: --workspace 4096
  • Ensure TensorRT version matches your CUDA version
  • LLM engine requires batch_size dimension handling when using custom shape profiles
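
A minimal way to check free GPU memory before a build (a sketch, not a repo utility):

```python
import torch

# Free vs. total device memory in GiB; the full-pipeline build wants 16 GB+ free.
free, total = torch.cuda.mem_get_info()
print(f"free={free / 2**30:.1f} GiB / total={total / 2**30:.1f} GiB")
```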

ONNX Export Issues

  • If export fails with a COMPLEX128 error: ensure _simple_causal_mask is used (not HuggingFace's create_causal_mask); a minimal sketch of such a mask follows this list
  • If masked_scatter size assertion fails: ensure visual_pos_masks has the correct number of True values matching deepstack tensor size
  • Check that the dataset path is valid and contains at least one trajectory
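
For reference, an additive causal mask of the kind the first bullet refers to can be built from plain float ops like this (an illustrative sketch, not necessarily the repo's _simple_causal_mask):

```python
import torch

def simple_causal_mask(seq_len: int, dtype=torch.bfloat16, device="cuda"):
    # 0 on and below the diagonal, large negative above it; no complex dtypes,
    # so ONNX export stays within types TensorRT can consume.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype, device=device)
    return torch.triu(mask, diagonal=1)
```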

Accuracy Issues

  • If cosine < 0.99: check that the LLM export does NOT include the final RMSNorm (the backbone returns pre-norm hidden_states[-1]); see the comparison sketch after this list
  • If output magnitude is ~12x too small: this is the norm bug — see above
  • Run build_trt_pipeline.py --steps verify --export-mode action_head first to isolate backbone vs action head drift
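
To localize drift, it can also help to dump the backbone features from a PyTorch run and a TRT run for the same input and compare them directly (file names below are hypothetical):

```python
import torch
import torch.nn.functional as F

# Hypothetical dumps of backbone output features for the same observation.
h_pt = torch.load("backbone_features_pytorch.pt").float().flatten()
h_trt = torch.load("backbone_features_trt.pt").float().flatten()

cos = F.cosine_similarity(h_pt, h_trt, dim=0).item()
ratio = (h_trt.norm() / h_pt.norm()).item()
print(f"cosine={cos:.4f}  norm ratio (TRT/PyTorch)={ratio:.2f}")
# cosine >= 0.999 is expected; a norm ratio far from 1.0 points at the
# extra final-RMSNorm issue described above.
```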