A collection of demos showcasing key TensorRT-RTX features through model pipelines.
Clone and install
We recommend using a Python version between 3.9 and 3.12 inclusive, as these are the versions supported by the required dependencies.
```bash
git clone https://github.com/NVIDIA/TensorRT-RTX.git
cd TensorRT-RTX

# Install TensorRT-RTX
python -m pip install tensorrt-rtx

# Install demo dependencies (example: Flux 1.dev)
python -m pip install -r demo/flux1.dev/requirements.txt
```
Run demo
```bash
# Standalone Python script
python demo/flux1.dev/flux_demo.py -h

# Interactive Jupyter notebook
jupyter notebook demo/flux1.dev/flux_demo.ipynb
```
The standalone script provides extensive configuration options for various use cases. For detailed walkthroughs, interactive exploration, and comprehensive documentation, see the Flux.1 [dev] Demo Notebook which offers in-depth coverage of TensorRT-RTX features.
GPU Compatibility: This demo is verified on Ada and Blackwell GPUs. See Transformer Precision Options for more compatibility details.
To download model checkpoints for the FLUX.1 [dev] pipeline, obtain a read access token to the model repository on HuggingFace Hub. See instructions.
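One way to keep the token out of your shell history is to export it once and read it at invocation time. A minimal sketch, assuming the token is stored in an `HF_TOKEN` environment variable (the variable name and the `demo_command` helper are illustrative, not part of the demo):

```python
import os
from typing import List, Optional


def demo_command(prompt: str, token: Optional[str] = None) -> List[str]:
    """Build the demo CLI invocation, falling back to the HF_TOKEN env var."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise ValueError("Set HF_TOKEN or pass a token explicitly.")
    return [
        "python", "demo/flux1.dev/flux_demo.py",
        "--hf-token", token,
        "--prompt", prompt,
    ]
```

Run it with, for example, `subprocess.run(demo_command("A serene forest scene"), check=True)`.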
```bash
--hf-token YOUR_HF_TOKEN      # Hugging Face token with read access to the Flux.1 [dev] model
--prompt "Your text prompt"   # Text prompt for generation
--height 512                  # Image height (default: 512)
--width 512                   # Image width (default: 512)
--batch-size 1                # Batch size (default: 1)
--seed 0                      # Random seed (default: 0)
--num-inference-steps 50      # Denoising steps (default: 50)
--guidance-scale 3.5          # Guidance scale (default: 3.5)
--precision {bf16,fp8,fp4}    # Transformer precision (default: fp8)
--dynamic-shape               # Enable dynamic shape engines
--enable-runtime-cache        # Enable runtime caching
--low-vram                    # Enable low VRAM mode
--verbose                     # Enable verbose logging
--cache-dir ./demo_cache      # Cache directory (default: ./demo_cache)
--cache-mode {full,lean}      # Cache mode (default: full)
```

Default Parameters Image Generation:

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN
```

Large Image Generation (1024x1024):

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --height 1024 --width 1024 --prompt "A detailed cityscape at golden hour"
```

Faster JIT Compilation Times with Runtime Caching:

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --enable-runtime-cache --prompt "A cat meanders down a dimly lit alleyway in a large city."
```

Dynamic-Shape Engines with Shape-Specialized Kernels:

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --dynamic-shape --prompt "A dramatic cityscape from a dazzling angle"
```

Low VRAM + FP4 Quantized (for Blackwell GPUs with memory constraints):

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --low-vram --precision fp4 --prompt "A serene forest scene"
```

Tip: The Jupyter notebook provides interactive parameter exploration, detailed explanations of each feature, and additional use cases.
- Smart Caching: Shared models across pipelines with intelligent cleanup
- Cross-Platform: Works on Windows and Linux
- Flexible Precision: Configure transformer model precision (bf16, fp8, fp4)
- Memory Management: Low-VRAM mode for memory-constrained GPUs
- Dynamic Shapes: Support for flexible input dimensions with runtime optimization
Choose based on your GPU architecture and VRAM requirements:
| Precision | Supported GPU Architecture | Approx. Max VRAM (Default) | Approx. Max VRAM (`--low-vram`) |
|---|---|---|---|
| BF16 | Ampere, Ada, Blackwell | 32.1 GB | 23.1 GB |
| FP8 | Ada, Blackwell | 21.6 GB | 12.0 GB |
| FP4 | Blackwell | 20.5 GB | 11.0 GB |
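The support rules in the table above can be captured in a small lookup. A minimal sketch (the `pick_precision` helper and its architecture keys are illustrative, not part of the demo API):

```python
# Transformer precisions supported per GPU architecture, per the table above.
SUPPORTED_PRECISIONS = {
    "ampere": ("bf16",),
    "ada": ("bf16", "fp8"),
    "blackwell": ("bf16", "fp8", "fp4"),
}


def pick_precision(arch: str, prefer: str = "fp8") -> str:
    """Return the preferred precision if the architecture supports it, else bf16."""
    options = SUPPORTED_PRECISIONS[arch.lower()]
    return prefer if prefer in options else "bf16"
```

For example, requesting `fp4` on an Ada GPU falls back to `bf16`, since `fp4` requires Blackwell.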
```python
# Configure precision when loading engines
pipeline.load_engines(transformer_precision="fp8")  # Default: fp8
```

```python
# Static shapes (default)
pipeline.load_engines(opt_height=512, opt_width=512, shape_mode="static")

# Dynamic shapes (flexible resolutions without recompilation)
pipeline.load_engines(opt_height=512, opt_width=512, shape_mode="dynamic")
```

```python
# Default (fastest, more VRAM usage)
pipeline = Pipeline(..., low_vram=False)

# Low VRAM mode (slower, less VRAM usage)
pipeline = Pipeline(..., low_vram=True)
```

- `full` (default): Keep all cached models
- `lean`: Auto-cleanup unused models to save disk space
Models and engines are stored in a shared cache, keyed by `model_id` and `precision`:

```
demo_cache/
├── shared/
│   ├── onnx/{model_id}/{precision}/     # ONNX models
│   └── engines/{model_id}/{precision}/  # TensorRT engines
├── runtime.cache                        # JIT compilation cache
└── .cache_state.json                    # Usage tracking
```
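The layout above can be mirrored in code when inspecting or pruning the cache by hand. A minimal sketch (the `cache_paths` helper and the `"transformer"` model id are illustrative, not part of the demo):

```python
from pathlib import Path


def cache_paths(model_id: str, precision: str, cache_dir: str = "./demo_cache") -> dict:
    """Return where the ONNX model and TensorRT engine for a
    model_id/precision pair live, following the shared cache layout."""
    shared = Path(cache_dir) / "shared"
    return {
        "onnx": shared / "onnx" / model_id / precision,
        "engines": shared / "engines" / model_id / precision,
    }
```

For example, `cache_paths("transformer", "fp8")["engines"]` points at `demo_cache/shared/engines/transformer/fp8`.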
Image Quality Issues

- Ensure the dimensions are multiples of 16
- Try altering the `seed` and `guidance_scale` parameters
- See Flux.1 [dev] Demo Notebook for more tips and examples

GPU Out of Memory

- Use `low_vram=True` to reduce VRAM usage
- Use `enable_runtime_cache=False` or omit the `--enable-runtime-cache` flag
- Try a lower precision: `fp8` (Ada/Blackwell) or `fp4` (Blackwell only)
- Reduce batch size or image resolution

Disk Space Issues

- Use `cache_mode="lean"` to reduce disk usage by automatically cleaning up unused models
- Manually delete the demo cache directory
Build Errors
- Verify TensorRT-RTX and dependencies are installed (see Quick Start)
- Ensure the precision being used is supported by the GPU architecture (see Support Matrix)
To configure the test environment and run demo tests, refer to the test README.