A collection of demos showcasing key TensorRT-RTX features through model pipelines.
Clone and install
We recommend using a Python version between 3.9 and 3.12 inclusive, as these are the versions supported by the required dependencies.
```bash
git clone https://github.com/NVIDIA/TensorRT-RTX.git
cd TensorRT-RTX

# Install TensorRT-RTX
python -m pip install tensorrt-rtx

# Install demo dependencies (example: Flux 1.dev)
python -m pip install -r demo/flux1.dev/requirements.txt
```
Run demo
```bash
# Standalone Python script
python demo/flux1.dev/flux_demo.py -h

# Interactive Jupyter notebook
jupyter notebook demo/flux1.dev/flux_demo.ipynb
```
The standalone script provides extensive configuration options for various use cases. For detailed walkthroughs, interactive exploration, and comprehensive documentation, see the Flux.1 [dev] Demo Notebook which offers in-depth coverage of TensorRT-RTX features.
GPU Compatibility: This demo is verified on Ada and Blackwell GPUs. See Transformer Precision Options for more compatibility details.
To download model checkpoints for the FLUX.1 [dev] pipeline, obtain a read access token to the model repository on HuggingFace Hub. See instructions.
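One way to keep the token out of your shell history is to export it once and read it at invocation time. A minimal sketch, assuming the token is stored in an `HF_TOKEN` environment variable (the variable name and the `demo_command` helper are illustrative, not part of the demo):

```python
import os
from typing import List, Optional


def demo_command(prompt: str, token: Optional[str] = None) -> List[str]:
    """Build the demo CLI invocation, falling back to the HF_TOKEN env var."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise ValueError("Set HF_TOKEN or pass a token explicitly.")
    return [
        "python", "demo/flux1.dev/flux_demo.py",
        "--hf-token", token,
        "--prompt", prompt,
    ]
```

Run it with, for example, `subprocess.run(demo_command("A serene forest scene"), check=True)`.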
```bash
--hf-token YOUR_HF_TOKEN      # Hugging Face token with read access to the Flux.1 [dev] model
--prompt "Your text prompt"   # Text prompt for generation
--height 512                  # Image height (default: 512)
--width 512                   # Image width (default: 512)
--batch-size 1                # Batch size (default: 1)
--seed 0                      # Random seed (default: 0)
--num-inference-steps 50      # Denoising steps (default: 50)
--guidance-scale 3.5          # Guidance scale (default: 3.5)
--precision {bf16,fp8,fp4}    # Transformer precision (default: fp8)
--dynamic-shape               # Enable dynamic shape engines
--enable-runtime-cache        # Enable runtime caching
--low-vram                    # Enable low VRAM mode
--verbose                     # Enable verbose logging
--cache-dir ./demo_cache      # Cache directory (default: ./demo_cache)
--cache-mode {full,lean}      # Cache mode (default: full)
```

Default Parameters Image Generation:

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN
```

Large Image Generation (1024x1024):

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --height 1024 --width 1024 --prompt "A detailed cityscape at golden hour"
```

Faster JIT Compilation Times with Runtime Caching:

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --enable-runtime-cache --prompt "A cat meanders down a dimly lit alleyway in a large city."
```

Dynamic-Shape Engines with Shape-Specialized Kernels:

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --dynamic-shape --prompt "A dramatic cityscape from a dazzling angle"
```

Low VRAM + FP4 Quantized (for Blackwell GPUs with memory constraints):

```bash
python demo/flux1.dev/flux_demo.py --hf-token YOUR_TOKEN --low-vram --precision fp4 --prompt "A serene forest scene"
```

Tip: The Jupyter notebook provides interactive parameter exploration, detailed explanations of each feature, and additional use cases.
- Smart Caching: Shared models across pipelines with intelligent cleanup
- Cross-Platform: Works on Windows and Linux
- Flexible Precision: Configure transformer model precision (bf16, fp8, fp4)
- Memory Management: Low-VRAM mode for memory-constrained GPUs
- Dynamic Shapes: Support for flexible input dimensions with runtime optimization
Choose based on your GPU architecture and VRAM requirements:
| Precision | Supported GPU Architecture | Approx. Max VRAM (Default) | Approx. Max VRAM (`--low-vram`) |
|---|---|---|---|
| BF16 | Ampere, Ada, Blackwell | 32.1 GB | 23.1 GB |
| FP8 | Ada, Blackwell | 21.6 GB | 12.0 GB |
| FP4 | Blackwell | 20.5 GB | 11.0 GB |
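The support rules in the table above can be captured in a small lookup. A minimal sketch (the `pick_precision` helper and its architecture keys are illustrative, not part of the demo API):

```python
# Transformer precisions supported per GPU architecture, per the table above.
SUPPORTED_PRECISIONS = {
    "ampere": ("bf16",),
    "ada": ("bf16", "fp8"),
    "blackwell": ("bf16", "fp8", "fp4"),
}


def pick_precision(arch: str, prefer: str = "fp8") -> str:
    """Return the preferred precision if the architecture supports it, else bf16."""
    options = SUPPORTED_PRECISIONS[arch.lower()]
    return prefer if prefer in options else "bf16"
```

For example, requesting `fp4` on an Ada GPU falls back to `bf16`, since `fp4` requires Blackwell.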
```python
# Configure precision when loading engines
pipeline.load_engines(transformer_precision="fp8")  # Default: fp8
```

```python
# Static shapes (default)
pipeline.load_engines(opt_height=512, opt_width=512, shape_mode="static")

# Dynamic shapes (flexible resolutions without recompilation)
pipeline.load_engines(opt_height=512, opt_width=512, shape_mode="dynamic")
```

```python
# Default (fastest, more VRAM usage)
pipeline = Pipeline(..., low_vram=False)

# Low VRAM mode (slower, less VRAM usage)
pipeline = Pipeline(..., low_vram=True)
```

- `full` (default): Keep all cached models
- `lean`: Auto-cleanup unused models to save disk space
Models and engines are stored in a shared cache, keyed by `model_id` and `precision`:

```
demo_cache/
├── shared/
│   ├── onnx/{model_id}/{precision}/     # ONNX models
│   └── engines/{model_id}/{precision}/  # TensorRT engines
├── runtime.cache                        # JIT compilation cache
└── .cache_state.json                    # Usage tracking
```
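The layout above can be mirrored in code when inspecting or pruning the cache by hand. A minimal sketch (the `cache_paths` helper and the `"transformer"` model id are illustrative, not part of the demo):

```python
from pathlib import Path


def cache_paths(model_id: str, precision: str, cache_dir: str = "./demo_cache") -> dict:
    """Return where the ONNX model and TensorRT engine for a
    model_id/precision pair live, following the shared cache layout."""
    shared = Path(cache_dir) / "shared"
    return {
        "onnx": shared / "onnx" / model_id / precision,
        "engines": shared / "engines" / model_id / precision,
    }
```

For example, `cache_paths("transformer", "fp8")["engines"]` points at `demo_cache/shared/engines/transformer/fp8`.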
Image Quality Issues

- Ensure the dimensions are multiples of 16
- Try altering the `seed` and `guidance_scale` parameters
- See Flux.1 [dev] Demo Notebook for more tips and examples

GPU Out of Memory

- Use `low_vram=True` to reduce VRAM usage
- Use `enable_runtime_cache=False` or omit the `--enable-runtime-cache` flag
- Try a lower precision: `fp8` (Ada/Blackwell) or `fp4` (Blackwell only)
- Reduce batch size or image resolution

Disk Space Issues

- Use `cache_mode="lean"` to reduce disk usage by automatically cleaning up unused models
- Manually delete the demo cache directory
Build Errors
- Verify TensorRT-RTX and dependencies are installed (see Quick Start)
- Ensure the precision being used is supported by the GPU architecture (see Support Matrix)
To configure the test environment and run demo tests, refer to the test README.