
Commit 17003e4

TensorRT 10.7-GA OSS Release (#4269)

Signed-off-by: Kevin Chen <[email protected]>

1 parent: c468d67

File tree: 81 files changed (+1411, -530 lines)

.gitmodules (+1, -1)

@@ -9,4 +9,4 @@
 [submodule "parsers/onnx"]
 path = parsers/onnx
 url = https://github.com/onnx/onnx-tensorrt.git
-branch = release/10.6-GA
+branch = release/10.7-GA

CHANGELOG.md (+25)

@@ -1,5 +1,30 @@
 # TensorRT OSS Release Changelog
 
+## 10.7.0 GA - 2024-12-4
+Key Features and Updates:
+
+- Demo Changes
+  - demoDiffusion
+    - Enabled low-VRAM mode for the Flux pipeline. Users can now run the pipelines on systems with 32GB VRAM.
+    - Added support for the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) pipeline.
+    - Enabled weight streaming mode for the Flux pipeline.
+
+- Plugin Changes
+  - On Blackwell and later platforms, TensorRT will drop cuDNN support for the following categories of plugins:
+    - User-written `IPluginV2Ext`, `IPluginV2DynamicExt`, and `IPluginV2IOExt` plugins that depend on cuDNN handles provided by TensorRT (via the `attachToContext()` API).
+    - TensorRT standard plugins that use cuDNN, specifically:
+      - `InstanceNormalization_TRT` (versions 1, 2, and 3) in `plugin/instanceNormalizationPlugin/`.
+      - `GroupNormalizationPlugin` (version 1) in `plugin/groupNormalizationPlugin/`.
+    - Note: These normalization plugins are superseded by TensorRT's native `INormalizationLayer` ([C++](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_normalization_layer.html), [Python](https://docs.nvidia.com/deeplearning/tensorrt/operators/docs/Normalization.html)). TensorRT support for cuDNN-dependent plugins remains unchanged on pre-Blackwell platforms.
+
+- Parser Changes
+  - The parser now prioritizes plugins over local functions when a corresponding plugin is available in the registry.
+  - Added dynamic axes support for `Squeeze` and `Unsqueeze` operations.
+  - Added support for parsing mixed-precision `BatchNormalization` nodes in strongly-typed mode.
+
+- Addressed Issues
+  - Fixed [4113](https://github.com/NVIDIA/TensorRT/issues/4113).
+
 ## 10.6.0 GA - 2024-11-05
 Key Features and Updates:
 - Demo Changes
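For plugin users affected by the cuDNN deprecation noted in the changelog above, the migration target is the native `INormalizationLayer`. Below is a minimal sketch (not part of this commit) of building an instance-normalization layer through the TensorRT Python API; the input shape, per-channel scale/bias constants, and epsilon are illustrative assumptions.

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TensorRT 10

# Assumed NCHW input with 32 channels (illustrative only).
x = network.add_input("x", trt.float32, (1, 32, 64, 64))

# Per-channel scale and bias, broadcastable to the input shape.
scale = network.add_constant((1, 32, 1, 1), trt.Weights(np.ones((1, 32, 1, 1), dtype=np.float32)))
bias = network.add_constant((1, 32, 1, 1), trt.Weights(np.zeros((1, 32, 1, 1), dtype=np.float32)))

# Instance normalization reduces over the spatial axes (H, W); the last
# argument is a bitmask of the axes to normalize over.
axes_mask = (1 << 2) | (1 << 3)
norm = network.add_normalization(x, scale.get_output(0), bias.get_output(0), axes_mask)
norm.epsilon = 1e-5

network.mark_output(norm.get_output(0))
```

Normalizing over (H, W) matches the semantics of the deprecated `InstanceNormalization_TRT` plugin; the same layer also covers group normalization by setting its group count.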

README.md (+9, -9)

@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* TensorRT v10.6.0.26
+* TensorRT v10.7.0.23
 * Available from direct download links listed below
 
 **System Packages**

@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
 If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 10.6.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz)
-- [TensorRT 10.6.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/zip/TensorRT-10.6.0.26.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.6.0.26 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/zip/TensorRT-10.6.0.26.Windows.win10.cuda-12.6.zip)
+- [TensorRT 10.7.0.23 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/tars/TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.7.0.23 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/tars/TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-12.6.tar.gz)
+- [TensorRT 10.7.0.23 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.7.0.23 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip)
 
 **Example: Ubuntu 20.04 on x86-64 with cuda-12.6**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.6.0.26
+tar -xvzf TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-12.6.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.7.0.23
 ```
 
 **Example: Windows on x86-64 with cuda-12.6**
 
 ```powershell
-Expand-Archive -Path TensorRT-10.6.0.26.Windows.win10.cuda-12.6.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.6.0.26\lib"
+Expand-Archive -Path TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.7.0.23\lib"
 ```
 
 ## Setting Up The Build Environment

VERSION (+1, -1)

@@ -1 +1 @@
-10.6.0.26
+10.7.0.23

demo/BERT/README.md (+1, -1)

@@ -75,7 +75,7 @@ The following software version configuration has been tested:
 |Software|Version|
 |--------|-------|
 |Python|>=3.8|
-|TensorRT|10.6.0.26|
+|TensorRT|10.7.0.23|
 |CUDA|12.6|
 
 ## Setup

demo/Diffusion/README.md (+28, -7)

@@ -7,7 +7,7 @@ This demo application ("demoDiffusion") showcases the acceleration of Stable Dif
 ### Clone the TensorRT OSS repository
 
 ```bash
-git clone git@github.com:NVIDIA/TensorRT.git -b release/10.5 --single-branch
+git clone git@github.com:NVIDIA/TensorRT.git -b release/10.7 --single-branch
 cd TensorRT
 ```
 
@@ -16,7 +16,7 @@ cd TensorRT
 Install nvidia-docker using [these instructions](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
 
 ```bash
-docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:24.07-py3 /bin/bash
+docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:24.10-py3 /bin/bash
 ```
 
 NOTE: The demo supports CUDA>=11.8
@@ -43,12 +43,12 @@ pip3 install -r requirements.txt
 
 > NOTE: demoDiffusion has been tested on systems with NVIDIA H100, A100, L40, T4, and RTX4090 GPUs, and the following software configuration.
 ```
-diffusers 0.30.2
+diffusers 0.31.0
 onnx 1.15.0
 onnx-graphsurgeon 0.5.2
 onnxruntime 1.16.3
 polygraphy 0.49.9
-tensorrt 10.6.0.26
+tensorrt 10.7.0.23
 tokenizers 0.13.3
 torch 2.2.0
 transformers 4.42.2
@@ -66,6 +66,7 @@ python3 demo_img2img.py --help
 python3 demo_inpaint.py --help
 python3 demo_controlnet.py --help
 python3 demo_txt2img_xl.py --help
+python3 demo_txt2img_flux.py --help
 ```
 
 ### HuggingFace user access token
@@ -257,23 +258,43 @@ python3 demo_stable_cascade.py --onnx-opset=16 "Anthropomorphic cat dressed as a
 
 ### Generate an image guided by a text prompt using Flux
 
+Run the command below to generate an image with FLUX.1 Dev in FP16.
+
 ```bash
 python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN
 ```
 
-Run the below command to generate an image with FLUX in BF16.
+Run the command below to generate an image with FLUX.1 Dev in BF16.
 
 ```bash
 python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --bf16
 ```
 
-Run the below command to generate an image with FLUX in FP8. (FP8 is only supppoted on Hopper.)
+Run the command below to generate an image with FLUX.1 Dev in FP8. (FP8 is supported on Hopper and Ada.)
 
 ```bash
 python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --fp8
 ```
 
-NOTE: Running the Flux pipeline requires 80GB of GPU memory or higher
+Run the command below to generate an image with FLUX.1 Schnell in FP16.
+
+```bash
+python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --version="flux.1-schnell"
+```
+
+Run the command below to generate an image with FLUX.1 Schnell in BF16.
+
+```bash
+python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --version="flux.1-schnell" --bf16
+```
+
+Run the command below to generate an image with FLUX.1 Schnell in FP8. (FP8 is supported on Hopper and Ada.)
+
+```bash
+python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --version="flux.1-schnell" --fp8
+```
+
+NOTE: Running the FLUX.1 Dev or FLUX.1 Schnell pipeline requires 48GB or 24GB of GPU memory or higher, respectively.
 
 ## Configuration options
 - Noise scheduler can be set using `--scheduler <scheduler>`. Note: not all schedulers are available for every version.

demo/Diffusion/demo_txt2img_flux.py (+63, -12)

@@ -20,7 +20,12 @@
 from cuda import cudart
 
 from flux_pipeline import FluxPipeline
-from utilities import PIPELINE_TYPE, add_arguments, process_pipeline_args
+from utilities import (
+    PIPELINE_TYPE,
+    add_arguments,
+    process_pipeline_args,
+    VALID_OPTIMIZATION_LEVELS,
+)
 
 
 def parse_args():
@@ -32,7 +37,7 @@ def parse_args():
         "--version",
         type=str,
         default="flux.1-dev",
-        choices=["flux.1-dev"],
+        choices=("flux.1-dev", "flux.1-schnell"),
         help="Version of Flux",
     )
     parser.add_argument(
@@ -65,20 +70,48 @@
     parser.add_argument(
         "--max_sequence_length",
         type=int,
-        default=512,
-        help="Maximum sequence length to use with the prompt",
+        help="Maximum sequence length to use with the prompt. Can be up to 512 for the dev and 256 for the schnell variant.",
     )
     parser.add_argument(
-        "--bf16",
-        action='store_true',
-        help="Run pipeline in BFloat16 precision"
+        "--bf16", action="store_true", help="Run pipeline in BFloat16 precision"
     )
     parser.add_argument(
         "--low-vram",
+        action="store_true",
+        help="Optimize for low VRAM usage, possibly at the expense of inference performance. Disabled by default.",
+    )
+    parser.add_argument(
+        "--optimization-level",
+        type=int,
+        default=3,
+        help=f"Set the builder optimization level to build the engine with. A higher level allows TensorRT to spend more building time for more optimization options. Must be one of {VALID_OPTIMIZATION_LEVELS}.",
+    )
+    parser.add_argument(
+        "--torch-fallback",
+        default=None,
+        type=str,
+        help="Comma-separated list of models to run with torch instead of TRT. For example --torch-fallback t5,transformer. If --torch-inference is set, this parameter will be ignored."
+    )
+
+    parser.add_argument(
+        "--ws",
         action='store_true',
-        help="Optimize for low VRAM usage, possibly at the expense of inference performance. Disabled by default."
+        help="Build TensorRT engines with weight streaming enabled."
     )
 
+    parser.add_argument(
+        "--t5-ws-percentage",
+        type=int,
+        default=None,
+        help="Set runtime weight streaming budget as the percentage of the size of streamable weights for the T5 model. This argument only takes effect when --ws is set. 0 streams the most weights and 100 or None streams no weights."
+    )
+
+    parser.add_argument(
+        "--transformer-ws-percentage",
+        type=int,
+        default=None,
+        help="Set runtime weight streaming budget as the percentage of the size of streamable weights for the transformer model. This argument only takes effect when --ws is set. 0 streams the most weights and 100 or None streams no weights."
+    )
     return parser.parse_args()
 
@@ -100,10 +133,24 @@ def process_demo_args(args):
     if len(prompt2) == 1:
         prompt2 = prompt2 * batch_size
 
-    if args.max_sequence_length is not None and args.max_sequence_length > 512:
-        raise ValueError(
-            f"`max_sequence_length` cannot be greater than 512 but is {args.max_sequence_length}"
-        )
+    max_seq_supported_by_model = {
+        "flux.1-schnell": 256,
+        "flux.1-dev": 512,
+    }[args.version]
+    if args.max_sequence_length is not None:
+        if args.max_sequence_length > max_seq_supported_by_model:
+            raise ValueError(
+                f"For {args.version}, `max_sequence_length` cannot be greater than {max_seq_supported_by_model} but is {args.max_sequence_length}"
+            )
+    else:
+        args.max_sequence_length = max_seq_supported_by_model
+
+    if args.torch_fallback and not args.torch_inference:
+        args.torch_fallback = args.torch_fallback.split(",")
+
+    if args.torch_fallback and args.torch_inference:
+        print("[W] All models will run in PyTorch when --torch-inference is set. Parameter --torch-fallback will be ignored.")
+        args.torch_fallback = None
 
     args_run_demo = (
         prompt,
@@ -131,6 +178,10 @@ def process_demo_args(args):
         max_sequence_length=args.max_sequence_length,
         bf16=args.bf16,
         low_vram=args.low_vram,
+        torch_fallback=args.torch_fallback,
+        weight_streaming=args.ws,
+        t5_weight_streaming_budget_percentage=args.t5_ws_percentage,
+        transformer_weight_streaming_budget_percentage=args.transformer_ws_percentage,
         **kwargs_init_pipeline)
 
     # Load TensorRT engines and pytorch modules
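The new `--ws`, `--t5-ws-percentage`, and `--transformer-ws-percentage` options above expose TensorRT's weight streaming feature. As a rough sketch of how such a percentage budget can be applied outside the demo (this is not the pipeline's actual implementation, and `t5.plan` is a placeholder path), an engine built with the weight streaming builder flag on a strongly typed network can have its runtime budget set like this:

```python
from typing import Optional

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def apply_ws_budget(engine: trt.ICudaEngine, percentage: Optional[int]) -> None:
    """Mirror the CLI semantics: 0 streams the most weights, 100 or None streams none."""
    total = engine.streamable_weights_size  # bytes of weights eligible for streaming
    if percentage is None or percentage >= 100:
        engine.weight_streaming_budget_v2 = total  # keep all weights resident on the GPU
    else:
        engine.weight_streaming_budget_v2 = total * percentage // 100


runtime = trt.Runtime(TRT_LOGGER)
# Placeholder path; the engine must have been built with
# config.set_flag(trt.BuilderFlag.WEIGHT_STREAMING) on a strongly typed network.
with open("t5.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

apply_ws_budget(engine, percentage=20)  # keep roughly 20% of streamable weights on-device
context = engine.create_execution_context()
```

The `--optimization-level` flag added alongside these options corresponds to the builder's optimization level (`builder_optimization_level` on the builder config in the Python API): higher levels spend more build time in exchange for potentially faster engines.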
