**CHANGELOG.md**

# TensorRT OSS Release Changelog
## 10.6.0 GA - 2024-11-05
Key Features and Updates:
- Demo Changes
  - demoBERT: The use of `fcPlugin` in demoBERT has been removed.
  - demoBERT: All TensorRT plugins used in demoBERT (`CustomEmbLayerNormDynamic`, `CustomSkipLayerNormDynamic`, and `CustomQKVToContextDynamic`) now have versions that inherit from `IPluginV3` interface classes. Users can opt in to these V3 plugins by specifying `--use-v3-plugins` to the builder scripts.
    - Opting in to the V3 plugins does not affect performance, I/O, or plugin attributes.
    - There is a known issue in the V3 (version 4) `CustomQKVToContextDynamic` plugin in TensorRT 10.6.0 that causes an internal assertion error if either the batch or sequence dimension differs at runtime from the one used to serialize the engine. See the “known issues” section of the [TensorRT-10.6.0 release notes](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-10-6-0).
    - For smoother migration, the default behavior still uses the deprecated `IPluginV2DynamicExt`-derived plugins when `--use-v3-plugins` is not specified in the builder scripts. The flag `--use-deprecated-plugins` was added as an explicit way to enforce the default behavior and is mutually exclusive with `--use-v3-plugins`.
  - demoDiffusion
    - Introduced BF16 and FP8 support for the [Flux.1-dev](demo/Diffusion#generate-an-image-guided-by-a-text-prompt-using-flux) pipeline.
    - Expanded FP8 support on Ada platforms.
    - Enabled LoRA adapter compatibility for SDv1.5, SDv2.1, and SDXL pipelines using Diffusers version 0.30.3.
- Sample Changes
  - Added the Python sample [quickly_deployable_plugins](samples/python/quickly_deployable_plugins), which demonstrates quickly deployable Python-based plugin definitions (QDPs) in TensorRT. QDPs are a simple and intuitive decorator-based approach to defining TensorRT plugins, requiring drastically less code (a minimal sketch follows).
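
    A hedged sketch of the QDP style, modeled on the elementwise-add example in that sample; the `tensorrt.plugin` module and decorator signatures shown here follow the sample and may differ in other TensorRT versions, and the torch interop is an assumption:

    ```python
    # Hypothetical QDP sketch; names/signatures follow the
    # quickly_deployable_plugins sample, not a definitive API reference.
    from typing import Tuple

    import tensorrt.plugin as trtp
    import torch

    # Shape/type inference: declare that the output mirrors the input.
    @trtp.register("sample::elemwise_add_plugin")
    def add_plugin_desc(inp0: trtp.TensorDesc, block_size: int) -> trtp.TensorDesc:
        return inp0.like()

    # Runtime implementation, launched on the given CUDA stream.
    @trtp.impl("sample::elemwise_add_plugin")
    def add_plugin_impl(
        inp0: trtp.Tensor, block_size: int, outputs: Tuple[trtp.Tensor], stream: int
    ) -> None:
        # Assumption: QDP tensors are zero-copy viewable from torch.
        inp = torch.as_tensor(inp0, device="cuda")
        out = torch.as_tensor(outputs[0], device="cuda")
        out.copy_(inp + 1.0)
    ```
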
- Plugin Changes
  - The `fcPlugin` has been deprecated. Its functionality has been superseded by the [IMatrixMultiplyLayer](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_matrix_multiply_layer.html) that is natively provided by TensorRT (see the sketch after this list).
  - Migrated the `IPluginV2`-descendent version 1 of `CustomEmbLayerNormDynamic` to version 6, which implements `IPluginV3`.
    - The newer versions preserve the attributes and I/O of the corresponding older plugin version.
    - The older plugin versions are deprecated and will be removed in a future release.
- Parser Changes
  - Updated ONNX submodule version to 1.17.0.
  - Fixed an issue where conditional layers were incorrectly being added.
  - Updated local function metadata to contain more information.
  - Added support for parsing nodes with Quickly Deployable Plugins.

**README.md**

You can skip the **Build** section to enjoy TensorRT with Python.

To build the TensorRT-OSS components, you will first need the following software packages.
**TensorRT GA build**
* TensorRT v10.6.0.26
  * Available from direct download links listed below

**System Packages**
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

Otherwise, download and extract the TensorRT GA build from the [NVIDIA Developer Zone](https://developer.nvidia.com) using the direct links below:

- [TensorRT 10.6.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
- [TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz)
- [TensorRT 10.6.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/zip/TensorRT-10.6.0.26.Windows.win10.cuda-11.8.zip)
- [TensorRT 10.6.0.26 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/zip/TensorRT-10.6.0.26.Windows.win10.cuda-12.6.zip)

**Example: Ubuntu 20.04 on x86-64 with cuda-12.6**
```bash
cd ~/Downloads
tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
export TRT_LIBPATH=`pwd`/TensorRT-10.6.0.26
```

**demo/BERT/README.md**

The following software version configuration has been tested:

|Software|Version|
|--------|-------|
|Python|>=3.8|
|TensorRT|10.6.0.26|
|CUDA|12.6|
## Setup

This demo BERT application can be run within the TensorRT OSS build container.

```bash
bash scripts/download_model.sh
```

**Note:** Since the datasets and checkpoints are stored in the directory mounted from the host, they do *not* need to be downloaded each time the container is launched.

**Warning:** If you encounter the error message "Missing API key and missing Email Authentication. This command requires an API key or authentication via browser login", the recommended steps for resolution are as follows:
* Generate an API key by logging in at https://ngc.nvidia.com/setup/api-key and copy the generated key.

Completing these steps should resolve the error you encountered.

```bash
jupyter notebook --ip 0.0.0.0 inference.ipynb
```
Then, use your browser to open the link displayed. The link should look similar to: `http://127.0.0.1:8888/?token=<TOKEN>`
6. Run inference with CUDA Graph support.

   A separate Python script, `inference_c.py`, is provided to run inference with CUDA Graph support. This is necessary because CUDA Graph is only supported through the CUDA C/C++ APIs, not pyCUDA. The `inference_c.py` script uses pybind11 to interface with C/C++ for CUDA Graph capture and launch. The command-line interface is the same as `inference.py`, except for an extra `--enable-graph` option.
   ```bash
   mkdir -p build; pushd build
   cmake .. -DPYTHON_EXECUTABLE=$(which python)
   # ...
   ```

   A separate C/C++ inference benchmark executable, `perf` (compiled from `perf.cpp`), is provided to run inference benchmarks with CUDA Graph. The command-line interface is the same as `perf.py`, except for an extra `--enable_graph` option. A sketch of the capture/replay pattern follows.
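
   For illustration, the capture-and-replay pattern that these tools implement in C++ can be sketched with the `cuda-python` bindings; `context` is an assumed TensorRT `IExecutionContext` whose I/O tensor addresses are already set, so this is a pattern sketch rather than demoBERT's actual code:

   ```python
   # Hypothetical capture/replay sketch (assumes cuda-python CUDA 12 bindings).
   from cuda import cudart

   err, stream = cudart.cudaStreamCreate()

   # Warm-up run outside capture: TensorRT may allocate resources lazily.
   context.execute_async_v3(stream)  # `context`: assumed IExecutionContext
   cudart.cudaStreamSynchronize(stream)

   # Capture one inference into a graph...
   cudart.cudaStreamBeginCapture(
       stream, cudart.cudaStreamCaptureMode.cudaStreamCaptureModeGlobal
   )
   context.execute_async_v3(stream)
   err, graph = cudart.cudaStreamEndCapture(stream)
   err, graph_exec = cudart.cudaGraphInstantiate(graph, 0)

   # ...then replay it with minimal per-launch CPU overhead.
   cudart.cudaGraphLaunch(graph_exec, stream)
   cudart.cudaStreamSynchronize(stream)
   ```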

The `infer_c/` folder contains all the necessary C/C++ files required for CUDA Graph support.

To view the available parameters for each script, you can use the help flag (`-h`).

**Note:** In the builder scripts (`builder.py` and `builder_varseqlen.py`), the options `--use-deprecated-plugins` and `--use-v3-plugins` toggle the underlying implementation of the plugins used in demoBERT. They are mutually exclusive, and enabling either should not affect functionality or performance. `--use-deprecated-plugins` selects plugin versions that inherit from `IPluginV2DynamicExt`, while `--use-v3-plugins` selects plugin versions that inherit from `IPluginV3` interface classes. If neither flag is specified, `--use-deprecated-plugins` is used by default.
### TensorRT inference process
As mentioned in the [Quick Start Guide](#quick-start-guide), two options are provided for running inference:
**Xavier GPU**
```bash
# Only supports SkipLayerNormPlugin running with INT8 I/O. Use the -iln builder flag to enable.
```

This will build an engine with a maximum batch size of 1 (`-b 1`) and sequence length of 384 (`-s 384`) using INT8 mixed-precision computation where possible (`--int8 --fp16 --strict`).

Note that this is an experimental feature: only Xavier+ GPUs are supported.

This will build an engine with a maximum batch size of 1 (`-b 1`) and sequence length of 256 (`-s 256`) using INT8 precision computation where possible (`--int8`).
3. Run inference

   Evaluate the F1 score and exact match score using the SQuAD dataset:

This will collect performance data using a batch size of 1 (`-b 1`) and a sequence length of 256 (`-s 256`).
5. Collect performance data with CUDA graph enabled

   We can use the same `inference_c.py` and `build/perf` to collect performance data with CUDA Graph enabled. The command line is the same as when running without variable sequence length.