Commit 866548c

Merge pull request #4119 from NVIDIA/dev-poweiw-10.4-stage
Update for 10.4-GA
2 parents c5b9de3 + c664ce2 · commit 866548c

File tree

100 files changed: +9008 −2864 lines

Note: large commits have some content hidden by default; only a subset of the 100 changed files is shown below.

CHANGELOG.md (+33 −1)

@@ -1,6 +1,38 @@
 # TensorRT OSS Release Changelog
 
-## 10.3.0 GA - 2024-08-07
+## 10.4.0 GA - 2024-09-11
+Key Features and Updates:
+
+- Demo changes
+  - Added [Stable Cascade](demo/Diffusion) pipeline.
+  - Enabled INT8 and FP8 quantization for the Stable Diffusion v1.5, v2.0, and v2.1 pipelines.
+  - Enabled FP8 quantization for the Stable Diffusion XL pipeline.
+- Sample changes
+  - Added a new Python sample, `aliased_io_plugin`, which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing.
+- Plugin changes
+  - Migrated IPluginV2-descendent versions (a) of the following plugins to newer versions (b) which implement IPluginV3 (a->b):
+    - scatterElementsPlugin (1->2)
+    - skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
+    - embLayerNormPlugin (2->4, 3->5)
+    - bertQKVToContextPlugin (1->4, 2->5, 3->6)
+  - Note
+    - Each newer version preserves the attributes and I/O of the corresponding older plugin version.
+    - The older plugin versions are deprecated and will be removed in a future release.
+
+- Quickstart guide
+  - Updated the deploy_to_triton guide and removed legacy APIs.
+  - Removed legacy TF-TRT code, as the project is no longer supported.
+  - Removed quantization_tutorial, as pytorch_quantization has been deprecated. See https://github.com/NVIDIA/TensorRT-Model-Optimizer for the latest quantization support, and [Stable Diffusion XL (Base/Turbo) and Stable Diffusion 1.5 Quantization with Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/diffusers/quantization) for integration with TensorRT.
+- Parser changes
+  - Added support for tensor `axes` inputs for `Pad` operations.
+  - Added support for `BlackmanWindow`, `HammingWindow`, and `HannWindow` operations.
+  - Improved error handling in `IParserRefitter`.
+  - Fixed kernel shape inference in multi-input convolutions.
+
+- Updated tooling
+  - polygraphy-extension-trtexec v0.0.9
+
+## 10.3.0 GA - 2024-08-02
 
 Key Features and Updates:
 
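A note on the `aliased_io_plugin` item above: I/O aliasing lets a plugin declare that an output tensor shares its buffer with an input tensor, so the plugin updates the input in place instead of producing a separate copy. The sketch below is a minimal, library-free C++ illustration of the concept only; `IODesc` and `scaleKernel` are hypothetical stand-ins, not TensorRT's plugin API (see the sample itself for the real interface).

```cpp
// Conceptual sketch of I/O aliasing (illustration only, not TensorRT's API).
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical descriptor: for each output, the index of the input it
// aliases, or -1 if the output gets its own buffer.
struct IODesc {
    std::vector<int> aliasedInput;
};

// A kernel that is safe to run in place (element i is read before written).
void scaleKernel(const float* in, float* out, std::size_t n, float s) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * s;
    }
}

int main() {
    std::vector<float> input{1.0f, 2.0f, 3.0f};
    const IODesc desc{{0}};      // output 0 aliases input 0 -> in-place update

    std::vector<float> scratch;  // only allocated when the output is not aliased
    float* out = nullptr;
    if (desc.aliasedInput[0] == 0) {
        out = input.data();      // same buffer for input and output: no copy
    } else {
        scratch.resize(input.size());
        out = scratch.data();
    }

    scaleKernel(input.data(), out, input.size(), 2.0f);
    assert(input[0] == 2.0f);    // the input tensor itself was updated
    std::cout << input[0] << ' ' << input[1] << ' ' << input[2] << '\n';
    return 0;
}
```

When aliasing is declared, the scratch branch never runs: no extra allocation and no copy-back, which is what makes in-place updates to plugin inputs cheap.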

CMakeLists.txt (+1 −1)

@@ -80,7 +80,7 @@ option(BUILD_PARSERS "Build TensorRT parsers" ON)
 option(BUILD_SAMPLES "Build TensorRT samples" ON)
 
 # C++14
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF)
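Context for the one-line change above: bumping `CMAKE_CXX_STANDARD` from 14 to 17 allows the OSS sources to use C++17 language features (the `# C++14` comment, kept as unchanged context, is now stale). As a small standalone sketch, not code from this commit, the snippet below uses two such features; it compiles under `-std=c++17` but is rejected under `-std=c++14`.

```cpp
// Standalone illustration of two C++17 features enabled by this change.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

int main() {
    const std::map<std::string, int> version{{"major", 10}, {"minor", 4}};

    // Structured bindings (C++17): unpack each key/value pair directly.
    for (const auto& [field, value] : version) {
        std::cout << field << " = " << value << '\n';
    }

    // if constexpr (C++17): the untaken branch is discarded at compile time.
    if constexpr (sizeof(std::intptr_t) == 8) {
        std::cout << "64-bit pointers\n";
    } else {
        std::cout << "narrower pointers\n";
    }
    return 0;
}
```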

README.md (+28 −28)

@@ -26,13 +26,13 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* TensorRT v10.3.0.26
+* TensorRT v10.4.0.26
   * Available from direct download links listed below
 
 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
   * Recommended versions:
-  * cuda-12.5.0 + cuDNN-8.9
+  * cuda-12.6.0 + cuDNN-8.9
   * cuda-11.8.0 + cuDNN-8.9
 * [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
 * [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
 If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 10.3.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.3.0.26 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/tars/TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz)
-- [TensorRT 10.3.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.3.0.26 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.3.0/zip/TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip)
+- [TensorRT 10.4.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/tars/TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.4.0.26 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/tars/TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz)
+- [TensorRT 10.4.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/zip/TensorRT-10.4.0.26.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.4.0.26 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/zip/TensorRT-10.4.0.26.Windows.win10.cuda-12.6.zip)
 
 
-**Example: Ubuntu 20.04 on x86-64 with cuda-12.5**
+**Example: Ubuntu 20.04 on x86-64 with cuda-12.6**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
+tar -xvzf TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.4.0.26
 ```
 
-**Example: Windows on x86-64 with cuda-12.5**
+**Example: Windows on x86-64 with cuda-12.6**
 
 ```powershell
-Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
+Expand-Archive -Path TensorRT-10.4.0.26.Windows.win10.cuda-12.6.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.4.0.26\lib"
 ```
 
 ## Setting Up The Build Environment
@@ -101,27 +101,27 @@ For Linux platforms, we recommend that you generate a docker container for build
 1. #### Generate the TensorRT-OSS build container.
     The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.
 
-    **Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)**
+    **Example: Ubuntu 20.04 on x86-64 with cuda-12.6 (default)**
     ```bash
-    ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5
+    ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.6
     ```
-    **Example: Rockylinux8 on x86-64 with cuda-12.5**
+    **Example: Rockylinux8 on x86-64 with cuda-12.6**
     ```bash
-    ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.5
+    ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.6
     ```
-    **Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.5 (JetPack SDK)**
+    **Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.6 (JetPack SDK)**
     ```bash
-    ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.5
+    ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.6
     ```
-    **Example: Ubuntu 22.04 on aarch64 with cuda-12.5**
+    **Example: Ubuntu 22.04 on aarch64 with cuda-12.6**
     ```bash
-    ./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.5
+    ./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.6
     ```
 
 2. #### Launch the TensorRT-OSS build container.
     **Example: Ubuntu 20.04 build container**
     ```bash
-    ./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all
+    ./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.6 --gpus all
     ```
     > NOTE:
     <br> 1. Use the `--tag` corresponding to the build container generated in Step 1.
@@ -132,38 +132,38 @@ For Linux platforms, we recommend that you generate a docker container for build
 ## Building TensorRT-OSS
 * Generate Makefiles and build.
 
-    **Example: Linux (x86-64) build with default cuda-12.5**
+    **Example: Linux (x86-64) build with default cuda-12.6**
     ```bash
     cd $TRT_OSSPATH
     mkdir -p build && cd build
     cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
     make -j$(nproc)
     ```
-    **Example: Linux (aarch64) build with default cuda-12.5**
+    **Example: Linux (aarch64) build with default cuda-12.6**
     ```bash
     cd $TRT_OSSPATH
     mkdir -p build && cd build
     cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
     make -j$(nproc)
     ```
-    **Example: Native build on Jetson (aarch64) with cuda-12.5**
+    **Example: Native build on Jetson (aarch64) with cuda-12.6**
     ```bash
     cd $TRT_OSSPATH
     mkdir -p build && cd build
-    cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.5
+    cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.6
     CC=/usr/bin/gcc make -j$(nproc)
     ```
     > NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.
 
-    **Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.5 (JetPack)**
+    **Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.6 (JetPack)**
     ```bash
     cd $TRT_OSSPATH
     mkdir -p build && cd build
-    cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.5 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
+    cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.6 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
     make -j$(nproc)
     ```
 
-    **Example: Native builds on Windows (x86) with cuda-12.5**
+    **Example: Native builds on Windows (x86) with cuda-12.6**
     ```powershell
     cd $TRT_OSSPATH
     mkdir -p build

VERSION (+1 −1)

@@ -1 +1 @@
-10.3.0.26
+10.4.0.26
