Releases · NVIDIA/TensorRT
TensorRT OSS v8.2.0 EA
TensorRT OSS release corresponding to TensorRT 8.2.0.6 EA release.
Added
- Demo applications showcasing TensorRT inference of HuggingFace Transformers.
  - Support is currently extended to GPT-2 and T5 models.
- Added support for the following ONNX operators: `Einsum`, `IsNan`, `GatherND`, `Scatter`, `ScatterElements`, `ScatterND`, `Sign`, and `Round`.
- Added support for building TensorRT Python API on Windows.
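As a reference for the newly supported scatter operators above, the core semantics of ONNX `ScatterND` can be sketched in plain Python (no TensorRT or ONNX runtime required; the helper name is illustrative, not part of any API):

```python
from copy import deepcopy

def scatter_nd(data, indices, updates):
    """Reference semantics of ONNX ScatterND on nested lists.

    Each entry of `indices` is an index tuple into `data`; the matching
    entry of `updates` overwrites the element it points to.
    """
    out = deepcopy(data)
    for idx, upd in zip(indices, updates):
        target = out
        for i in idx[:-1]:          # walk down to the innermost list
            target = target[i]
        target[idx[-1]] = upd       # overwrite the addressed element
    return out

# Overwrite positions 1 and 3 of a 1-D tensor.
print(scatter_nd([1, 2, 3, 4, 5], [[1], [3]], [9, 8]))  # → [1, 9, 3, 8, 5]
```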
Updated
- Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
  - Added three new `IExecutionContext` APIs, `getEnqueueEmitsProfile()`, `setEnqueueEmitsProfile()`, and `reportToProfiler()`, which can be used to collect layer profiling info when the inference is launched as a CUDA graph.
  - Eliminated the global logger; each `Runtime`, `Builder`, or `Refitter` now has its own logger.
  - Added new operators: `IAssertionLayer`, `IConditionLayer`, `IEinsumLayer`, `IIfConditionalBoundaryLayer`, `IIfConditionalOutputLayer`, `IIfConditionalInputLayer`, and `IScatterLayer`.
  - Added new `IGatherLayer` modes: `kELEMENT` and `kND`.
  - Added new `ISliceLayer` modes: `kFILL`, `kCLAMP`, and `kREFLECT`.
  - Added new `IUnaryLayer` operators: `kSIGN` and `kROUND`.
  - Added a new runtime class, `IEngineInspector`, that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
  - `ProfilingVerbosity` enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to cuda 11.4
- Updated CMake to target C++14 builds.
- Updated the following ONNX operators:
  - `Gather` and `GatherElements` implementations to natively support negative indices.
  - `Pad` layer to support ND padding, along with `edge` and `reflect` padding mode support.
  - `If` layer with general performance improvements.
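The negative-index behavior now supported natively for `Gather` follows the usual ONNX convention: index `-1` refers to the last element along the axis. A plain-Python sketch for the 1-D case (illustrative, not the TensorRT implementation):

```python
def gather_1d(data, indices):
    """ONNX Gather along axis 0 for a 1-D input: negative indices
    count back from the end, as in Python list indexing."""
    n = len(data)
    return [data[i + n if i < 0 else i] for i in indices]

print(gather_1d([1.0, 1.2, 2.3, 3.4], [0, -1, -4]))  # → [1.0, 3.4, 1.0]
```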
Removed
- Removed `sampleMLP`.
- Several flags of trtexec have been deprecated:
  - `--explicitBatch` has been deprecated and has no effect. When the input model is in UFF or Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
  - `--explicitPrecision` has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
  - `--nvtxMode=[verbose|default|none]` has been deprecated in favor of `--profilingVerbosity=[detailed|layer_names_only|none]` to show its functionality more explicitly.
Signed-off-by: Rajeev Rao [email protected]
21.09
21.08
Commit used by the 21.08 TensorRT NGC container.
Changelog
Added
- Add demoBERT and demoBERT-MT (sparsity) benchmark data for TensorRT 8.
- Added example python notebooks
Changed
- Updated samples and plugins directory structure
- Updates to TensorRT developer tools
- README fix to update build command for native aarch64 builds.
Removed
- N/A
21.07
TensorRT OSS v8.0.1
TensorRT OSS release corresponding to TensorRT 8.0.1.6 GA release.
Added
- Added support for the following ONNX operators: `Celu`, `CumSum`, `EyeLike`, `GatherElements`, `GlobalLpPool`, `GreaterOrEqual`, `LessOrEqual`, `LpNormalization`, `LpPool`, `ReverseSequence`, and `SoftmaxCrossEntropyLoss`.
- Rehauled the ONNX `Resize` operator, now fully supporting the following modes:
  - Coordinate transformation modes: `half_pixel`, `pytorch_half_pixel`, `tf_half_pixel_for_nn`, `asymmetric`, and `align_corners`.
  - Modes: `nearest`, `linear`.
  - Nearest modes: `floor`, `ceil`, `round_prefer_floor`, `round_prefer_ceil`.
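The coordinate-transformation modes differ in how an output pixel index maps back to an input coordinate; for example, `half_pixel` and `asymmetric` can be sketched in plain Python following the ONNX Resize definitions:

```python
def half_pixel(x_out, scale):
    # ONNX Resize half_pixel: pixel centers sit at i + 0.5
    return (x_out + 0.5) / scale - 0.5

def asymmetric(x_out, scale):
    # ONNX Resize asymmetric: plain division, no center offset
    return x_out / scale

# Upscaling a length-2 axis to length 4 (scale = 2.0):
print([half_pixel(i, 2.0) for i in range(4)])   # → [-0.25, 0.25, 0.75, 1.25]
print([asymmetric(i, 2.0) for i in range(4)])   # → [0.0, 0.5, 1.0, 1.5]
```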
- Added support for the multi-input ONNX `ConvTranspose` operator.
- Added support for 3D spatial dimensions in ONNX `InstanceNormalization`.
- Added support for generic 2D padding in ONNX.
- ONNX `QuantizeLinear` and `DequantizeLinear` operators leverage `IQuantizeLayer` and `IDequantizeLayer`.
  - Added support for tensor scales.
  - Added support for per-axis quantization.
- Added `EfficientNMS_TRT` and `EfficientNMS_ONNX_TRT` plugins and experimental support for the ONNX `NonMaxSuppression` operator.
- Added `ScatterND` plugin.
- Added TensorRT QuickStart Guide.
- Added new samples: engine_refit_onnx_bidaf builds an engine from an ONNX BiDAF model and refits the engine with new weights; efficientdet and efficientnet samples demonstrate object detection using TensorRT.
- Added support for Ubuntu 20.04 and RedHat/CentOS 8.3.
- Added Python 3.9 support.
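Per-axis quantization, mentioned above for `QuantizeLinear`/`DequantizeLinear`, assigns a separate scale (and zero point) to each slice along one axis instead of a single tensor-wide scale. A plain-Python sketch of the int8 semantics (illustrative only; not TensorRT code):

```python
def quantize(row, scale, zero_point=0):
    # QuantizeLinear: q = clamp(round(x / scale) + zero_point, -128, 127)
    return [max(-128, min(127, round(x / scale) + zero_point)) for x in row]

def dequantize(row, scale, zero_point=0):
    # DequantizeLinear: x = (q - zero_point) * scale
    return [(q - zero_point) * scale for q in row]

# Per-axis (here: per-row) scales — each row quantizes with its own scale,
# so rows with very different magnitudes map onto the same int8 range.
data = [[0.5, -1.0, 2.0], [10.0, -20.0, 40.0]]
scales = [0.02, 0.4]
q = [quantize(r, s) for r, s in zip(data, scales)]
print(q)  # → [[25, -50, 100], [25, -50, 100]]
```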
Changed
- Update Polygraphy to v0.30.3.
- Update ONNX-GraphSurgeon to v0.3.10.
- Update Pytorch Quantization toolkit to v2.1.0.
- Notable TensorRT API updates:
  - TensorRT now declares APIs with the `noexcept` keyword. All TensorRT classes that an application inherits from (such as `IPluginV2`) must guarantee that methods called by TensorRT do not throw uncaught exceptions, or the behavior is undefined.
  - Destructors for classes with `destroy()` methods were previously protected. They are now public, enabling use of smart pointers for these classes. The `destroy()` methods are deprecated.
- Moved the `RefitMap` API from the ONNX parser to core TensorRT.
- Various bugfixes for plugins, samples, and the ONNX parser.
- Ported demoBERT to TensorFlow 2 and updated UFF samples to leverage the nvidia-tensorflow1 container.
Removed
- `IPlugin` and `IPluginFactory` interfaces were deprecated in TensorRT 6.0 and have been removed in TensorRT 8.0. We recommend that you write new plugins or refactor existing ones to target the `IPluginV2DynamicExt` and `IPluginV2IOExt` interfaces. For more information, refer to Migrating Plugins From TensorRT 6.x Or 7.x To TensorRT 8.x.x.
  - For plugins based on `IPluginV2DynamicExt` and `IPluginV2IOExt`, certain methods with legacy function signatures (derived from the `IPluginV2` and `IPluginV2Ext` base classes) which were deprecated and marked for removal in TensorRT 8.0 will no longer be available.
- Removed `samplePlugin` since it showcased the `IPluginExt` interface, which is no longer supported in TensorRT 8.0.
- Removed `sampleMovieLens` and `sampleMovieLensMPS`.
- Removed the Dockerfile for Ubuntu 16.04. TensorRT 8.0 debians for Ubuntu 16.04 require Python 3.5, while the minimum required Python version for TensorRT OSS is 3.6.
- Removed support for PowerPC builds, consistent with TensorRT GA releases.
Notes
- The Caffe Parser and UFF Parser were deprecated in TensorRT 7.0. They are still tested and functional in TensorRT 8.0; however, we plan to remove support in a future release. Ensure you migrate your workflow to use `tf2onnx`, `keras2onnx`, or TensorFlow-TensorRT (TF-TRT).
Signed-off-by: Rajeev Rao [email protected]
21.06
Commit used by the 21.06 TensorRT NGC container
Changelog
Added
- Add switch for batch-agnostic mode in NMS plugin
- Add missing model.py in the `uff_custom_plugin` sample
Changed
- Update to Polygraphy v0.29.2
- Update to ONNX-GraphSurgeon v0.3.9
- Fix numerical errors for float type in NMS/batchedNMS plugins
- Update demoBERT input dimensions to match Triton requirement #1051
- Optimize TLT MaskRCNN plugins:
  - Enable FP16 precision in multilevelCropAndResizePlugin and multilevelProposeROIPlugin
  - Algorithm optimizations for NMS kernels and the ROIAlign kernel
  - Fix invalid CUDA config issue when batch size is larger than 32
  - Fix issues found on Jetson NANO
Removed
- Removed fcplugin from demoBERT to improve inference latency on GA100/Turing
21.05
Commit used by the 21.05 TensorRT NGC container
Changelog
Added
- Extended support for the ONNX `InstanceNormalization` operator to 5D tensors
- Support negative indices in the ONNX `Gather` operator
- Add support for importing ONNX double-typed weights as float
- ONNX-GraphSurgeon (v0.3.7) support for models with externally stored weights
Changed
- Update ONNX-TensorRT to 21.05
- Relicense ONNX-TensorRT under Apache2
- demoBERT builder fixes for multi-batch
- Speedup demoBERT build using global timing cache and disable cuDNN tactics
- Standardize python package versions across OSS samples
- Bugfixes in multilevelProposeROI and bertQKV plugin
- Fix memleaks in samples logger
21.04
Commit used by the 21.04 TensorRT NGC container
Changelog
Added
- SM86 kernels for BERT MHA plugin
- Added opset 13 support for `Softmax`, `LogSoftmax`, `Squeeze`, and `Unsqueeze`.
- Added support for the `EyeLike` and `GatherElements` operators.
Changed
- Updated TensorRT version to v7.2.3.4.
- Update to ONNX-TensorRT 21.03
- ONNX-GraphSurgeon (v0.3.4) - updates fold_constants to correctly exit early.
- Set default CUDA_INSTALL_DIR #798
- Plugin bugfixes, qkv kernels for sm86
- Fixed GroupNorm CMakeFile for cu sources #1083
- Permit groupadd with non-unique GID in build containers #1091
- Avoid `reinterpret_cast` #146
- Clang-format plugins and samples
- Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp #1028
- Update BERT plugin documentation.
Removed
- Removed extra terminate call in InstanceNorm
21.03
Commit used by the 21.03 TensorRT NGC container
Changelog
Added
- Optimized FP16 NMS/batchedNMS plugins with n-bit radix sort, based on `IPluginV2DynamicExt`
- `ProposalDynamic` and `CropAndResizeDynamic` plugins based on `IPluginV2DynamicExt`
Changed
- ONNX-TensorRT v21.03 update
- ONNX-GraphSurgeon v0.3.3 update
- Bugfix for `scaledSoftmax` kernel #1096
Removed
- N/A
21.02
Commit used by the 21.02 TensorRT NGC container
Changelog
Added
- TensorRT Python API bindings
- TensorRT Python samples
- FP16 support to batchedNMSPlugin #1002
- Configurable input size for TLT MaskRCNN Plugin #986
Changed
- TensorRT version updated to 7.2.2.3
- ONNX-TensorRT v21.02 update
- Polygraphy v0.21.1 update
- PyTorch-Quantization Toolkit v2.1.0 update
- Documentation update, ONNX opset 13 support, ResNet example
- ONNX-GraphSurgeon v0.28 update
- demoBERT builder updated to work with Tensorflow2 (in compatibility mode)
- Refactor Dockerfiles for OSS container
Removed
- N/A