8.5.3 GA - 2023-01-30
TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.
- Updates since TensorRT 8.5.2 GA release.
- Please refer to the TensorRT 8.5.3 GA release notes for more information.
Key Features and Updates:
- Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
- Added nvinfer1::plugin namespace
- Optimized KV Cache performance for T5
8.5.2 GA - 2022-12-12
TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.
- Updates since TensorRT 8.5.1 GA release.
- Please refer to the TensorRT 8.5.2 GA release notes for more information.
Key Features and Updates:
- Plugin enhancements
  - Added LayerNormPlugin, SplitGeLUPlugin, GroupNormPlugin, and SeqLen2SpatialPlugin to support the Stable Diffusion demo.
- KV-cache and beam search to GPT2 and T5 demos
22.12 - 2022-12-06
- Stable Diffusion demo using TensorRT Plugins
- KV-cache and beam search to GPT2 and T5 demos
- Perplexity calculation to all HF demos
- Updated trex to v0.1.5
- Increased default workspace size in demoBERT to build BS=128 fp32 engines
- Use `avg_iter=8` and timing cache to make demoBERT perf more stable
- None
8.5.1 GA - 2022-11-01
TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.
- Updates since TensorRT 8.4.1 GA release.
- Please refer to the TensorRT 8.5.1 GA release notes for more information.
Key Features and Updates:
- Samples enhancements
  - Added sampleNamedDimensions which works with named dimensions.
  - Updated `sampleINT8API` and `introductory_parser_samples` to use ONNX models over Caffe/UFF
  - Removed UFF/Caffe samples including `sampleMNIST`, `end_to_end_tensorflow_mnist`, `sampleINT8`, `sampleMNISTAPI`, `sampleUffMNIST`, `sampleUffPluginV2Ext`, `engine_refit_mnist`, `int8_caffe_mnist`, `uff_custom_plugin`, `sampleFasterRCNN`, `sampleUffFasterRCNN`, `sampleGoogleNet`, `sampleSSD`, `sampleUffSSD`, `sampleUffMaskRCNN` and `uff_ssd`.
- Plugin enhancements
  - Added GridAnchorRectPlugin to support rectangular feature maps in gridAnchorPlugin.
  - Added ROIAlignPlugin to support the ONNX operator RoiAlign. The ONNX parser will automatically route ROIAlign ops through the plugin.
  - Added Hopper support for the BERTQKVToContextPlugin plugin.
  - Exposed the use_int8_scale_max attribute in the BERTQKVToContextPlugin plugin to allow users to disable the by-default usage of INT8 scale factors to optimize softmax MAX reduction in versions 2 and 3 of the plugin.
- ONNX-TensorRT changes
  - Added support for operator Reciprocal.
- Build containers
  - Updated default cuda versions to `11.8.0`.
- Tooling enhancements
  - Updated onnx-graphsurgeon to v0.3.25.
  - Updated Polygraphy to v0.43.1.
  - Updated polygraphy-extension-trtexec to v0.0.8.
  - Updated Tensorflow Quantization Toolkit to v0.2.0.
22.08 - 2022-08-16
Updated TensorRT version to 8.4.2 - see the TensorRT 8.4.2 release notes for more information
- Updated default protobuf version to 3.20.x
- Updated ONNX-TensorRT submodule version to `22.08` tag
- Updated `sampleIOFormats` and `sampleAlgorithmSelector` to use ONNX models over Caffe
- Fixed missing serialization member in custom `ClipPlugin` plugin used in `uff_custom_plugin` sample
- Fixed various Python import issues
- Added new DeBERTa demo
- Added version 2 for `disentangledAttentionPlugin` to support DeBERTa v2
- None
22.07 - 2022-07-21
- `polygraphy-trtexec-plugin` tool for Polygraphy
- Multi-profile support for demoBERT
- KV cache support for HF BART demo
- Updated ONNX-GS to `v0.3.20`
- None
8.4.1 GA - 2022-06-14
TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.
- Updates since TensorRT 8.2.1 GA release.
- Please refer to the TensorRT 8.4.1 GA release notes for more information.
Key Features and Updates:
- Samples enhancements
  - Added Detectron2 Mask R-CNN R50-FPN Python sample
  - Added a quickstart guide for NVIDIA Triton deployment workflow.
  - Added ONNX export script for sampleOnnxMnistCoordConvAC
  - Removed `sampleNMT`.
  - Removed usage of deprecated TensorRT APIs in samples.
- EfficientDet sample
  - Added support for EfficientDet Lite and AdvProp models.
  - Added dynamic batch support.
  - Added mixed precision engine builder.
- HuggingFace transformer demo
  - Added BART model.
  - Performance speedup of GPT-2 greedy search using GPU implementation.
  - Fixed GPT2 ONNX export failure due to 2G file size limitation.
  - Extended Megatron LayerNorm plugins to support larger hidden sizes.
  - Added performance benchmarking mode.
  - Enabled TF32 format by default.
- `demoBERT` enhancements
  - Add `--duration` flag to perf benchmarking script.
  - Fixed import of `nvinfer_plugins` library in demoBERT on Windows.
- Torch-QAT toolkit
  - `quant_bert.py` module removed. It is now upstreamed to HuggingFace QDQBERT.
  - Use axis0 as default for deconv.
  - #1939 - Fixed path in `classification_flow` example.
- Plugin enhancements
  - Added Disentangled attention plugin, `DisentangledAttention_TRT`, to support DeBERTa model.
  - Added Multiscale deformable attention plugin, `MultiscaleDeformableAttnPlugin_TRT`, to support DDETR model.
  - Added new plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin.
  - Refactored EfficientNMS plugin to support TF-TRT and implicit batch mode.
  - `fp16` support for `pillarScatterPlugin`.
- Build containers
  - Updated default cuda versions to `11.6.2`.
  - CentOS Linux 8 has reached End-of-Life on Dec 31, 2021. The corresponding container has been removed from TensorRT-OSS.
  - Install `devtoolset-8` for updated g++ versions in CentOS7 container.
- Tooling enhancements
  - Added Tensorflow Quantization Toolkit v0.1.0 for Quantization-Aware-Training of Tensorflow 2.x Keras models.
  - Added TensorRT Engine Explorer v0.1.2 for inspecting TensorRT engine plans and associated inference profiling data.
  - Updated Polygraphy to v0.38.0.
  - Updated onnx-graphsurgeon to v0.3.19.
- `trtexec` enhancements
  - Added `--layerPrecisions` and `--layerOutputTypes` flags for specifying layer-wise precision and output type constraints.
  - Added `--memPoolSize` flag to specify the size of workspace as well as the DLA memory pools via a unified interface. Correspondingly the `--workspace` flag has been deprecated.
  - "End-To-End Host Latency" metric has been removed. Use the "Host Latency" metric instead. For more information, refer to Benchmarking Network section in the TensorRT Developer Guide.
  - Use `enqueueV2()` instead of `enqueue()` when engine has explicit batch dimensions.
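For scripts that still pass the deprecated `--workspace` flag to trtexec, the change in this release can be sketched as a small argument rewrite. This is a minimal illustration, not part of TensorRT OSS: the helper name `migrate_workspace_flag` is hypothetical, and the `workspace:<size>` pool syntax (size in MiB) is an assumption that should be checked against `trtexec --help` for the installed version.

```python
def migrate_workspace_flag(args):
    """Rewrite deprecated --workspace=N arguments as --memPoolSize=workspace:N.

    Assumes the value is a size in MiB in both forms; all other
    arguments are passed through unchanged.
    """
    migrated = []
    for arg in args:
        if arg.startswith("--workspace="):
            size_mib = arg.split("=", 1)[1]
            migrated.append(f"--memPoolSize=workspace:{size_mib}")
        else:
            migrated.append(arg)
    return migrated


# "model.onnx" is a placeholder file name.
command = migrate_workspace_flag(["trtexec", "--onnx=model.onnx", "--workspace=4096"])
print(" ".join(command))
```

The helper only rewrites the argument list; it does not invoke trtexec.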
22.06 - 2022-06-08
- None
- Disentangled attention (DMHA) plugin refactored
- ONNX parser updated to 8.2GA
- None
22.05 - 2022-05-13
- Disentangled attention plugin for DeBERTa
- DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
- Performance benchmarking mode to HuggingFace demo
- Updated base TensorRT version to 8.2.5.1
- Updated onnx-graphsurgeon v0.3.19 CHANGELOG
- fp16 support for pillarScatterPlugin
- #1939 - Fixed path in quantization `classification_flow`
- Fixed GPT2 onnx export failure due to 2G limitation
- Use axis0 as default for deconv in pytorch-quantization toolkit
- Updated onnx export script for CoordConvAC sample
- Install devtoolset-8 for updated g++ version in CentOS7 container
- Usage of deprecated TensorRT APIs in samples removed
- `quant_bert.py` module removed from pytorch-quantization
22.04 - 2022-04-13
- TensorRT Engine Explorer v0.1.0 README
- Detectron 2 Mask R-CNN R50-FPN python sample
- Model export script for sampleOnnxMnistCoordConvAC
- Updated base TensorRT version to 8.2.4.2
- Updated copyright headers with SPDX identifiers
- Updated onnx-graphsurgeon v0.3.17 CHANGELOG
- `PyramidROIAlign` plugin refactor and bug fixes
- Fixed `MultilevelCropAndResize` crashes on Windows
- #1583 - sublicense ieee/half.h under Apache2
- Updated demo/BERT performance tables for rel-8.2
- #1774 Fix python hangs at IndexErrors when TF is imported after TensorRT
- Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
- Cleaned up sample READMEs
- sampleNMT removed from samples
22.03 - 2022-03-23
- EfficientDet sample enhancements
  - Added support for EfficientDet Lite and AdvProp models.
  - Added dynamic batch support.
  - Added mixed precision engine builder.
- Better decoupling of HuggingFace demo tests
22.02 - 2022-02-04
- New plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin
- Extend Megatron LayerNorm plugins to support larger hidden sizes
- Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
- Update base TensorRT version to 8.2.3.0
- GPT-2 greedy search speedup - now runs on GPU
- Updates to TensorRT developer tools
- Updated ONNX parser to v8.2.3.0
- Minor updates and bugfixes
  - Samples: TFOD, GPT-2, demo/BERT
  - Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS
- Unused source file(s) in demo/BERT
8.2.1 GA - 2021-11-24
TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.
- Updates since TensorRT 8.2.0 EA release.
- Please refer to the TensorRT 8.2.1 GA release notes for more information.
- ONNX parser v8.2.1
  - Removed duplicate constant layer checks that caused some performance regressions
  - Fixed expand dynamic shape calculations
  - Added parser-side checks for `Scatter` layer support
- Sample updates
  - Added Tensorflow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
  - Multiple enhancements in HuggingFace transformer demos
    - Added multi-batch support
    - Fixed resultant performance regression in batchsize=1
    - Fixed T5 large/T5-3B accuracy issues
    - Added notebooks for T5 and GPT-2
    - Added CPU benchmarking option
  - Deprecated `kSTRICT_TYPES` (strict type constraints). Equivalent behaviour now achieved by setting `PREFER_PRECISION_CONSTRAINTS`, `DIRECT_IO`, and `REJECT_EMPTY_ALGORITHMS`
  - Removed `sampleMovieLens`
  - Renamed sampleReformatFreeIO to sampleIOFormats
  - Add `idleTime` option for samples to control qps
  - Specify default value for `precisionConstraints`
  - Fixed reporting of TensorRT build version in trtexec
  - Fixed `combineDescriptions` typo in trtexec/tracer.py
  - Fixed usages of `kDIRECT_IO`
- Plugin updates
  - `EfficientNMS` plugin support extended to TF-TRT, and for clang builds.
  - Sanitize header definitions for BERT fused MHA plugin
  - Separate C++ and cu files in `splitPlugin` to avoid PTX generation (required for CUDA enhanced compatibility support)
  - Enable C++14 build for plugins
- ONNX tooling updates
  - onnx-graphsurgeon upgraded to v0.3.14
  - Polygraphy upgraded to v0.33.2
  - pytorch-quantization toolkit upgraded to v2.1.2
- Build and container fixes
  - Add `SM86` target to default `GPU_ARCHS` for platforms with cuda-11.1+
  - Remove deprecated `SM_35` and add `SM_60` to default `GPU_ARCHS`
  - Skip CUB builds for cuda 11.0+ #1455
  - Fixed cuda-10.2 container build failures in Ubuntu 20.04
  - Add native ARM server build container
  - Install devtoolset-8 for updated g++ version in CentOS7
  - Added a note on supporting c++14 builds for CentOS7
  - Fixed docker build for large UIDs #1373
  - Updated README instructions for Jetpack builds
- demo enhancements
  - Updated Tacotron2 instructions and add CPU benchmarking
  - Fixed issues in demoBERT python notebook
- Documentation updates
  - Updated Python documentation for `add_reduce`, `add_top_k`, and `ISoftMaxLayer`
  - Renamed default GitHub branch to `main` and updated hyperlinks
8.2.0 EA - 2021-10-05
- Demo applications showcasing TensorRT inference of HuggingFace Transformers.
  - Support is currently extended to GPT-2 and T5 models.
- Added support for the following ONNX operators:
  - `Einsum`
  - `IsNan`
  - `GatherND`
  - `Scatter`
  - `ScatterElements`
  - `ScatterND`
  - `Sign`
  - `Round`
- Added support for building TensorRT Python API on Windows.
- Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
  - Added three new APIs, `IExecutionContext::getEnqueueEmitsProfile()`, `setEnqueueEmitsProfile()`, and `reportToProfiler()` which can be used to collect layer profiling info when the inference is launched as a CUDA graph.
  - Eliminated the global logger; each `Runtime`, `Builder` or `Refitter` now has its own logger.
  - Added new operators: `IAssertionLayer`, `IConditionLayer`, `IEinsumLayer`, `IIfConditionalBoundaryLayer`, `IIfConditionalOutputLayer`, `IIfConditionalInputLayer`, and `IScatterLayer`.
  - Added new `IGatherLayer` modes: `kELEMENT` and `kND`
  - Added new `ISliceLayer` modes: `kFILL`, `kCLAMP`, and `kREFLECT`
  - Added new `IUnaryLayer` operators: `kSIGN` and `kROUND`
  - Added new runtime class `IEngineInspector` that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
  - `ProfilingVerbosity` enums have been updated to show their functionality more explicitly.
- Updated TensorRT OSS container defaults to cuda 11.4
- CMake to target C++14 builds.
- Updated following ONNX operators:
  - `Gather` and `GatherElements` implementations to natively support negative indices
  - `Pad` layer to support ND padding, along with `edge` and `reflect` padding mode support
  - `If` layer with general performance improvements.
- Removed `sampleMLP`.
- Several flags of trtexec have been deprecated:
  - `--explicitBatch` flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
  - `--explicitPrecision` flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
  - `--nvtxMode=[verbose|default|none]` has been deprecated in favor of `--profilingVerbosity=[detailed|layer_names_only|none]` to show its functionality more explicitly.
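The trtexec flag deprecations in this release could be applied to an existing command line roughly as follows. This is a sketch under stated assumptions, not a tool shipped with TensorRT OSS: the helper name `migrate_trtexec_args` is hypothetical, and the value mapping (`verbose` to `detailed`, `default` to `layer_names_only`, `none` to `none`) is inferred from the order in which the two flags list their values.

```python
# Assumed value mapping from --nvtxMode to --profilingVerbosity.
NVTX_TO_PROFILING = {
    "verbose": "detailed",
    "default": "layer_names_only",
    "none": "none",
}

# Flags that the changelog describes as deprecated and having no effect.
NO_EFFECT_FLAGS = {"--explicitBatch", "--explicitPrecision"}


def migrate_trtexec_args(args):
    """Drop no-effect flags and rewrite --nvtxMode as --profilingVerbosity."""
    migrated = []
    for arg in args:
        if arg in NO_EFFECT_FLAGS:
            continue  # safe to drop: the mode is now chosen automatically
        if arg.startswith("--nvtxMode="):
            mode = arg.split("=", 1)[1]
            # Raises KeyError for a value outside verbose/default/none.
            migrated.append(f"--profilingVerbosity={NVTX_TO_PROFILING[mode]}")
        else:
            migrated.append(arg)
    return migrated


# "model.onnx" is a placeholder file name.
old = ["trtexec", "--onnx=model.onnx", "--explicitBatch", "--nvtxMode=verbose"]
print(" ".join(migrate_trtexec_args(old)))
```

An unknown `--nvtxMode` value raises `KeyError` rather than being passed through silently, which makes typos visible during migration.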
21.10 - 2021-10-05
- Benchmark script for demoBERT-Megatron
- Dynamic Input Shape support for EfficientNMS plugin
- Support empty dimensions in ONNX
- INT32 and dynamic clips through elementwise in ONNX parser
- Bump TensorRT version to 8.0.3.4
- Use static shape for only single batch single sequence input in demo/BERT
- Revert to using native FC layer in demo/BERT and FCPlugin only on older GPUs.
- Update demo/Tacotron2 for TensorRT 8.0
- Updates to TensorRT developer tools
  - Polygraphy v0.33.0
    - Added various examples, a CLI User Guide and how-to guides.
    - Added experimental support for DLA.
    - Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`.
    - Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins
    - Made several performance improvements in the Polygraphy CUDA wrapper.
    - Removed the `to-json` tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
  - Bugfixes and documentation updates in pytorch-quantization toolkit.
- Bumped up package versions: tensorflow-gpu 2.5.1, pillow 8.3.2
- ONNX parser enhancements and bugfixes
  - Update ONNX submodule to v1.8.0
  - Update convDeconvMultiInput function to properly handle deconvs
  - Update RNN documentation
  - Update QDQ axis assertion
  - Fix bidirectional activation alpha and beta values
  - Fix opset10 `Resize`
  - Fix shape tensor unsqueeze
  - Mark BOOL tiles as unsupported
  - Remove unnecessary shape tensor checks
- N/A
21.09 - 2021-09-22
- Add `ONNX2TRT_VERSION` overwrite in CMake.
- Updates to TensorRT developer tools
- Fix assertion in EfficientNMSPlugin
- N/A
21.08 - 2021-08-05
- Add demoBERT and demoBERT-MT (sparsity) benchmark data for TensorRT 8.
- Added example python notebooks
- Updated samples and plugins directory structure
- Updates to TensorRT developer tools
- README fix to update build command for native aarch64 builds.
- N/A
21.07 - 2021-07-21
Identical to the TensorRT-OSS 8.0.1 Release.
8.0.1 - 2021-07-02
- Added support for the following ONNX operators: `Celu`, `CumSum`, `EyeLike`, `GatherElements`, `GlobalLpPool`, `GreaterOrEqual`, `LessOrEqual`, `LpNormalization`, `LpPool`, `ReverseSequence`, and `SoftmaxCrossEntropyLoss`.
- Overhauled `Resize` ONNX operator, now fully supporting the following modes:
  - Coordinate Transformation modes: `half_pixel`, `pytorch_half_pixel`, `tf_half_pixel_for_nn`, `asymmetric`, and `align_corners`.
  - Modes: `nearest`, `linear`.
  - Nearest Modes: `floor`, `ceil`, `round_prefer_floor`, `round_prefer_ceil`.
- Added support for multi-input ONNX `ConvTranspose` operator.
- Added support for 3D spatial dimensions in ONNX `InstanceNormalization`.
- Added support for generic 2D padding in ONNX.
- ONNX `QuantizeLinear` and `DequantizeLinear` operators leverage `IQuantizeLayer` and `IDequantizeLayer`.
  - Added support for tensor scales.
  - Added support for per-axis quantization.
- Added `EfficientNMS_TRT`, `EfficientNMS_ONNX_TRT` plugins and experimental support for ONNX `NonMaxSuppression` operator.
- Added `ScatterND` plugin.
- Added TensorRT QuickStart Guide.
- Added new samples: engine_refit_onnx_bidaf builds an engine from ONNX BiDAF model and refits engine with new weights, efficientdet and efficientnet samples for demonstrating Object Detection using TensorRT.
- Added support for Ubuntu20.04 and RedHat/CentOS 8.3.
- Added Python 3.9 support.
- Update Polygraphy to v0.30.3.
- Update ONNX-GraphSurgeon to v0.3.10.
- Update Pytorch Quantization toolkit to v2.1.0.
- Notable TensorRT API updates
  - TensorRT now declares APIs with the `noexcept` keyword. All TensorRT classes that an application inherits from (such as IPluginV2) must guarantee that methods called by TensorRT do not throw uncaught exceptions, or the behavior is undefined.
  - Destructors for classes with `destroy()` methods were previously protected. They are now public, enabling use of smart pointers for these classes. The `destroy()` methods are deprecated.
- Moved `RefitMap` API from ONNX parser to core TensorRT.
- Various bugfixes for plugins, samples and ONNX parser.
- Port demoBERT to tensorflow2 and update UFF samples to leverage nvidia-tensorflow1 container.
- `IPlugin` and `IPluginFactory` interfaces were deprecated in TensorRT 6.0 and have been removed in TensorRT 8.0. We recommend that you write new plugins or refactor existing ones to target the `IPluginV2DynamicExt` and `IPluginV2IOExt` interfaces. For more information, refer to Migrating Plugins From TensorRT 6.x Or 7.x To TensorRT 8.x.x.
  - For plugins based on `IPluginV2DynamicExt` and `IPluginV2IOExt`, certain methods with legacy function signatures (derived from `IPluginV2` and `IPluginV2Ext` base classes) which were deprecated and marked for removal in TensorRT 8.0 will no longer be available.
- Removed `samplePlugin` since it showcased IPluginExt interface, which is no longer supported in TensorRT 8.0.
- Removed `sampleMovieLens` and `sampleMovieLensMPS`.
- Removed Dockerfile for Ubuntu 16.04. TensorRT 8.0 debians for Ubuntu 16.04 require python 3.5 while minimum required python version for TensorRT OSS is 3.6.
- Removed support for PowerPC builds, consistent with TensorRT GA releases.
- We had deprecated the Caffe Parser and UFF Parser in TensorRT 7.0. They are still tested and functional in TensorRT 8.0, however, we plan to remove the support in a future release. Ensure you migrate your workflow to use `tf2onnx`, `keras2onnx` or TensorFlow-TensorRT (TF-TRT).
- Refer to TensorRT 8.0.1 GA Release Notes for additional details
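The `Resize` coordinate transformation and nearest modes named in this release follow the ONNX operator specification. The sketch below illustrates those definitions for a single axis as I understand them from the ONNX spec; it is not the TensorRT or ONNX parser implementation, the function names are hypothetical, and returning 0 for a one-element output under `align_corners` is an assumption.

```python
import math


def resize_source_coordinate(x_resized, length_original, length_resized, mode):
    """Map an output index back to a (fractional) input coordinate."""
    scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "tf_half_pixel_for_nn":
        return (x_resized + 0.5) / scale
    if mode == "asymmetric":
        return x_resized / scale
    if mode == "align_corners":
        if length_resized == 1:
            return 0.0  # assumption: avoid division by zero
        return x_resized * (length_original - 1) / (length_resized - 1)
    raise ValueError(f"unknown coordinate transformation mode: {mode}")


def nearest_index(x, nearest_mode):
    """Round a fractional coordinate to an integer index."""
    if nearest_mode == "floor":
        return math.floor(x)
    if nearest_mode == "ceil":
        return math.ceil(x)
    if nearest_mode == "round_prefer_floor":
        return math.ceil(x - 0.5)  # ties round down
    if nearest_mode == "round_prefer_ceil":
        return math.floor(x + 0.5)  # ties round up
    raise ValueError(f"unknown nearest mode: {nearest_mode}")


# Upsampling a 4-element axis to 8 elements: source coordinate of output index 3.
x = resize_source_coordinate(3, 4, 8, "asymmetric")
print(x, nearest_index(x, "round_prefer_floor"), nearest_index(x, "round_prefer_ceil"))
```

The two `round_prefer_*` modes differ only on exact ties such as 1.5, which is where exported models most often disagree between frameworks.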
21.06 - 2021-06-23
- Add switch for batch-agnostic mode in NMS plugin
- Add missing model.py in `uff_custom_plugin` sample
- Update to Polygraphy v0.29.2
- Update to ONNX-GraphSurgeon v0.3.9
- Fix numerical errors for float type in NMS/batchedNMS plugins
- Update demoBERT input dimensions to match Triton requirement #1051
- Optimize TLT MaskRCNN plugins:
  - enable fp16 precision in multilevelCropAndResizePlugin and multilevelProposeROIPlugin
  - Algorithms optimization for NMS kernels and ROIAlign kernel
  - Fix invalid cuda config issue when bs is larger than 32
  - Fix issues found on Jetson NANO
- Removed fcplugin from demoBERT to improve latency
21.05 - 2021-05-20
- Extended support for ONNX operator `InstanceNormalization` to 5D tensors
- Support negative indices in ONNX `Gather` operator
- Add support for importing ONNX double-typed weights as float
- ONNX-GraphSurgeon (v0.3.7) support for models with externally stored weights
- Update ONNX-TensorRT to 21.05
- Relicense ONNX-TensorRT under Apache2
- demoBERT builder fixes for multi-batch
- Speedup demoBERT build using global timing cache and disable cuDNN tactics
- Standardize python package versions across OSS samples
- Bugfixes in multilevelProposeROI and bertQKV plugin
- Fix memleaks in samples logger
21.04 - 2021-04-12
- SM86 kernels for BERT MHA plugin
- Added opset13 support for `SoftMax`, `LogSoftmax`, `Squeeze`, and `Unsqueeze`.
- Added support for the `EyeLike` and `GatherElements` operators.
- Updated TensorRT version to v7.2.3.4.
- Update to ONNX-TensorRT 21.03
- ONNX-GraphSurgeon (v0.3.4) - updates fold_constants to correctly exit early.
- Set default CUDA_INSTALL_DIR #798
- Plugin bugfixes, qkv kernels for sm86
- Fixed GroupNorm CMakeFile for cu sources #1083
- Permit groupadd with non-unique GID in build containers #1091
- Avoid `reinterpret_cast` #146
- Clang-format plugins and samples
- Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp #1028
- Update BERT plugin documentation.
- Removes extra terminate call in InstanceNorm
21.03 - 2021-03-09
- Optimized FP16 NMS/batchedNMS plugins with n-bit radix sort and based on `IPluginV2DynamicExt`
- `ProposalDynamic` and `CropAndResizeDynamic` plugins based on `IPluginV2DynamicExt`
- ONNX-TensorRT v21.03 update
- ONNX-GraphSurgeon v0.3.3 update
- Bugfix for `scaledSoftmax` kernel
- N/A
21.02 - 2021-02-01
- TensorRT Python API bindings
- TensorRT Python samples
- FP16 support to batchedNMSPlugin #1002
- Configurable input size for TLT MaskRCNN Plugin #986
- TensorRT version updated to 7.2.2.3
- ONNX-TensorRT v21.02 update
- Polygraphy v0.21.1 update
- PyTorch-Quantization Toolkit v2.1.0 update
- Documentation update, ONNX opset 13 support, ResNet example
- ONNX-GraphSurgeon v0.28 update
- demoBERT builder updated to work with Tensorflow2 (in compatibility mode)
- Refactor Dockerfiles for OSS container
- N/A
20.12 - 2020-12-18
- Add configurable input size for TLT MaskRCNN Plugin
- Update symbol export map for plugins
- Correctly use channel dimension when creating Prelu node
- Fix Jetson cross compilation CMakefile
- N/A
20.11 - 2020-11-20
- API documentation for ONNX-GraphSurgeon
- N/A
20.10 - 2020-10-22
- Polygraphy v0.20.13 - Deep Learning Inference Prototyping and Debugging Toolkit
- PyTorch-Quantization Toolkit v2.0.0
- Updated BERT plugins for variable sequence length inputs
- Optimized kernels for sequence lengths of 64 and 96 added
- Added Tacotron2 + Waveglow TTS demo #677
- Re-enable `GridAnchorRect_TRT` plugin with rectangular feature maps #679
- Update batchedNMS plugin to IPluginV2DynamicExt interface #738
- Support 3D inputs in InstanceNormalization plugin #745
- Added this CHANGELOG.md
- ONNX GraphSurgeon - v0.2.7 with bugfixes, new examples.
- demo/BERT bugfixes for Jetson Xavier
- Updated build Dockerfile to cuda-11.1
- Updated ClangFormat style specification according to TensorRT coding guidelines
- N/A
7.2.1 - 2020-10-20
- Polygraphy v0.20.13 - Deep Learning Inference Prototyping and Debugging Toolkit
- PyTorch-Quantization Toolkit v2.0.0
- Updated BERT plugins for variable sequence length inputs
- Optimized kernels for sequence lengths of 64 and 96 added
- Added Tacotron2 + Waveglow TTS demo #677
- Re-enable `GridAnchorRect_TRT` plugin with rectangular feature maps #679
- Update batchedNMS plugin to IPluginV2DynamicExt interface #738
- Support 3D inputs in InstanceNormalization plugin #745
- Added this CHANGELOG.md
- ONNX GraphSurgeon - v0.2.7 with bugfixes, new examples.
- demo/BERT bugfixes for Jetson Xavier
- Updated build Dockerfile to cuda-11.1
- Updated ClangFormat style specification according to TensorRT coding guidelines
- N/A