Releases: NVIDIA/TensorRT

TensorRT 11.1 Release

@kevinch-nv kevinch-nv released this 24 Jun 21:27
82d1dca

For more information, see the TensorRT 11.1 Release Notes

General

  • Default CUDA version updated to 13.3.
  • Added Ubuntu 26.04 container.
  • Added support for Python 3.14.
  • Added agent skills for various TensorRT workflows.

Samples

  • Added new sample cute_dsl_plugin.
  • Added a global performance tuner option in trtexec.

Plugins

  • Added new plugin topkLastDimPlugin.

Parsers

  • Updated TRT_Attention to support raggedness.

TensorRT 11.0 Release

@kevinch-nv kevinch-nv released this 02 Jun 22:08
771d8ef

For more information, see the TensorRT 11.0 Release Notes

General

  • TensorRT 11.0 is a new major version with many enhancements. For full information, see the release notes.

Demos

  • The BERT demo has been removed.

Samples

  • Migrated samples/python/python_plugin and samples/python/onnx_custom_plugin/plugin to use V3 plugins

Plugins

  • Added a new Faster Air Top-K plugin
  • Removed deprecated plugins: batchTile, clip, coordConvAC, cropAndResize, gelu, leakyRelu, normalize, singleStepLSTM, specialSlice, split, nms, and proposal

Parsers

  • Added support for Swish operator
  • Added support for custom operators TRT_Attention, TRT_MoE, and TRT_KVCacheUpdate

TensorRT 10.16 Release

@kevinch-nv kevinch-nv released this 25 Mar 23:22
52399f5

For more information, see the TensorRT 10.16 Release Notes

General

  • Default CUDA version updated to CUDA 13.2.

Samples

  • Added sampleDistCollective sample to showcase multi-device execution in TensorRT.

Parsers

  • Added kADJUST_FOR_DLA flag, which adjusts how ONNX models are parsed so that they are more amenable to execution on DLA hardware.
  • Added DistCollective operator support for multi-device execution in TensorRT.

TensorRT 10.15 Release

@kevinch-nv kevinch-nv released this 03 Feb 22:22
9973b2f

For more information, see the TensorRT 10.15 Release Notes.

Sample changes

  • Added two safety samples, sampleSafeMNIST and sampleSafePluginV3, to demonstrate how to use TensorRT with the safety workflow.
  • Added trtSafeExec to accompany the safety workflow release.
  • Added python/stream_writer to showcase how to serialize a TensorRT engine directly to a custom stream using the IStreamWriter interface, rather than writing to a file or a contiguous memory buffer.
  • Added python/strongly_type_autocast to demonstrate how to convert FP32 ONNX models to mixed precision (FP32-FP16) using ModelOpt's AutoCast tool and subsequently building the engine with TensorRT's Strong Typing mode.
  • Added sampleCudla to demonstrate how to use the cuDLA API to run TensorRT engines on the Deep Learning Accelerator (DLA) hardware, which is available on NVIDIA Jetson and DRIVE platforms.
  • Deprecated sampleCharRNN.
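
The idea behind the python/stream_writer sample, serializing to a custom stream instead of a file or a contiguous buffer, can be sketched without TensorRT at all. The snippet below is a minimal illustration only: HashingStreamWriter and fake_serialize are hypothetical names, and the write(data) method returning the number of bytes accepted is an assumption about the general shape of such an interface, not the actual IStreamWriter signature. Consult the sample and the API reference for the real one.

```python
import hashlib
import io


class HashingStreamWriter:
    """Hypothetical stand-in for a custom stream writer (not the TensorRT class).

    Assumption: the producer calls write(data) repeatedly with chunks of bytes
    and expects the number of bytes accepted as the return value.
    """

    def __init__(self, sink):
        self.sink = sink                  # any object with a write(bytes) method
        self.digest = hashlib.sha256()    # running checksum of what was written
        self.bytes_written = 0

    def write(self, data):
        accepted = self.sink.write(data)
        self.digest.update(data[:accepted])
        self.bytes_written += accepted
        return accepted


def fake_serialize(writer, payload, chunk_size=4):
    """Illustrative producer: hands the payload to the writer chunk by chunk."""
    for start in range(0, len(payload), chunk_size):
        writer.write(payload[start:start + chunk_size])


sink = io.BytesIO()
writer = HashingStreamWriter(sink)
fake_serialize(writer, b"not-a-real-engine")
```

Because the writer only ever sees one chunk at a time, the sink could equally be a socket or an encrypting wrapper, and the full payload never has to exist as a single buffer on the writer side.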

Plugin changes

  • Deprecated bertQKVToContextPlugin; it will be removed in a future release. No alternatives are planned.

Parser changes

  • Added support for RotaryEmbedding, RMSNormalization, and TensorScatter for improved LLM support.
  • Added more specialized quantization ops for models quantized through TensorRT ModelOptimizer.
  • Added kREPORT_CAPABILITY_DLA flag to enable per-node validation when building DLA engines through TensorRT.
  • Added kENABLE_PLUGIN_OVERRIDE flag to enable TensorRT plugin override for nodes that share names with user plugins.
  • Improved error reporting for models with multiple subgraphs, such as Loop or Scan nodes.

Demo changes

  • demoDiffusion: Stable Diffusion 1.5, 2.0 and 2.1 pipelines have been deprecated and removed.

TensorRT 10.14 Release

@asfiyab-nvidia asfiyab-nvidia released this 08 Nov 00:50
3b4ddc1

10.14 GA - 2025-11-7

  • Sample changes

    • Replaced all PyCUDA usages with cuda-python APIs
    • Removed the efficientnet samples
    • Deprecated tensorflow_object_detection and efficientdet samples
    • Samples will no longer be released with the packages. The TensorRT GitHub repository will be the single source.
  • Parsers:

    • Added support for the Attention operator
    • Improved refit for ConstantOfShape nodes
  • Demos

    • demoDiffusion:
      • Added support for the Cosmos-Predict2 text2image and video2world pipelines

TensorRT 10.13.3 Release

@kevinch-nv kevinch-nv released this 09 Sep 00:16
94e2b9e

See the TensorRT 10.13.3 Release Notes for more information.

  • Added support for the TensorRT API Capture and Replay feature. See the developer guide for more information.

Demo changes

  • Added support for Flux Kontext pipeline.

TensorRT 10.13.2 Release

@kevinch-nv kevinch-nv released this 19 Aug 16:44
a471b2a

10.13.2 GA - 2025-8-18

For more information, see the 10.13.2 release notes.

  • Added support for CUDA 13.0 and dropped support for CUDA 11.X
  • Dropped support for Ubuntu 20.04
  • Dropped support for Python versions < 3.10 for samples and demos

TensorRT 10.13 Release

@kevinch-nv kevinch-nv released this 24 Jul 22:00
b8db91e

10.13.0 GA - 2025-7-24

  • Plugin changes
    • Fixed a division-by-zero error in geluPlugin that occurred when the bias is omitted.
    • Completed the transition away from static plugin field/attribute member variables in standard plugins. These are no longer needed because TRT currently does not access field information after plugin creators are destructed (deregistered from the plugin registry), nor does it access such information without a creator instance.
  • Sample changes
    • Deprecated the yolov3_onnx sample because the URL for the YOLO weights is unstable.
    • Updated the 1_run_onnx_with_tensorrt and 2_construct_network_with_layer_apis samples to use cuda-python instead of PyCUDA for latest GPU/CUDA support.
  • Parser changes
    • Decreased memory usage when importing models with external weights
    • Added loadModelProto, loadInitializer and parseModelProto APIs for IParser. These APIs are meant to be used to load user initializers when parsing ONNX models.
    • Added loadModelProto, loadInitializer and refitModelProto APIs for IParserRefitter. These APIs are meant to be used to load user initializers when refitting ONNX models.
    • Deprecated IParser::parseWithWeightDescriptors.

TensorRT 10.12 Release

@akhilg-nv akhilg-nv released this 18 Jun 21:41
6d178ce

10.12.0 GA - 2025-6-10

Key Features and Updates:

  • Plugin changes
    • Migrated IPluginV2-descendant version 1 of cropAndResizeDynamic to version 2, which implements IPluginV3.
    • Note: The newer versions preserve the attributes and I/O of the corresponding older plugin version. The older plugin versions are deprecated and will be removed in a future release.
    • Deprecated the listed versions of the following plugins:
      • DecodeBbox3DPlugin (version 1)
      • DetectionLayer_TRT (version 1)
      • EfficientNMS_TRT (version 1)
      • FlattenConcat_TRT (version 1)
      • GenerateDetection_TRT (version 1)
      • GridAnchor_TRT (version 1)
      • GroupNormalizationPlugin (version 1)
      • InstanceNormalization_TRT (version 2)
      • ModulatedDeformConv2d (version 1)
      • MultilevelCropAndResize_TRT (version 1)
      • MultilevelProposeROI_TRT (version 1)
      • RPROI_TRT (version 1)
      • PillarScatterPlugin (version 1)
      • PriorBox_TRT (version 1)
      • ProposalLayer_TRT (version 1)
      • ProposalDynamic (version 1)
      • Region_TRT (version 1)
      • Reorg_TRT (version 2)
      • ResizeNearest_TRT (version 1)
      • ScatterND (version 1)
      • VoxelGeneratorPlugin (version 1)
  • Demo changes
  • Sample changes
  • Parser changes
    • Added support for integer-typed base tensors for Pow operations
    • Added support for custom MXFP8 quantization operations
    • Added support for ellipses, diagonal, and broadcasting in Einsum operations
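
For readers unfamiliar with the Einsum features named above, the following pure-Python sketch shows what "diagonal", "ellipsis", and "broadcasting" mean in Einsum notation in general. It illustrates the notation only and makes no claim about how the parser implements it; the function names are made up for this example, and the batched case assumes exactly one leading batch dimension.

```python
def einsum_diagonal(m):
    # 'ii->i': a repeated index on a single operand walks the diagonal.
    return [m[i][i] for i in range(len(m))]


def einsum_batched_matmul(a, b):
    # '...ij,...jk->...ik' with one leading batch dimension standing in for '...'.
    # Broadcasting: an operand whose batch dimension has size 1 is reused for
    # every batch entry of the other operand.
    batch = max(len(a), len(b))
    out = []
    for n in range(batch):
        x = a[n if len(a) > 1 else 0]
        y = b[n if len(b) > 1 else 0]
        out.append([[sum(x[i][j] * y[j][k] for j in range(len(y)))
                     for k in range(len(y[0]))]
                    for i in range(len(x))])
    return out


m = [[1, 2], [3, 4]]
a = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]  # shape (2, 2, 2)
b = [[[1, 2], [3, 4]]]                    # shape (1, 2, 2), broadcast over the batch

diag = einsum_diagonal(m)
prod = einsum_batched_matmul(a, b)
```

Here diag holds the main diagonal of m, and prod holds two matrices: b's single matrix multiplied by each of the two matrices in a.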

TensorRT 10.11 Release

@asfiyab-nvidia asfiyab-nvidia released this 21 May 22:59
9255eb3

10.11.0 GA - 2025-5-21

Key Features and Updates:

  • Plugin changes
    • Migrated IPluginV2-descendant version 1 of modulatedDeformConvPlugin to version 2, which implements IPluginV3.
    • Migrated IPluginV2-descendant version 1 of DisentangledAttention_TRT to version 2, which implements IPluginV3.
    • Migrated IPluginV2-descendant version 1 of MultiscaleDeformableAttnPlugin_TRT to version 2, which implements IPluginV3.
    • Note: The newer versions preserve the attributes and I/O of the corresponding older plugin version. The older plugin versions are deprecated and will be removed in a future release.
  • Demo changes
    • demoDiffusion
      • Added support for Stable Diffusion 3.5-medium and 3.5-large pipelines in BF16 and FP16 precisions.
  • Parser changes
    • Added kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA parser flag to enable UINT8 asymmetric quantization on engines targeting DLA.
    • Removed restriction that inputs to RandomNormalLike and RandomUniformLike must be tensors.
    • Clarified limitations of scan outputs for Loop nodes.
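
As background for the UINT8 asymmetric quantization flag above, here is a minimal sketch of the generic affine (asymmetric) quantization arithmetic: a real value is divided by a scale, rounded, shifted by a zero point, and clamped to [0, 255]. This is the textbook formula, not TensorRT's or DLA's implementation; the helper names and the range-based parameter choice are assumptions made for illustration.

```python
def choose_params(lo, hi):
    # Map the real interval [lo, hi] onto the UINT8 range [0, 255].
    scale = (hi - lo) / 255
    zero_point = round(-lo / scale)  # the integer that represents real 0.0
    return scale, zero_point


def quantize_uint8(x, scale, zero_point):
    # Round to the nearest step, shift by the zero point, then saturate.
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize_uint8(q, scale, zero_point):
    # Inverse mapping back to an approximate real value.
    return (q - zero_point) * scale


scale, zero_point = choose_params(-1.0, 3.0)
q_min = quantize_uint8(-1.0, scale, zero_point)
q_max = quantize_uint8(3.0, scale, zero_point)
q_zero = quantize_uint8(0.0, scale, zero_point)
```

The nonzero zero point is what makes the scheme asymmetric: the range [-1.0, 3.0] is not centered on zero, yet all 256 codes are used, whereas a symmetric scheme would fix the zero point and waste part of the range.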