```diff
@@ -303,6 +304,13 @@ TensorRT configurations can be set by execution provider options. It's useful wh
 
 > Note: not all Nvidia GPUs support FP16 precision.
 
+
+##### trt_bf16_enable
+
+* Description: enable BF16 mode in TensorRT.
+
+> Note: not all Nvidia GPUs support BF16 precision.
+
 ##### trt_int8_enable
 
 * Description: enable INT8 mode in TensorRT.
```
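For reference, a minimal Python sketch of the precision options documented above, set through TensorRT execution provider options. The `model.onnx` path is a placeholder, and `trt_bf16_enable` is only available in ONNX Runtime builds that include this change:

```python
import onnxruntime as ort

# Placeholder model path for illustration.
model_path = "model.onnx"

trt_options = {
    "trt_fp16_enable": True,  # FP16 mode; not all Nvidia GPUs support FP16
    "trt_bf16_enable": True,  # BF16 mode; requires a build that includes this option
}

session = ort.InferenceSession(
    model_path,
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",  # fallback for nodes TensorRT does not take
        "CPUExecutionProvider",
    ],
)
```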
```diff
@@ -347,7 +355,7 @@ TensorRT configurations can be set by execution provider options. It's useful wh
 
 * Engine will be cached when it's built for the first time so next time when new inference session is created the engine can be loaded directly from cache. In order to validate that the loaded engine is usable for current inference, engine profile is also cached and loaded along with engine. If current input shapes are in the range of the engine profile, the loaded engine can be safely used. Otherwise if input shapes are out of range, profile cache will be updated to cover the new shape and engine will be recreated based on the new profile (and also refreshed in the engine cache).
 
-* Note each engine is created for specific settings such as model path/name, precision (FP32/FP16/INT8 etc), workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure those settings are not changing, otherwise the engine needs to be rebuilt and cached again.
+* Note each engine is created for specific settings such as model path/name, precision (FP32/FP16/BF16/INT8 etc), workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure those settings are not changing, otherwise the engine needs to be rebuilt and cached again.
 
 > **Warning: Please clean up any old engine and profile cache files (.engine and .profile) if any of the following changes:**
 >
```
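A short sketch of the caching behavior described above, assuming a placeholder `./trt_cache` directory (`trt_engine_cache_enable` and `trt_engine_cache_path` are the provider-option counterparts of the environment variables listed later in this page):

```python
import onnxruntime as ort

trt_options = {
    "trt_engine_cache_enable": True,          # cache built engines on disk
    "trt_engine_cache_path": "./trt_cache",   # where .engine/.profile files are written
}

# The first session build writes the engine and its profile to ./trt_cache.
# Later sessions with identical settings (model, precision, workspace,
# profiles, GPU) load the cached engine instead of rebuilding it;
# out-of-range input shapes trigger a profile update and an engine rebuild.
session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[("TensorrtExecutionProvider", trt_options), "CPUExecutionProvider"],
)
```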
```diff
@@ -501,6 +509,8 @@ Following environment variables can be set for TensorRT execution provider. Clic
 
 * `ORT_TENSORRT_FP16_ENABLE`: Enable FP16 mode in TensorRT. 1: enabled, 0: disabled. Default value: 0. Note not all Nvidia GPUs support FP16 precision.
 
+* `ORT_TENSORRT_BF16_ENABLE`: Enable BF16 mode in TensorRT. 1: enabled, 0: disabled. Default value: 0. Note not all Nvidia GPUs support BF16 precision.
+
 * `ORT_TENSORRT_INT8_ENABLE`: Enable INT8 mode in TensorRT. 1: enabled, 0: disabled. Default value: 0. Note not all Nvidia GPUs support INT8 precision.
 
 * `ORT_TENSORRT_INT8_CALIBRATION_TABLE_NAME`: Specify INT8 calibration table file for non-QDQ models in INT8 mode. Note calibration table should not be provided for QDQ model because TensorRT doesn't allow calibration table to be loaded if there is any Q/DQ node in the model. By default the name is empty.
```
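The same precision switches can be flipped through the environment variables above. A minimal sketch, assuming the variables are set before the inference session (and therefore the provider) is created:

```python
import os
import onnxruntime as ort

# Environment variables are read when the TensorRT provider is initialized,
# so set them before creating the session.
os.environ["ORT_TENSORRT_FP16_ENABLE"] = "1"
os.environ["ORT_TENSORRT_BF16_ENABLE"] = "1"  # requires a build with BF16 support
os.environ["ORT_TENSORRT_INT8_ENABLE"] = "0"

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=["TensorrtExecutionProvider", "CPUExecutionProvider"],
)
```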
```diff
@@ -512,7 +522,7 @@ Following environment variables can be set for TensorRT execution provider. Clic
 
 * `ORT_TENSORRT_DLA_CORE`: Specify DLA core to execute on. Default value: 0.
 
-* `ORT_TENSORRT_ENGINE_CACHE_ENABLE`: Enable TensorRT engine caching. The purpose of using engine caching is to save engine build time in the case that TensorRT may take long time to optimize and build engine. Engine will be cached when it's built for the first time so next time when new inference session is created the engine can be loaded directly from cache. In order to validate that the loaded engine is usable for current inference, engine profile is also cached and loaded along with engine. If current input shapes are in the range of the engine profile, the loaded engine can be safely used. Otherwise if input shapes are out of range, profile cache will be updated to cover the new shape and engine will be recreated based on the new profile (and also refreshed in the engine cache). Note each engine is created for specific settings such as model path/name, precision (FP32/FP16/INT8 etc), workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure those settings are not changing, otherwise the engine needs to be rebuilt and cached again. 1: enabled, 0: disabled. Default value: 0.
+* `ORT_TENSORRT_ENGINE_CACHE_ENABLE`: Enable TensorRT engine caching. The purpose of using engine caching is to save engine build time in the case that TensorRT may take long time to optimize and build engine. Engine will be cached when it's built for the first time so next time when new inference session is created the engine can be loaded directly from cache. In order to validate that the loaded engine is usable for current inference, engine profile is also cached and loaded along with engine. If current input shapes are in the range of the engine profile, the loaded engine can be safely used. Otherwise if input shapes are out of range, profile cache will be updated to cover the new shape and engine will be recreated based on the new profile (and also refreshed in the engine cache). Note each engine is created for specific settings such as model path/name, precision (FP32/FP16/BF16/INT8 etc), workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure those settings are not changing, otherwise the engine needs to be rebuilt and cached again. 1: enabled, 0: disabled. Default value: 0.
 * **Warning: Please clean up any old engine and profile cache files (.engine and .profile) if any of the following changes:**
   * Model changes (if there are any changes to the model topology, opset version, operators etc.)
   * ORT version changes (i.e. moving from ORT version 1.8 to 1.9)
```
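To act on the warning above, stale cache files can be removed before rebuilding. A hypothetical cleanup helper, where the `./trt_cache` default is an assumption standing in for wherever your engine cache is written:

```python
from pathlib import Path

def clear_trt_cache(cache_dir: str = "./trt_cache") -> None:
    """Delete old .engine and .profile files so stale engines are never loaded."""
    cache = Path(cache_dir)
    if not cache.is_dir():
        return
    for pattern in ("*.engine", "*.profile"):
        for stale_file in cache.glob(pattern):
            stale_file.unlink()

# Run after a model change, an ORT version bump, or a TensorRT version change.
clear_trt_cache()
```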