
Commit 84dc5dd

Merge branch 'gh-pages' into update-config-spec
2 parents c4c8cc1 + 67491a3

7 files changed: +271 −3 lines

docs/build/eps.md

Lines changed: 17 additions & 2 deletions
@@ -235,6 +235,21 @@ These instructions are for the latest [JetPack SDK](https://developer.nvidia.com
* For a portion of Jetson devices like the Xavier series, higher power mode involves more cores (up to 6) to compute but it consumes more resources when building ONNX Runtime. Set `--parallel 1` in the build command if OOM happens and the system hangs.

+## TensorRT-RTX
+
+See more information on the NV TensorRT RTX Execution Provider [here](../execution-providers/TensorRTRTX-ExecutionProvider.md).
+
+### Prerequisites
+{: .no_toc }
+
+* Follow the [instructions for the CUDA execution provider](#cuda) to install CUDA and set up environment variables.
+* Install TensorRT for RTX from nvidia.com (TODO: add link when available).
+
+### Build Instructions
+{: .no_toc }
+
+`build.bat --config Release --parallel 32 --build_dir _build --build_shared_lib --use_nv_tensorrt_rtx --tensorrt_home "C:\dev\TensorRT-RTX-1.1.0.3" --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9" --cmake_generator "Visual Studio 17 2022" --use_vcpkg`
+
+Replace `--tensorrt_home` and `--cuda_home` with the correct paths to your TensorRT-RTX and CUDA installations.

## oneDNN

See more information on oneDNN (formerly DNNL) [here](../execution-providers/oneDNN-ExecutionProvider.md).
@@ -625,7 +640,7 @@ Dockerfile instructions are available [here](https://github.com/microsoft/onnxru
#### Build Python Wheel

-`./build.sh --config Release --build --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm`
+`./build.sh --config Release --build_wheel --parallel --use_migraphx --migraphx_home /opt/rocm`

Then the Python wheels (*.whl) can be found at ```./build/Linux/Release/dist```.
@@ -654,7 +669,7 @@ Dockerfile instructions are available [here](https://github.com/microsoft/onnxru
#### Build Python Wheel

-`./build.sh --config Release --build --build_wheel --parallel --use_rocm --rocm_home /opt/rocm`
+`./build.sh --config Release --build_wheel --parallel --use_rocm --rocm_home /opt/rocm`

Then the Python wheels (*.whl) can be found at ```./build/Linux/Release/dist```.

docs/execution-providers/DirectML-ExecutionProvider.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ DirectML was introduced in Windows 10, version 1903, and in the corresponding ve
Requirements for building the DirectML execution provider:

1. Visual Studio 2017 toolchain
-2. [The Windows 10 SDK (10.0.18362.0) for Windows 10, version 1903](https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk) (or newer)
+2. [The Windows 10 SDK (10.0.17134.0) for Windows 10, version 1803](https://developer.microsoft.com/en-us/windows/downloads/sdk-archive/index-legacy) (or newer)

To build onnxruntime with the DML EP included, supply the `--use_dml` flag to `build.bat`.
For example:

docs/execution-providers/OpenVINO-ExecutionProvider.md

Lines changed: 2 additions & 0 deletions
@@ -340,6 +340,8 @@ The following table lists all the available configuration options for API 2.0 an
| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. |
| load_config | string | Any custom JSON path | string | This option enables a feature for loading a custom JSON OV config during runtime which sets OV parameters. |
+| disable_dynamic_shapes | string | True/False | boolean | When enabled, this option rewrites dynamic-shaped models to static shapes at runtime before execution. |
+| model_priority | string | LOW, MEDIUM, HIGH, DEFAULT | string | This option configures model priority, i.e. which models should be allocated to the best available resources. |


Valid Hetero or Multi or Auto Device combinations:
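For orientation, the sketch below shows one way the two new options might be passed to the OpenVINO execution provider from Python. The `device_type` value, the model path, and the exact value types accepted for `disable_dynamic_shapes` and `model_priority` are assumptions for illustration; the option table above is the authoritative reference.

```python
import onnxruntime as ort

# Hypothetical configuration; the option keys come from the table above.
provider_options = {
    'device_type': 'GPU',
    'disable_dynamic_shapes': True,   # assumed to accept a Python bool
    'model_priority': 'HIGH',
}
sess = ort.InferenceSession(
    'model.onnx',
    providers=[('OpenVINOExecutionProvider', provider_options)]
)
```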
docs/execution-providers/TensorRTRTX-ExecutionProvider.md

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
---
title: NVIDIA - TensorRT RTX
description: Instructions to execute ONNX Runtime on NVIDIA RTX GPUs with the Nvidia TensorRT RTX execution provider
parent: Execution Providers
nav_order: 17
redirect_from: /docs/reference/execution-providers/TensorRTRTX-ExecutionProvider
---

# Nvidia TensorRT RTX Execution Provider
{: .no_toc }

The Nvidia TensorRT RTX execution provider is the preferred execution provider for GPU acceleration on consumer hardware (RTX PCs). It is more straightforward to use than the datacenter-focused legacy TensorRT execution provider and more performant than the CUDA EP.
Some of the things that make it a better fit on RTX PCs than the legacy TensorRT execution provider:
* Much smaller footprint
* Much faster model compile/load times
* Better usability: cached models can be reused across multiple RTX GPUs

The Nvidia TensorRT RTX execution provider in ONNX Runtime makes use of NVIDIA's [TensorRT](https://developer.nvidia.com/tensorrt) RTX deep learning inferencing engine (TODO: correct link to TRT RTX documentation once available) to accelerate ONNX models on RTX GPUs. Microsoft and NVIDIA worked closely to integrate the TensorRT RTX execution provider with ONNX Runtime.

Currently TensorRT RTX supports RTX GPUs based on the Ampere or later architectures. Support for Turing GPUs is coming soon.

## Contents
{: .no_toc }

* TOC placeholder
{:toc}

## Install
Please select the Nvidia TensorRT RTX version of ONNX Runtime: https://onnxruntime.ai/docs/install. (TODO!)

## Build from source
See [Build instructions](../build/eps.md#TensorRT-RTX).

## Requirements

| ONNX Runtime | TensorRT-RTX | CUDA      |
| :----------- | :----------- | :-------- |
| main         | 1.0          | 12.0-12.9 |
| 1.22         | 1.0          | 12.0-12.9 |

## Usage
### C/C++
```c++
const auto& api = Ort::GetApi();
Ort::SessionOptions session_options;
api.SessionOptionsAppendExecutionProvider(session_options, "NvTensorRtRtx", nullptr, nullptr, 0);
Ort::Session session(env, model_path, session_options);
```

The C API details are [here](../get-started/with-c.md).

### Python
To use the TensorRT RTX execution provider, you must explicitly register it when instantiating the `InferenceSession`.

```python
import onnxruntime as ort
sess = ort.InferenceSession('model.onnx', providers=['NvTensorRtRtxExecutionProvider'])
```

## Configurations
TensorRT RTX settings can be configured via the [TensorRT RTX execution provider session options](./TensorRTRTX-ExecutionProvider.md#execution-provider-options).

Here are examples and different [scenarios](./TensorRTRTX-ExecutionProvider.md#scenario) for setting NV TensorRT RTX EP session options:

#### Click below for Python API example:

<details>

```python
import onnxruntime as ort

model_path = '<path to model>'

# note: for bool type options in the Python API, set them as False/True
# stream_handle: a CUDA stream handle formatted as a decimal string (see user_compute_stream below)
provider_options = {
    'device_id': 0,
    'nv_dump_subgraphs': False,
    'nv_detailed_build_log': True,
    'user_compute_stream': stream_handle
}

sess_opt = ort.SessionOptions()
sess = ort.InferenceSession(model_path, sess_options=sess_opt, providers=[('NvTensorRTRTXExecutionProvider', provider_options)])
```

</details>

#### Click below for C++ API example:

<details>

```c++
Ort::SessionOptions session_options;

cudaStream_t cuda_stream;
cudaStreamCreate(&cuda_stream);

// Need to put the CUDA stream handle in a string
char streamHandle[32];
sprintf_s(streamHandle, "%llu", (uint64_t)cuda_stream);

const auto& api = Ort::GetApi();
std::vector<const char*> option_keys = {
    "device_id",
    "user_compute_stream", // this implicitly sets "has_user_compute_stream"
};
std::vector<const char*> option_values = {
    "1",
    streamHandle
};

Ort::ThrowOnError(api.SessionOptionsAppendExecutionProvider(session_options, "NvTensorRtRtx", option_keys.data(), option_values.data(), option_keys.size()));
```

</details>

### Scenario

| Scenario | NV TensorRT RTX EP Session Option | Type |
| :--- | :--- | :--- |
| Specify GPU id for execution | [device_id](./TensorRTRTX-ExecutionProvider.md#device_id) | int |
| Set custom compute stream for GPU operations | [user_compute_stream](./TensorRTRTX-ExecutionProvider.md#user_compute_stream) | string |
| Set TensorRT RTX EP GPU memory usage limit | [nv_max_workspace_size](./TensorRTRTX-ExecutionProvider.md#nv_max_workspace_size) | int |
| Dump optimized subgraphs for debugging | [nv_dump_subgraphs](./TensorRTRTX-ExecutionProvider.md#nv_dump_subgraphs) | bool |
| Capture CUDA graph for reduced launch overhead | [nv_cuda_graph_enable](./TensorRTRTX-ExecutionProvider.md#nv_cuda_graph_enable) | bool |
| Enable detailed logging of build steps | [nv_detailed_build_log](./TensorRTRTX-ExecutionProvider.md#nv_detailed_build_log) | bool |
| Define min shapes | [nv_profile_min_shapes](./TensorRTRTX-ExecutionProvider.md#nv_profile_min_shapes) | string |
| Define max shapes | [nv_profile_max_shapes](./TensorRTRTX-ExecutionProvider.md#nv_profile_max_shapes) | string |
| Define optimal shapes | [nv_profile_opt_shapes](./TensorRTRTX-ExecutionProvider.md#nv_profile_opt_shapes) | string |

> Note: for bool type options, assign them with **True**/**False** in Python, or **1**/**0** in C++.

### Execution Provider Options

TensorRT RTX configurations can be set via execution provider options. This is useful when each model and inference session has its own configuration. All configurations should be set explicitly; otherwise the default value is used.

##### device_id

* Description: GPU device ID.
* Default value: 0

##### user_compute_stream

* Description: defines the compute stream for the inference to run on. It implicitly sets the `has_user_compute_stream` option. The stream handle needs to be printed to a string as a decimal number and passed down to the session options, as shown in the example above.

* This can also be set using the Python API.
  * e.g. the CUDA stream captured from PyTorch can be passed into the ORT NV TensorRT RTX EP. Click below for sample code:

<details>

```python
import onnxruntime as ort
import torch
...
sess = ort.InferenceSession('model.onnx')
if torch.cuda.is_available():
    s = torch.cuda.Stream()
    provider_options = {
        'device_id': 0,
        'user_compute_stream': str(s.cuda_stream)
    }

    sess = ort.InferenceSession(
        model_path,
        providers=[('NvTensorRtRtxExecutionProvider', provider_options)]
    )

    options = sess.get_provider_options()
    assert "NvTensorRtRtxExecutionProvider" in options
    assert options["NvTensorRtRtxExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream)
...
```

</details>

* To take advantage of a user compute stream, it is recommended to use [I/O Binding](https://onnxruntime.ai/docs/performance/device-tensor.html) to bind inputs and outputs to tensors on the device; a sketch follows below.
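A minimal I/O binding sketch (not taken from the original docs): the input/output names `input` and `output`, the tensor shape, and the assumption that device `OrtValue`s can be allocated with the `cuda` device type in this build are illustrative only.

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model.onnx', providers=['NvTensorRtRtxExecutionProvider'])

# Place the input on the GPU up front so Run() does not copy between host and device.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
x_gpu = ort.OrtValue.ortvalue_from_numpy(x, 'cuda', 0)

binding = sess.io_binding()
binding.bind_ortvalue_input('input', x_gpu)
binding.bind_output('output', 'cuda', 0)  # let ORT allocate the output on the device

sess.run_with_iobinding(binding)
y = binding.get_outputs()[0].numpy()      # copy back to host only when needed
```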

##### nv_max_workspace_size

* Description: maximum workspace size in bytes for the TensorRT RTX engine.
* Default value: 0 (lets TensorRT RTX pick the optimal size).

##### nv_dump_subgraphs

* Description: dumps the subgraphs if the ONNX model was split across multiple execution providers.
* This can help with debugging subgraphs, e.g. by running `trtexec --onnx subgraph_1.onnx` and checking the output of the parser.

##### nv_detailed_build_log

* Description: enables detailed build step logging on the NV TensorRT RTX EP, with timing for each engine build.

##### nv_cuda_graph_enable

* Description: captures a [CUDA graph](https://developer.nvidia.com/blog/cuda-graphs/), which can drastically help a network with many small layers, as it reduces launch overhead on the CPU. A minimal sketch of enabling this option follows below.
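For orientation only (the model path is a placeholder and the boolean form of the flag follows the bool-type note above), enabling the option through provider options might look like:

```python
import onnxruntime as ort

provider_options = {
    'device_id': 0,
    'nv_cuda_graph_enable': True,  # True/False in Python, "1"/"0" in C++
}
sess = ort.InferenceSession(
    'model.onnx',
    providers=[('NvTensorRtRtxExecutionProvider', provider_options)]
)
```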

##### nv_profile_min_shapes

##### nv_profile_max_shapes

##### nv_profile_opt_shapes

* Description: build with explicit dynamic shapes using a profile with the provided min/max/opt shapes.
* By default TensorRT RTX engines support dynamic shapes; for performance improvements it is possible to specify one or multiple explicit ranges of shapes.
* The format of the profile shapes is `input_tensor_1:dim_1xdim_2x...,input_tensor_2:dim_3xdim_4x...,...` (see the sketch after this list).
* These three options should all be provided in order to enable the explicit profile shapes feature.
* Note that multiple TensorRT RTX profiles can be enabled by passing multiple shapes for the same input tensor.
* Check the TensorRT documentation on [optimization profiles](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#opt_profiles) for more details.
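As an illustrative sketch only (the tensor name `input_ids`, its dimensions, and the model are assumptions), the three options might be set together like this, following the format above:

```python
import onnxruntime as ort

# Hypothetical model with one dynamic input "input_ids" of shape [batch, seq_len].
provider_options = {
    'nv_profile_min_shapes': 'input_ids:1x1',
    'nv_profile_opt_shapes': 'input_ids:4x128',
    'nv_profile_max_shapes': 'input_ids:8x512',
}
sess = ort.InferenceSession(
    'model.onnx',
    providers=[('NvTensorRtRtxExecutionProvider', provider_options)]
)
```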

## NV TensorRT RTX EP Caches

There are two major TRT RTX EP caches:
* Embedded engine model / EPContext model
* Internal TensorRT RTX cache

The internal TensorRT RTX cache is automatically managed by the EP. The user only needs to manage EPContext caching.
**Caching is important to help reduce session creation time drastically.**

TensorRT RTX separates compilation into an ahead-of-time (AOT) compiled engine and a just-in-time (JIT) compilation step. The AOT compilation can be stored as an EPContext model; this model is compatible across multiple GPU generations.
Upon loading such an EPContext model, TensorRT RTX just-in-time compiles the engine to fit the GPU in use. This JIT process is accelerated by TensorRT RTX's internal cache.
For an example usage see:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/providers/nv_tensorrt_rtx/nv_basic_test.cc
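As a hedged sketch to complement the test linked above (not taken from this EP's documentation): EPContext models are typically produced through ONNX Runtime's generic EPContext session configuration entries. The keys below exist in ONNX Runtime, but their use with this EP and the file names are assumptions.

```python
import onnxruntime as ort

# AOT step: compile once and dump an EPContext model next to the original.
so = ort.SessionOptions()
so.add_session_config_entry('ep.context_enable', '1')
so.add_session_config_entry('ep.context_file_path', 'model_ctx.onnx')
ort.InferenceSession('model.onnx', sess_options=so,
                     providers=['NvTensorRtRtxExecutionProvider'])

# Later runs: load the EPContext model directly; only the fast JIT step remains.
sess = ort.InferenceSession('model_ctx.onnx',
                            providers=['NvTensorRtRtxExecutionProvider'])
```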

### More about Embedded engine model / EPContext model
* TODO: decide on a plan for using weight-stripped engines by default. Fix the EP implementation to enable that. Explain the motivation and provide an example of how to use the right options in this document.
* EPContext models also **enable packaging an externally compiled engine** using e.g. `trtexec`. A [python script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/tensorrt/gen_trt_engine_wrapper_onnx_model.py) that is capable of packaging such a precompiled engine into an ONNX file is included in the python tools. (TODO: document how this works with weight-stripped engines.)

## Performance Tuning
For performance tuning, please see the guidance on this page: [ONNX Runtime Perf Tuning](./../performance/tune-performance/index.md)

When using [onnxruntime_perf_test](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/perftest#onnxruntime-performance-test), use the flag `-e nvtensorrtrtx`.
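For example (the model path, `-I` for generated random inputs, and `-r 100` for 100 runs are illustrative choices, not prescribed here): `onnxruntime_perf_test -e nvtensorrtrtx -I -r 100 model.onnx`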

### TensorRT RTX Plugins Support
TensorRT RTX doesn't support plugins.

src/images/logos/graiphic-logo.png

18.4 KB

src/routes/components/customers.svelte

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,7 @@
import clearbladeLogo from '../../images/logos/clearblade-logo.png';
import deezerLogo from '../../images/logos/deezer-logo.png';
import goodnotesLogo from '../../images/logos/goodnotes-logo.png';
+import graiphicLogo from '../../images/logos/graiphic-logo.png';
import huggingfaceLogo from '../../images/logos/huggingface-logo.png';
import hypefactorsLogo from '../../images/logos/hypefactors-logo.png';
import infarmLogo from '../../images/logos/infarm-logo.png';

@@ -100,6 +101,11 @@
      src: goodnotesLogo,
      alt: 'GoodNotes'
    },
+    {
+      href: './testimonials#Graiphic',
+      src: graiphicLogo,
+      alt: 'Graiphic'
+    },
    {
      href: './testimonials#Hugging%20Face',
      src: huggingfaceLogo,

src/routes/testimonials/+page.svelte

Lines changed: 9 additions & 0 deletions
@@ -13,6 +13,7 @@
import clearbladeLogo from '../../images/logos/clearblade-logo.png';
import deezerLogo from '../../images/logos/deezer-logo.png';
import goodnotesLogo from '../../images/logos/goodnotes-logo.png';
+import graiphicLogo from '../../images/logos/graiphic-logo.png'
import huggingfaceLogo from '../../images/logos/huggingface-logo.png';
import hypefactorsLogo from '../../images/logos/hypefactors-logo.png';
import infarmLogo from '../../images/logos/infarm-logo.png';

@@ -135,6 +136,14 @@
      imgsrc: goodnotesLogo,
      imgalt: 'Goodnotes logo'
    },
+    {
+      title: 'Graiphic',
+      quote:
+        "With SOTA, we have developed the first complete ecosystem fully based on ONNX and ONNX Runtime. More than just supporting AI workloads, SOTA orchestrates graph-based computation at its core, enabling modular, scalable, and transparent execution across AI and non-AI domains alike. We believe ONNX is not just a format, it is the foundation for the future of graph-native computation.",
+      author: 'Youssef Menjour, CTO and Co-founder, Graiphic',
+      imgsrc: graiphicLogo,
+      imgalt: 'Graiphic logo'
+    },
    {
      title: 'Hugging Face',
      quote:
