NV TensorRT RTX EP - initial commit #24456
Conversation
New EP - currently based on existing TensorRT EP but meant to be used on RTX GPUs with a lean version of TensorRT.
@@ -0,0 +1,48 @@
#pragma once

Check warning — Code scanning / lintrunner: CLANGFORMAT/format. Run `lintrunner -a` to apply this patch.
Default to minimal CUDA compile
NvProviderFactory(const NvExecutionProviderInfo& info) : info_{info} {}
~NvProviderFactory() override {}

std::unique_ptr<IExecutionProvider> CreateProvider() override;
In order to be able to use the new Compile API, could you please also add an implementation of the new `CreateProvider()` overload that takes `OrtSessionOptions` and `OrtLogger`?
Note that the EP provider options are added to the session options configs with a new key prefix: `"ep.<lowercase_ep_name>.<NV_PROVIDER_OPTION_KEY>"`.
Here's an example implementation:
std::unique_ptr<IExecutionProvider> CreateProvider(const OrtSessionOptions& session_options,
                                                   const OrtLogger& session_logger) override {
  const ConfigOptions& config_options = session_options.GetConfigOptions();
  const std::unordered_map<std::string, std::string>& config_options_map = config_options.GetConfigOptionsMap();

  // The implementation of the SessionOptionsAppendExecutionProvider C API function automatically adds EP options to
  // the session option configurations with the key prefix "ep.<lowercase_ep_name>.".
  // We extract those EP options to create a new "provider options" key/value map.
  std::string lowercase_ep_name = kNvTensorRTRTXExecutionProvider;
  std::transform(lowercase_ep_name.begin(), lowercase_ep_name.end(), lowercase_ep_name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  std::unordered_map<std::string, std::string> provider_options;
  std::string key_prefix = "ep.";
  key_prefix += lowercase_ep_name;
  key_prefix += ".";
  for (const auto& [key, value] : config_options_map) {
    if (key.rfind(key_prefix, 0) == 0) {
      provider_options[key.substr(key_prefix.size())] = value;
    }
  }

  // TODO: Create a NvExecutionProviderInfo struct from config_options and provider_options:
  NvExecutionProviderInfo nv_info = /*...*/;

  auto ep = std::make_unique<NvExecutionProvider>(nv_info);
  ep->SetLogger(reinterpret_cast<const logging::Logger*>(&session_logger));
  return ep;
}
I would also recommend using the generic `SessionOptionsAppendExecutionProvider` C API function, which automatically adds provider options to the session options configs map.
Could you please provide the use case for `CreateProvider(const OrtSessionOptions& session_options, const OrtLogger& session_logger) override`?
Unload the model once it is no longer needed. Bug: 5225623
 * \snippet{doc} snippets.dox OrtStatus Return Value
 * \since Version 1.21
 */
ORT_API2_STATUS(SessionOptionsAppendExecutionProvider_Nv_TensorRT_RTX,
Why does this require new provider-specific APIs instead of using `SessionOptionsAppendExecutionProvider`?
@adrianlizarraga
NvTensorRT_RTX is built as a standalone shared DLL, so we require the EP-specific API.
Please address the lintrunner failure.
Fix memory paging issue seen with large models.
…ion_options, const OrtLogger& session_logger)
NV TensorRt Rtx Ep
Ishwar/nv tensorrt rtx ep
Add support for python bindings of NV TensorRT RTX EP
Description
Adding a new EP based on the TensorRT EP. It will use a special version of TensorRT optimized for RTX GPUs. In the future we plan to streamline the EP further (e.g., remove the dependency on the CUDA EP completely).
Motivation and Context
The new TensorRT for RTX is going to have:
This effort is also targeting WCR ML workflows.