NV TensorRT RTX EP - initial commit #24456
Conversation
New EP - currently based on existing TensorRT EP but meant to be used on RTX GPUs with a lean version of TensorRT.
@@ -0,0 +1,48 @@
#pragma once

Check warning — Code scanning / lintrunner: CLANGFORMAT/format. Run `lintrunner -a` to apply this patch.
Default to minimal CUDA compile
NvProviderFactory(const NvExecutionProviderInfo& info) : info_{info} {}
~NvProviderFactory() override {}

std::unique_ptr<IExecutionProvider> CreateProvider() override;
In order to be able to use the new Compile API, could you please also add an implementation of the new `CreateProvider()` overload that takes `OrtSessionOptions` and `OrtLogger`?
Note that the EP provider options are added to the session options configs with a new key prefix: `"ep.<lowercase_ep_name>.<NV_PROVIDER_OPTION_KEY>"`.
Here's an example implementation:
std::unique_ptr<IExecutionProvider> CreateProvider(const OrtSessionOptions& session_options,
                                                   const OrtLogger& session_logger) override {
  const ConfigOptions& config_options = session_options.GetConfigOptions();
  const std::unordered_map<std::string, std::string>& config_options_map = config_options.GetConfigOptionsMap();

  // The implementation of the SessionOptionsAppendExecutionProvider C API function automatically adds EP options to
  // the session option configurations with the key prefix "ep.<lowercase_ep_name>.".
  // We extract those EP options to create a new "provider options" key/value map.
  std::string lowercase_ep_name = kNvTensorRTRTXExecutionProvider;
  std::transform(lowercase_ep_name.begin(), lowercase_ep_name.end(), lowercase_ep_name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  std::unordered_map<std::string, std::string> provider_options;
  std::string key_prefix = "ep.";
  key_prefix += lowercase_ep_name;
  key_prefix += ".";
  for (const auto& [key, value] : config_options_map) {
    if (key.rfind(key_prefix, 0) == 0) {
      provider_options[key.substr(key_prefix.size())] = value;
    }
  }

  // TODO: Create a NvExecutionProviderInfo struct from config_options and provider_options:
  NvExecutionProviderInfo nv_info = /*...*/;

  auto ep = std::make_unique<NvExecutionProvider>(nv_info);
  ep->SetLogger(reinterpret_cast<const logging::Logger*>(&session_logger));
  return ep;
}
I would also recommend using the generic `SessionOptionsAppendExecutionProvider` C API function, which automatically adds provider options to the session options configs map.
Could you please provide the use case for `CreateProvider(const OrtSessionOptions& session_options, const OrtLogger& session_logger) override`?
Unload the model once it is no longer needed. Bug: 5225623
 * \snippet{doc} snippets.dox OrtStatus Return Value
 * \since Version 1.21
 */
ORT_API2_STATUS(SessionOptionsAppendExecutionProvider_Nv_TensorRT_RTX,
Why does this require new provider-specific APIs instead of using `SessionOptionsAppendExecutionProvider`?
@adrianlizarraga
NvTensorRT_RTX is built as a standalone shared DLL, so we require the EP-specific API.
Please address the lintrunner failure.
Fix memory paging issue seen with large models.
…ion_options, const OrtLogger& session_logger)
NV TensorRt Rtx Ep
Ishwar/nv tensorrt rtx ep
Add support for python bindings of NV TensorRT RTX EP
Description
Adding a new EP based on the TensorRT EP. It will use a special version of TensorRT optimized for RTX GPUs. In the future we plan to streamline the EP further (e.g., remove the dependency on the CUDA EP completely).
Motivation and Context
The new TensorRT for RTX is going to have:
This effort is also targeting WCR ML workflows.