
Conversation

Contributor

@chilo-ms chilo-ms commented Aug 22, 2025

Description

This plugin TRT EP is migrated from the original TRT EP and provides the implementations of OrtEpFactory, OrtEp, OrtNodeComputeInfo, OrtDataTransferImpl ... that are required for a plugin EP to be able to interact with ONNX Runtime via the EP ABI (introduced in ORT 1.23.0).

A plugin EP should be built independently of the ORT source code, as it relies on the API/ABI provided by ORT. Therefore, it should reside in a separate repository outside the main ORT repository.

This plugin TRT EP can be built on Linux and Windows in "Debug" and "Release" modes.

Build plugin TRT EP on Windows:

mkdir build;cd build
cmake -S ../ -B ./ -DCMAKE_BUILD_TYPE=Debug -DTENSORRT_HOME=C:/folder/to/trt -DORT_HOME=C:/folder/to/ort
cmake --build ./ --config Debug

(Note: ORT_HOME should contain the include and lib folders as shown below.)

C:/folder/to/ort
      | ----- lib
      |          | ----- onnxruntime.dll
      |          | ----- onnxruntime.lib
      |          | ----- onnxruntime.pdb
      |          ...
      |
      | ---- include
      |          | ----- onnxruntime_c_api.h
      |          | ----- onnxruntime_ep_c_api.h
      |          | ----- onnxruntime_cxx_api.h
      |          | ----- onnxruntime_cxx_inline_api.h
      |          ...

Build plugin TRT EP on Linux:

mkdir build;cd build
cmake -S ../ -B ./ -DCMAKE_BUILD_TYPE=Debug -DTENSORRT_HOME=/home/to/trt/ -DORT_HOME=/home/to/ort -DCMAKE_CUDA_ARCHITECTURES=80 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_POSITION_INDEPENDENT_CODE=ON
cmake --build ./ --config Debug

Run the plugin TRT EP:
Please use onnxruntime_perf_test or onnx_test_runner

TODO
- Currently GetCapability assumes the whole graph is TRT eligible. A follow-up PR will add the TRT parser call for graph partitioning.
- Add a simple unit test

Contributor

do we need all of these helper files? this one doesn't seem to be compiled, with the suffix ".ccc".

Contributor Author

Thanks for catching that, I removed them.

: severity == Severity::kWARNING ? "WARNING"
: severity == Severity::kINFO ? " INFO"
: "UNKNOWN");
if (severity <= Severity::kERROR) {
Contributor

would be good to actually log something

Contributor Author

Added the ORT default logger so the TRT logger can print/log messages.
Will also add back a default logger for the plugin TRT EP.

Contributor

general comment: can we put all the code that doesn't need to be in the global namespace into a top-level namespace? maybe trt_ep or something. there is some existing code in onnxruntime but we probably should change that too.

Contributor Author

The trt_ep namespace is added, and the onnxruntime namespace is removed. Thanks for the suggestion.

// char hostname[HOST_NAME_MAX];
// if (gethostname(hostname, HOST_NAME_MAX) != 0)
// strcpy(hostname, "?");
// #endif
Contributor

general: there seems to be quite a lot of commented out code in this PR. it's not ideal because it can easily get out of date. can we avoid adding commented out code?

Contributor Author

Most of the commented-out code is removed.

@@ -0,0 +1,161 @@
# usage:
# cd build/
# cmake -S ../ -B ./ -DCMAKE_BUILD_TYPE=Debug -DORT_HOME=/home/lochi/onnxruntime-win-x64-gpu-1.23.0 -DCMAKE_CUDA_ARCHITECTURES=80 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DTENSORRT_HOME=/home/lochi/tensorrt/TensorRT-10.3.0.26 -DCMAKE_POSITION_INDEPENDENT_CODE=ON (see the result of "nvidia-smi --query-gpu=compute_cap --format=csv,noheader,nounits")
Contributor

nit: perhaps should replace lochi with a generic user or something like it

Member

Could it be put in the c_cxx folder along with other C/C++ examples?

Contributor Author

> nit: perhaps should replace lochi with a generic user or something like it

Removed specific username in the instruction.

Comment on lines 131 to 134
/*
std::vector<const OrtOpAttr*> node_attributes(num_node_attributes);
RETURN_IF_ERROR(ort_api.Node_GetAttributes(node, node_attributes.data(), node_attributes.size()));
*/
Contributor

nit: not needed anymore?

Contributor Author

Yes, I removed almost all the commented-out code.


auto node = nodes[0];

size_t num_node_attributes = 0;
Contributor

looks like this is not used.

Contributor Author

yes, it's removed.


const OrtOpAttr* node_attr = nullptr;
RETURN_IF_ERROR(ort_api.Node_GetAttributeByName(node, "embed_mode", &node_attr));
const int64_t embed_mode = reinterpret_cast<const ONNX_NAMESPACE::AttributeProto*>(node_attr)->i();
Contributor

Since this EP is largely an example of how to develop an EP, should we try to use the public C APIs to get the attribute values (i.e., ReadOpAttr) when possible? I think we want to show that an EP doesn't necessarily have to build with ONNX to use these APIs.

Perhaps this wasn't done initially because the C API is cumbersome. But now that we have the C++ ORT APIs, getting the attribute values should hopefully be a one-liner.
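For illustration, a minimal sketch of reading the integer embed_mode attribute through the public ReadOpAttr C API; node and ort_api are assumed from the surrounding PR code:

const OrtOpAttr* node_attr = nullptr;
RETURN_IF_ERROR(ort_api.Node_GetAttributeByName(node, "embed_mode", &node_attr));

// ReadOpAttr copies the attribute value into a caller-provided buffer,
// so no ONNX protobuf types are needed to read it.
int64_t embed_mode = 0;
size_t bytes_written = 0;
RETURN_IF_ERROR(ort_api.ReadOpAttr(node_attr, OrtOpAttrType::ORT_OP_ATTR_INT,
                                   &embed_mode, sizeof(embed_mode), &bytes_written));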

Contributor Author

That's a good suggestion. I used the C++ API to get attribute values instead.

// Get engine from byte stream.
node_attr = nullptr;
RETURN_IF_ERROR(ort_api.Node_GetAttributeByName(node, "ep_cache_context", &node_attr));
const std::string& context_binary = reinterpret_cast<const ONNX_NAMESPACE::AttributeProto*>(node_attr)->s();
Contributor

Same here. Could potentially use the C++ ORT API to get attr value?

} else {
output_tensors[i] = ctx.GetOutput(output_index, output_shapes);
auto& output_tensor = output_tensors[i];
const auto elem_cnt = output_tensor.GetTensorTypeAndShapeInfo().GetElementCount();
Contributor

C++ API functions like this one can throw exceptions. Are these exceptions caught/handled somewhere in the EP (and maybe converted to a Status that can be returned to ORT)?

Contributor Author

Good catch! I added a try/catch there.
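For reference, a minimal sketch of the pattern, converting a thrown Ort::Exception into an OrtStatus* that is returned to ORT (names follow the surrounding PR code):

try {
  output_tensors[i] = ctx.GetOutput(output_index, output_shapes);
} catch (const Ort::Exception& e) {
  // Convert the C++ exception into a status ORT can handle instead of letting it escape the EP.
  return ort_api.CreateStatus(ORT_EP_FAIL, e.what());
}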

Comment on lines 981 to 985
// LOGS_DEFAULT(WARNING) << "[TensorRT EP] No graph will run on TensorRT execution provider";
} else if (number_of_trt_nodes == nodes.size()) {
// LOGS_DEFAULT(INFO) << "[TensorRT EP] Whole graph will run on TensorRT execution provider";
} else {
// LOGS_DEFAULT(INFO) << "[TensorRT EP] Graph is partitioned and number of subgraphs running on TensorRT execution provider is " << number_of_subgraphs;
Contributor

nit: should the log statements be uncommented? (or maybe remove the if statements).

Contributor Author

I added a default logger for this plugin TRT EP, and it can now log something.

@ankitm3k

@chilo-ms Can you please also guide us on the changes needed for wheel creation when using the Python APIs independently with the ORT TRT EP, or any standalone custom EP code, with the latest API/ABI interfaces offered by ORT Core (available from ORT 1.23.0)?

Since we have decoupled the TRT EP from the ORT source code, we can no longer access and compile the file below:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/onnxruntime_pybind_state.cc

@chilo-ms
Contributor Author

> @chilo-ms Can you please also guide us on the changes needed for wheel creation when using the Python APIs independently with the ORT TRT EP, or any standalone custom EP code, with the latest API/ABI interfaces offered by ORT Core (available from ORT 1.23.0)?
>
> Since we have decoupled the TRT EP from the ORT source code, we can no longer access and compile the file below: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/onnxruntime_pybind_state.cc

You don't need to make any changes for creating the ORT GPU wheel.
In fact, you can directly use the ORT Python APIs (available from ORT 1.23.0) to run the plugin EP.
Please make sure you have the plugin EP DLL ready.

Here is the reference code:

import onnxruntime as onnxrt
import numpy as np
ep_lib_path = "C:\\path\\to\\plugin_trt_ep\\TensorRTEp.dll"
ep_name = "TensorRTEp"
ep_registration_name = ep_name

onnxrt.register_execution_provider_library(ep_registration_name, ep_lib_path)

ep_devices = onnxrt.get_ep_devices()
trt_ep_device = None
for ep_device in ep_devices:
    if ep_device.ep_name == ep_name:
        trt_ep_device = ep_device

assert trt_ep_device is not None
sess_options = onnxrt.SessionOptions()
sess_options.add_provider_for_devices([trt_ep_device], {'trt_engine_cache_enable': '1'})

assert sess_options.has_providers()

# Run sample model and check output
sess = onnxrt.InferenceSession("C:\\models\\mul_1.onnx", sess_options=sess_options)

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
input_name = sess.get_inputs()[0].name
res = sess.run([], {input_name: x})
output_expected = np.array([[1.0, 4.0], [9.0, 16.0], [25.0, 36.0]], dtype=np.float32)
np.testing.assert_allclose(output_expected, res[0], rtol=1e-05, atol=1e-08)

onnxrt.unregister_execution_provider_library(ep_registration_name)

The mul_1.onnx model can be found in the ORT repo.


add_definitions(-DONNX_NAMESPACE=onnx)
add_definitions(-DONNX_ML)
add_definitions(-DNV_TENSORRT_MAJOR=10)
Contributor

why does NV_TENSORRT_MAJOR need to be defined here? should we leave that to TensorRT?

Contributor Author

Good catch, I removed it.

OrtAllocator::AllocOnStream = nullptr; // Allocate memory, handling usage across different Streams. Not used for TRT EP.
}
// TODO: Handle destructor
//~CUDAAllocator();
Contributor

does anything need to be done for the CUDAAllocator and CUDAPinnedAllocator destructors or is the default implementation fine?

Contributor Author

default implementation is fine. I removed the comment.

ENFORCE(num_nodes == 1);

std::vector<const OrtNode*> nodes(num_nodes);
RETURN_IF_ERROR(ort_api.Graph_GetNodes(graph, nodes.data(), nodes.size()));
Contributor

RETURN_IF_ERROR returns an OrtStatus*, right? But this function returns a bool, and we probably don't want to convert a non-nullptr OrtStatus* to true.

Also, there are multiple error handling mechanisms used in this function. Is it possible to simplify the error handling by consistently returning an OrtStatus*?

Contributor Author

@chilo-ms chilo-ms Oct 6, 2025

Made them all return an OrtStatus*

Comment on lines 240 to 244
try {
ENFORCE(node_attr.GetType() == OrtOpAttrType::ORT_OP_ATTR_STRING);
} catch (const Ort::Exception& e) {
return ort_api.CreateStatus(ORT_EP_FAIL, e.what());
}
Contributor

general: can this try-enforce-catch pattern be replaced with RETURN_IF_NOT()?

Contributor Author

Nice suggestion, I replaced it with RETURN_IF_NOT().
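For reference, one plausible definition of such a macro (a sketch only; the PR's actual definition may differ), assuming ort_api is reachable at the call site:

// Return an OrtStatus* describing the failed condition instead of throwing via ENFORCE.
#define RETURN_IF_NOT(condition, msg)                  \
  do {                                                 \
    if (!(condition)) {                                \
      return ort_api.CreateStatus(ORT_EP_FAIL, (msg)); \
    }                                                  \
  } while (0)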

true, // serialize refitted engine to disk
detailed_build_log_);
if (status != nullptr) {
return ort_api.CreateStatus(ORT_EP_FAIL, "RefitEngine failed.");
Contributor

status should be freed or returned directly if it is not nullptr

Contributor Author

Good catch, I made the code return it directly.


namespace tensorrt_ptr {

struct TensorrtInferDeleter {
Contributor

why do we need TensorrtInferDeleter? does std::default_delete<T> work?

Contributor Author

Okay, yup, std::default_delete<T> works and it's much more robust.
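For reference, a minimal sketch of the resulting pattern, assuming TensorRT 8+ interfaces (which have public virtual destructors, so plain delete works); the alias name is illustrative:

#include <memory>
#include <NvInfer.h>

// std::unique_ptr with std::default_delete (the default) simply calls delete,
// which is valid for TensorRT 8+ objects such as nvinfer1::ICudaEngine.
template <typename T>
using unique_pointer = std::unique_ptr<T>;

// Example usage (engine deserialization arguments elided):
// unique_pointer<nvinfer1::ICudaEngine> engine{runtime->deserializeCudaEngine(blob, size)};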

("Plugin EP has been created with name " + name_).c_str(),
ORT_FILE, __LINE__, __FUNCTION__);
// ignore status for now
(void)ort_status;
Contributor

maybe at least store it in Ort::Status so it still gets released if it's not nullptr

Contributor Author

Good catch, Ort::Status is used now.
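A minimal sketch of the pattern, assuming the status comes from a Logger_LogMessage call as in the PR code (the helper name and severity level here are illustrative):

#include <string>
#include <onnxruntime_cxx_api.h>

// Hold the returned OrtStatus* in Ort::Status so it is released automatically
// even when the logging result is intentionally ignored.
void LogCreationMessage(const OrtApi& ort_api, const OrtLogger* logger, const std::string& name) {
  Ort::Status status{ort_api.Logger_LogMessage(logger, ORT_LOGGING_LEVEL_INFO,
                                               ("Plugin EP has been created with name " + name).c_str(),
                                               ORT_FILE, __LINE__, __FUNCTION__)};
}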

*/
OrtStatus* EPContextNodeReader::ValidateEPCtxNode(const OrtGraph* graph) const {
size_t num_nodes = 0;
THROW_IF_ERROR(ort_api.Graph_GetNumNodes(graph, &num_nodes));
Contributor

Suggested change:
- THROW_IF_ERROR(ort_api.Graph_GetNumNodes(graph, &num_nodes));
+ RETURN_IF_ERROR(ort_api.Graph_GetNumNodes(graph, &num_nodes));

Contributor Author

changed.

const OrtApi* ort_api);

bool IsAbsolutePath(const std::string& path_string) {
#ifdef _WIN32
Contributor

general: do we need separate windows/non-windows implementations or is it possible to have a single cross-platform implementation of these path helper functions using std::filesystem?

Contributor Author

Yes, we can. I rewrote the function to have a single cross-platform implementation.
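For example, a single cross-platform implementation can lean on std::filesystem (a sketch of the idea):

#include <filesystem>
#include <string>

// std::filesystem understands both Windows paths (drive letters, UNC) and POSIX paths,
// so one implementation covers all supported platforms.
bool IsAbsolutePath(const std::string& path_string) {
  return std::filesystem::path(path_string).is_absolute();
}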

Comment on lines 187 to 189
if (ValidateEPCtxNode(&graph) != nullptr) {
return ort_api.CreateStatus(ORT_EP_FAIL, "It's not a valid EPContext node");
}
Contributor

if ValidateEPCtxNode() returns a non-null OrtStatus*, it will be leaked. maybe just return it directly.

Suggested change:
- if (ValidateEPCtxNode(&graph) != nullptr) {
-   return ort_api.CreateStatus(ORT_EP_FAIL, "It's not a valid EPContext node");
- }
+ RETURN_IF_ERROR(ValidateEPCtxNode(&graph));

Contributor Author

good catch. updated.

} \
} while (0)

#define RETURN_IF_ORT_STATUS_ERROR(fn) \
Contributor

hm, it is confusing to have RETURN_IF_ORTSTATUS_ERROR, RETURN_IF_ORT_STATUS_ERROR, and RETURN_IF_ERROR. can we just have a single one?

Contributor Author

Kept RETURN_IF_ERROR and removed the rest.
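For reference, a common form of such a macro (a sketch; the PR's actual definition may differ):

// Propagate the OrtStatus* to the caller if the wrapped call failed.
#define RETURN_IF_ERROR(expr)    \
  do {                           \
    OrtStatus* _status = (expr); \
    if (_status != nullptr) {    \
      return _status;            \
    }                            \
  } while (0)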

#endif // #ifdef _WIN32

#ifdef NO_EXCEPTIONS
void PrintFinalMessage(const char* msg) {
Contributor

is this function used?

Contributor Author

No, it's removed now.

} // namespace trt_ep

// To make symbols visible on macOS/iOS
#ifdef __APPLE__
Contributor

do we need to support macOS or iOS?

Contributor Author

No, the macro is removed now.

@chilo-ms chilo-ms merged commit 0c0e20d into main Oct 16, 2025
30 of 33 checks passed
@chilo-ms chilo-ms deleted the chi/plugin_trt_ep_impl branch October 16, 2025 15:51