Avoid repetitive creation of fp4/fp8 native-custom-op domains for NvTensorRtRtx EP by vishalpandya1990 · Pull Request #27192 · microsoft/onnxruntime

vishalpandya1990 · 2026-01-28T12:28:22Z

Description

Avoid repeatedly creating the FP4/FP8 native custom ops in the create method for custom-op domains. Plugin-based custom-op handling is left as is, unchanged from how it worked before native custom ops were added in PR-26555.
Avoid deleting the custom-op domains in the destructor: they are created with static scope, so deleting them there risks a double-delete.

Motivation and Context

Repeatedly checking for and creating the custom-op domains is redundant, so this change cleans that up.
Explicitly deleting static objects in the destructor can lead to a double-delete, so this change avoids it.

vishalpandya1990 · 2026-01-28T12:30:26Z

CC @yuslepukhin @tianleiwu

Copilot

Pull request overview

This PR refactors the NvTensorRTRTX execution provider's custom ops initialization to improve efficiency and correctness. It introduces a flag-based approach to avoid redundant initialization of FP4/FP8 native custom ops and removes potentially dangerous manual deletion of statically-allocated objects.

Changes:

Added native_custom_ops_initialized flag to track whether native custom ops (TRT_FP4DynamicQuantize, TRT_FP8QuantizeLinear, TRT_FP8DequantizeLinear) have been created
Restructured control flow to add already-initialized native ops to the domain list without re-creating them
Made release functions no-ops since the custom op domains are static objects managed by unique_ptr with static storage duration
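The changes listed above can be illustrated with a minimal sketch. It is not the code from nv_execution_provider_custom_ops.cc: `DemoDomain`, `CreateDemoCustomOpDomains`, `ReleaseDemoCustomOpDomains`, `CreationCount`, and the domain name `demo.native` are hypothetical stand-ins, and only the op names and the `native_custom_ops_initialized` flag name are taken from the summary. The sketch assumes a single native domain that is built once, owned by a static `unique_ptr`, and handed out to callers as a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a custom-op domain (not the real ORT type).
struct DemoDomain {
  std::string name;
  std::vector<std::string> ops;
};

// Counts how many times the native domain was actually constructed.
inline int& CreationCount() {
  static int count = 0;
  return count;
}

// Appends the native domain to domain_list, constructing it at most once.
inline void CreateDemoCustomOpDomains(std::vector<DemoDomain*>& domain_list) {
  static std::mutex mutex;
  static std::unique_ptr<DemoDomain> native_domain;  // static storage duration
  static bool native_custom_ops_initialized = false;

  std::lock_guard<std::mutex> lock(mutex);
  if (!native_custom_ops_initialized) {
    native_domain = std::make_unique<DemoDomain>();
    native_domain->name = "demo.native";  // hypothetical domain name
    native_domain->ops = {"TRT_FP4DynamicQuantize", "TRT_FP8QuantizeLinear",
                          "TRT_FP8DequantizeLinear"};
    ++CreationCount();
    native_custom_ops_initialized = true;
  }
  // Already-initialized ops are added to the list without being re-created.
  assert(native_domain != nullptr);
  domain_list.push_back(native_domain.get());
}

// Release is a no-op: the static unique_ptr owns the object, so deleting the
// raw pointers here would lead to a double-delete at program exit.
inline void ReleaseDemoCustomOpDomains(std::vector<DemoDomain*>& /*domain_list*/) {}
```

Calling `CreateDemoCustomOpDomains` twice with two different lists yields the same pointer in both and leaves `CreationCount()` at 1, which is the behavior the flag is meant to guarantee.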

onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider_custom_ops.cc

vishalpandya1990 · 2026-01-30T13:01:51Z

Error from the logs of the failing job:

2026-01-30T10:44:34.0610538Z FAILED transformers/test_gqa.py::TestGQARegressions::test_gqa_rope_separate_qkv_bug - AssertionError: Torch not compiled with CUDA enabled

This doesn't look related to the change. Not sure if there is any regression in ToT.

vishalpandya1990 · 2026-02-04T08:31:48Z

I have synced the branch.

vishalpandya1990 · 2026-02-09T06:12:37Z

@yuslepukhin @tianleiwu Can I get a review for this?

chilo-ms · 2026-02-09T17:19:00Z

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

azure-pipelines · 2026-02-09T17:19:20Z

Azure Pipelines successfully started running 4 pipeline(s).

onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider_custom_ops.cc

tianleiwu

The changes look correct and robust.

Thread Safety: The use of static std::mutex effectively protects the initialization logic.
Resource Management: Removing the manual delete corresponds correctly to the static ownership model.

Logic Correctness: The separation of native_custom_ops_initialized check ensures that native ops are returned correctly even if custom_op_domain is empty, while preventing duplicate initialization.

The PR improves stability and prevents potential memory corruption.
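The thread-safety point in this review can be sketched as follows. This is an illustration under assumptions, not the PR's code: `SketchDomain`, `GetSketchDomain`, `SketchInitCount`, and `CallFromThreads` are hypothetical names. A function-local static `std::mutex` guards a static `unique_ptr`, and a null check on the pointer itself decides whether initialization is still needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a custom-op domain.
struct SketchDomain {
  int id = 0;
};

// Counts how many times the domain was constructed across all threads.
inline std::atomic<int>& SketchInitCount() {
  static std::atomic<int> count{0};
  return count;
}

// Returns the shared domain, constructing it under the lock on first use.
inline SketchDomain* GetSketchDomain() {
  static std::mutex mutex;
  static std::unique_ptr<SketchDomain> domain;
  std::lock_guard<std::mutex> lock(mutex);
  if (!domain) {  // the pointer doubles as the "initialized" check
    domain = std::make_unique<SketchDomain>();
    ++SketchInitCount();
  }
  assert(domain != nullptr);
  return domain.get();
}

// Calls GetSketchDomain from several threads and collects the results.
inline std::vector<SketchDomain*> CallFromThreads(std::size_t num_threads) {
  std::vector<SketchDomain*> results(num_threads, nullptr);
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  for (std::size_t i = 0; i < num_threads; ++i) {
    // Each thread writes to its own slot, so the vector needs no extra lock.
    threads.emplace_back([&results, i] { results[i] = GetSketchDomain(); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return results;
}
```

Every thread receives the same pointer and the constructor runs once, because the check and the construction happen under the same lock.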

vishalpandya1990 · 2026-02-10T13:52:49Z

The error in React Native CI Pipeline / React Native CI Android (pull_request) doesn't look related to this change.

Raw Logs - link

2026-02-10T08:53:29.1912762Z error: Command failed: "/usr/local/lib/android/sdk/emulator/emulator" -version
2026-02-10T08:53:29.1913515Z /usr/local/lib/android/sdk/emulator/qemu/linux-x86_64/qemu-system-x86_64: error while loading shared libraries: libpulse.so.0: cannot open shared object file: No such file or directory

2026-02-10T08:53:41.3067325Z @Thread mqt_native_modules(47):
2026-02-10T08:53:41.3068251Z java.lang.UnsatisfiedLinkError: No implementation found for void ai.onnxruntime.reactnative.OnnxruntimeModule.nativeCleanup() (tried Java_ai_onnxruntime_reactnative_OnnxruntimeModule_nativeCleanup and Java_ai_onnxruntime_reactnative_OnnxruntimeModule_nativeCleanup__)
2026-02-10T08:53:41.3069236Z at ai.onnxruntime.reactnative.OnnxruntimeModule.nativeCleanup(Native Method)
2026-02-10T08:53:41.3069746Z at ai.onnxruntime.reactnative.OnnxruntimeModule.invalidate(OnnxruntimeModule.java:39)
2026-02-10T08:53:41.3070218Z at com.facebook.react.bridge.ModuleHolder.destroy(ModuleHolder.java:109)
...
2026-02-10T08:53:41.3076089Z at com.facebook.react.bridge.queue.MessageQueueThreadImpl$4.run(MessageQueueThreadImpl.java:234)
2026-02-10T08:53:41.3076643Z at java.lang.Thread.run(Thread.java:920)

onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider_custom_ops.cc

yuslepukhin · 2026-02-10T19:24:55Z

onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider_custom_ops.cc

+  // Callers receive raw pointers via .get().
+  //  1. Manually deleting them would cause a double-free when the static unique_ptrs are destroyed at program exit.
+  //  2. Resetting the static unique_ptrs is also unsafe because other EP instances or InferenceSession objects
+  //     may still hold raw pointers to these same objects (handed out via domain_list).


Would this indicate a different problem of someone calling to destroy objects that are in-use? Should we fix that bug?

Another question, static objects would be destroyed just prior to this DLL being unloaded. We want to make sure that the entities being destroyed do not refer to another DLL that could potentially be unloaded first.

It is for this reason that people usually introduce a special API to have control of the process and to destroy things at a safe time, rather than delegating it to the OS-dependent specifics of when shared objects are unloaded and the order of static destruction.

vishalpandya1990 · 2026-02-11

"Would this indicate a different problem of someone calling to destroy objects that are in-use?"

Yes, this is a potential use-after-free scenario. I think the current change mitigates it.

"We want to make sure that the entities being destroyed do not refer to another DLL"
"usually introduce a special API to have control of the process and to destroy things at a safe time"

I see your point. Usually, we could ref-count the objects concerned to handle this (or make them part of the EP instance or the session to avoid shared usage). However, I believe no cross-DLL memory is actually accessed during destruction today.

I think it is better to decouple the current change, which only avoids repetitive creation, from any potential design changes in this area. Please let me know if this sounds okay to you.

yuslepukhin · 2026-02-13

Here is my take on this. There is no firmly defined policy for handling these objects. I think we need to make a choice here:

Remove the Release functions and give away shared_ptrs, OR

Use the Release functions so client code can destroy the objects when it KNOWS that raw pointers are no longer in use.

Until that happens, this is going to be a never-ending chase of the tail with different OS-dependent issues.
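The two policies proposed above might look roughly like the sketch below. None of these names exist in ONNX Runtime: `PolicyDomain`, `AcquireSharedDomain`, and `ReleasableRegistry` are hypothetical, and caching the shared object through a `weak_ptr` is only one assumed way to "give away shared_ptrs".

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a custom-op domain.
struct PolicyDomain {
  int id = 0;
};

// Option 1: no Release function. Callers co-own the domain through shared_ptr,
// and it is destroyed when the last owner lets go, not at static-destruction time.
inline std::shared_ptr<PolicyDomain> AcquireSharedDomain() {
  static std::mutex mutex;
  static std::weak_ptr<PolicyDomain> cached;  // does not keep the object alive
  std::lock_guard<std::mutex> lock(mutex);
  std::shared_ptr<PolicyDomain> domain = cached.lock();
  if (!domain) {
    domain = std::make_shared<PolicyDomain>();
    cached = domain;
  }
  assert(domain != nullptr);
  return domain;
}

// Option 2: raw pointers plus an explicit Release that client code calls only
// when it knows no raw pointers are still in use.
class ReleasableRegistry {
 public:
  PolicyDomain* Get() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!domain_) {
      domain_ = std::make_unique<PolicyDomain>();
    }
    return domain_.get();
  }

  // Destroys the domain at a time chosen by the caller.
  void Release() {
    std::lock_guard<std::mutex> lock(mutex_);
    domain_.reset();
  }

  bool IsAlive() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return domain_ != nullptr;
  }

 private:
  mutable std::mutex mutex_;
  std::unique_ptr<PolicyDomain> domain_;
};
```

With option 1 the lifetime follows the users automatically. With option 2 the destruction time is explicit, but correctness depends on the caller honoring the "no raw pointers in use" contract.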

tianleiwu requested a review from Copilot January 30, 2026 09:19

Copilot started reviewing on behalf of tianleiwu January 30, 2026 09:20

Copilot AI reviewed Jan 30, 2026

vishalpandya1990 force-pushed the vipandya/debug_1 branch from 304db7c to 3e87df1 February 4, 2026 08:30

yuslepukhin requested changes Feb 9, 2026

vishalpandya1990 force-pushed the vipandya/debug_1 branch from 3e87df1 to 88bf125 February 10, 2026 06:25

tianleiwu reviewed Feb 10, 2026

yuslepukhin reviewed Feb 10, 2026

yuslepukhin reviewed Feb 10, 2026

vishalpandya1990 added 4 commits February 11, 2026 04:41

e31608a avoid repetitive creation fp4 native-custom-op domains
d68be0e nit
872f539 double delete comment update and static objs comment add
3506183 avoid new variable for native-ops check

vishalpandya1990 force-pushed the vipandya/debug_1 branch from 88bf125 to 3506183 February 11, 2026 05:33
