Skip to content

Adding CIG context creation in OrtFactory#27267

Draft
praneshgo wants to merge 2 commits intomicrosoft:mainfrom
praneshgo:pgonegandla/cig_context
Draft

Adding CIG context creation in OrtFactory#27267
praneshgo wants to merge 2 commits intomicrosoft:mainfrom
praneshgo:pgonegandla/cig_context

Conversation

@praneshgo
Copy link
Contributor

  • Added Init and Deinit API which are to be called from application before calling any interop or ORT APIs

Description

Motivation and Context

- Added Init and Deinit API which are to be called from application before calling any interop or ORT APIs
}

// Create stream on the CIG context
CUDA_RETURN_IF_ERROR(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think whenever we get the CreateSyncStream call - we have to create a new stream (and when a CiG context is enabled, this stream should get created on the CiG context). I don't think you can re-use the same previously created stream (btw, CUDA allows creating multiple streams in the same context).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stream here is a new stream that is created every time CreateSyncStream is called. The current design seems in line with the expectation in the comment.
@ankan-ban Am I missing something here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah.. sorry, I missed it. You have exactly same code for stream creation in if and else part (maybe move it out of the conditional code).

@praneshgo
Copy link
Contributor Author

@gaugarg-nv @gedoensmax
Can you please review this change? Thank you.

@skottmckay skottmckay requested a review from nieubank February 10, 2026 22:19
* For D3D12: ID3D12CommandQueue*
* For Vulkan: VkQueue (cast to void*)
*
* The factory stores this and uses it for synchronization with inference streams.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: factory can do whatever it wants so I would expect storing or not is an implementation detail and not a requirement.

Comment on lines +7317 to +7321
* bound to the provided graphics command queue. Once initialized, any OrtSyncStream created for this
* ep_device via CreateSyncStreamForEpDevice will be created on the interop context, enabling efficient
* GPU-side synchronization between ONNX Runtime inference and graphics workloads.
*
* This must be called BEFORE CreateSyncStreamForEpDevice for the same ep_device.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like they're potentially implementation specific details. Isn't it up to the EP factory what it does in the init and how that affects later calls?

Would also be good to explicitly call out that as this is tied to the OrtEpDevice instance it applies across all sessions the OrtEpDevice is used in (i.e. it's global not per-session).

Comment on lines +1076 to +1081
/** \brief Graphics device handle (optional, may be inferred from command_queue).
*
* For D3D12: ID3D12Device* (optional, can be obtained from command queue)
* For Vulkan: VkDevice (cast to void*)
*/
void* device;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the device be retrieved using info from the OrtEpDevice or does it need to be passed in? Prefer the former as there's no potential mismatch.


# --- DX Interop Feature ---
dx_interop_group = parser.add_argument_group("DX Interop Feature")
dx_interop_group.add_argument("--use_dx_interop", action="store_true", help="Enable DX Interop feature for graphics API synchronization.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: --enable_dx_interop would match the help and I think would describe what's it's doing slightly more clearly.

// bound to the provided graphics command queue/device.
// This follows Scott's suggestion to pass in everything required so we don't end up with multiple
// init function signatures, and the factory stores the queue to utilize in stream creation.
static OrtStatus* ORT_API_CALL InitGraphicsInteropImpl(OrtEpFactory* this_ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where you typically put unit tests for your code but I would have expected them to be in this PR if it's in this repo.

Comment on lines +7330 to +7331
ORT_API2_STATUS(InitGraphicsInteropForEpDevice, _In_ const OrtEpDevice* ep_device,
_In_ const OrtGraphicsInteropConfig* config);
Copy link
Contributor

@skottmckay skottmckay Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add these two functions to the interop API struct instead of the global ORT API so we keep all the interop related pieces in one place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, could this be a configuration that is "applied to" an OrtEpDevice's InteropApi? Rather than it being an independent thing.

I'm not sure we really need to qualify it as "Graphics" either, that is a bit more domain specific than this would afford.

@skottmckay
Copy link
Contributor

  • Added Init and Deinit API which are to be called from application before calling any interop or ORT APIs

Description

Motivation and Context

Can you elaborate on the specific scenarios where an app developer needs this infrastructure as I would like to understand when the existing API is insufficient and it would be good to capture that info in the PR?

*/
void* command_queue;

/** \brief Graphics device handle (optional, may be inferred from command_queue).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't absolutely require a command_queue either right? That is more of a performance optimization.

This config shouldn't be required for the InteropApi to work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants