Using separate cuda streams for one session

### Describe the issue

I have multiple threads that are calling session.run on one session. I recently made it so I am using pinned memory and asynchronous mem copies, which is working great. However, to do this I am using separate cuda streams for the mem copies. I noticed that session.run does not work with these cuda streams. I can link one cuda stream to a session via the options, but I want to use multiple cuda streams and a different one for every run call. How can I achieve this? Or should I just use multiple sessions instead? But then I will have multiple instances of the same model in memory, which doesn't seem great.

### To reproduce

--

### Urgency

_No response_

### Platform

Windows

### OS Version

10

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

16.2

### ONNX Runtime API

C++

### Architecture

X64

### Execution Provider

CUDA

### Execution Provider Library Version

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using separate cuda streams for one session #23319

Describe the issue

To reproduce

Urgency

Platform

OS Version

ONNX Runtime Installation

ONNX Runtime Version or Commit ID

ONNX Runtime API

Architecture

Execution Provider

Execution Provider Library Version

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Using separate cuda streams for one session #23319

Description

Describe the issue

To reproduce

Urgency

Platform

OS Version

ONNX Runtime Installation

ONNX Runtime Version or Commit ID

ONNX Runtime API

Architecture

Execution Provider

Execution Provider Library Version

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions