Description
Describe the issue
I am facing a significant issue when running YOLOv8-seg.onnx with dynamic batch sizes on the GPU using ONNX Runtime Web. The model runs correctly only when the batch size is 1; increasing the batch size produces false detections and incorrect outputs. Notably, under these conditions the data of both output0 and output1 ends in runs of zeros.
To reproduce
To take advantage of GPU acceleration, I am using ONNX Runtime Web with WebGPU as the execution provider.
1-Export the YOLOv8-seg model to ONNX format with dynamic batch size support:
from ultralytics import YOLO

# export() takes no path argument; it writes yolov8n-seg.onnx next to
# the weights. Rename it (or adjust the path below) to yolov8-seg.onnx.
model = YOLO("yolov8n-seg.pt")
model.export(format="onnx", dynamic=True, simplify=True, opset=12)
I attempted to export the model using torch.onnx.export as well, but encountered the same issue.
2-Load the ONNX model with the following JavaScript snippet, specifying WebGPU as the execution provider:
<script src="https://cdnjs.cloudflare.com/ajax/libs/onnxruntime-web/1.16.1/ort.webgpu.min.js"></script>
const session = await ort.InferenceSession.create("./yolov8-seg.onnx", { executionProviders: ["webgpu"] });
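As a sanity check after loading, the session's declared input/output names can be inspected. This is a minimal sketch; the names images, output0, and output1 are the defaults produced by the Ultralytics exporter and are an assumption here:

// Confirm the model's declared I/O names before running inference.
console.log(session.inputNames);   // expected: ["images"]
console.log(session.outputNames);  // expected: ["output0", "output1"]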
3-Perform inference with various dynamic batch sizes (e.g., 1, 2, 4).
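For concreteness, here is a minimal sketch of the inference call I am describing, assuming 640x640 NCHW input normalized to [0, 1] and the default Ultralytics I/O names (images, output0, output1); the image preprocessing that fills the buffer is omitted:

// Build a batched input: N preprocessed 640x640 RGB images laid out
// NCHW in one contiguous Float32Array.
const batch = 4;
const data = new Float32Array(batch * 3 * 640 * 640);
// ... fill `data` with the preprocessed pixels of each image ...

const input = new ort.Tensor("float32", data, [batch, 3, 640, 640]);
const results = await session.run({ images: input });

// output0: detections (e.g. [batch, 116, 8400] for yolov8n-seg),
// output1: mask prototypes (e.g. [batch, 32, 160, 160]).
console.log(results.output0.dims, results.output1.dims);

With batch = 1 the outputs are correct; with batch > 1 the tails of results.output0.data and results.output1.data are filled with zeros and the decoded detections are wrong.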
Urgency
I have been working to resolve this error for the past two weeks; it is urgent.
Platform
Windows
OS Version
10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.1
Architecture
X64
Execution Provider
WebGPU (AMD Radeon(TM) R5 Graphics)
Is this a quantized model?
No