Description
Describe the issue
I am facing a significant issue when running YOLOv8-seg.onnx with dynamic batch sizes on the GPU using ONNX Runtime Web. The model runs correctly only when the batch size is 1; increasing the batch size produces false detections and incorrect outputs. Notably, under these conditions the data of both output0 and output1 ends in runs of zeros.
To reproduce
To take advantage of GPU acceleration, I am using ONNX Runtime Web with WebGPU as the execution provider.
1-Export the YOLOv8-seg model to ONNX format with dynamic batch size support:
from ultralytics import YOLO

# export() takes no path argument; it writes yolov8n-seg.onnx next to
# the weights. Rename it (or adjust the path below) to yolov8-seg.onnx.
model = YOLO("yolov8n-seg.pt")
model.export(format="onnx", dynamic=True, simplify=True, opset=12)
I attempted to export the model using torch.onnx.export as well, but encountered the same issue.
2-Load the ONNX model with the following JavaScript snippet, specifying WebGPU as the execution provider:
<script src="https://cdnjs.cloudflare.com/ajax/libs/onnxruntime-web/1.16.1/ort.webgpu.min.js"></script>
const session = await ort.InferenceSession.create("./yolov8-seg.onnx", { executionProviders: ["webgpu"] });
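As a sanity check after loading, the session's declared input/output names can be inspected. This is a minimal sketch; the names images, output0, and output1 are the defaults produced by the Ultralytics exporter and are an assumption here:

// Confirm the model's declared I/O names before running inference.
console.log(session.inputNames);   // expected: ["images"]
console.log(session.outputNames);  // expected: ["output0", "output1"]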
3-Perform inference with various dynamic batch sizes (e.g., 1, 2, 4).
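For concreteness, here is a minimal sketch of the inference call I am describing, assuming 640x640 NCHW input normalized to [0, 1] and the default Ultralytics I/O names (images, output0, output1); the image preprocessing that fills the buffer is omitted:

// Build a batched input: N preprocessed 640x640 RGB images laid out
// NCHW in one contiguous Float32Array.
const batch = 4;
const data = new Float32Array(batch * 3 * 640 * 640);
// ... fill `data` with the preprocessed pixels of each image ...

const input = new ort.Tensor("float32", data, [batch, 3, 640, 640]);
const results = await session.run({ images: input });

// output0: detections (e.g. [batch, 116, 8400] for yolov8n-seg),
// output1: mask prototypes (e.g. [batch, 32, 160, 160]).
console.log(results.output0.dims, results.output1.dims);

With batch = 1 the outputs are correct; with batch > 1 the tails of results.output0.data and results.output1.data are filled with zeros and the decoded detections are wrong.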
Urgency
I have been working to resolve this error for the past two weeks; it is urgent.
Platform
Windows
OS Version
10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.1
Architecture
X64
Execution Provider
WebGPU (AMD Radeon(TM) R5 Graphics)
Is this a quantized model?
No