Torchvision Faster R-CNN onnx export with dynamic batch size fails during inference #8937

Open
@davidgill97

🐛 Describe the bug

import io

import torch
from torchvision import models

frcnn = models.detection.fasterrcnn_resnet50_fpn_v2(pretrained=True)

# Sample input used for the export: batch size 4
x = torch.rand(4, 3, 224, 224)
with io.BytesIO() as f:
    torch.onnx.export(
        frcnn,
        x,
        f,
        export_params=True,
        opset_version=20,
        do_constant_folding=True,
        keep_initializers_as_inputs=None,
        custom_opsets={"moka": 20},
        input_names=["images"],
        output_names=["output"],
        dynamic_axes={
            "images": {0: "batch_size", 2: "height", 3: "width"},
            "output": {0: "batch_size"},
        },
        dynamo=False,
    )
    onnx_model = f.getvalue()

import onnxruntime as ort

providers = ["CUDAExecutionProvider"] if torch.cuda.is_available() else ["CPUExecutionProvider"]
ort_session = ort.InferenceSession(onnx_model, providers=providers)

# Run inference with a batch size (2) different from the one used for export (4)
ort_inputs = {
    ort_session.get_inputs()[0].name: torch.rand(2, 3, 448, 224).numpy(),
}
ort_outputs = ort_session.run(None, ort_inputs)

Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Split node. Name:'/Split' Status Message: Cannot split using values in 'split' attribute. Axis=0 Input shape={2,3,448,224} NumOutputs=4 Num entries in 'split' (must equal number of outputs) was 4 Sum of sizes in 'split' (must equal size of selected axis) was 4

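The above is a minimal example that reproduces the failure. Inference succeeds when the input has the same batch size as the sample input used for export (4) and fails only when the batch size differs, even though axis 0 is declared dynamic in dynamic_axes.
What causes the error?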
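
The error message itself can be read as a simple consistency check. The sketch below is not ONNX Runtime code; `check_split` is a hypothetical helper that mirrors the two conditions named in the message. It assumes the exported `/Split` node carries four split sizes of 1 each (the message only states that there are 4 entries summing to 4), which would be consistent with the sample batch size of 4 having been recorded as a constant during export. That last point is an assumption, not something confirmed in this report.

```python
def check_split(axis_size, split_sizes, num_outputs):
    """Return a list of violated Split constraints (empty if the split is valid).

    Hypothetical helper: mirrors the two conditions quoted in the error message,
    it is not part of ONNX Runtime.
    """
    problems = []
    if len(split_sizes) != num_outputs:
        problems.append(
            f"{len(split_sizes)} split entries but {num_outputs} outputs"
        )
    if sum(split_sizes) != axis_size:
        problems.append(
            f"split sizes sum to {sum(split_sizes)} but axis has size {axis_size}"
        )
    return problems


EXPORT_BATCH = 4                   # batch size of the sample input x
baked_split = [1] * EXPORT_BATCH   # assumed split sizes stored in the graph

# Same batch size as at export time: both conditions hold.
print(check_split(axis_size=4, split_sizes=baked_split, num_outputs=4))

# Batch size 2 at inference: the sizes still sum to 4, so the check fails.
print(check_split(axis_size=2, split_sizes=baked_split, num_outputs=4))
```

Under this reading, the number of entries matches the number of outputs in both cases, and only the sum condition is violated when the runtime batch size is 2.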

Versions

PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro (10.0.22631 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.31.5
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.8.61
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4080
Nvidia driver version: 571.96
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Name: 13th Gen Intel(R) Core(TM) i7-13700KF
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3400
MaxClockSpeed: 3400
L2CacheSize: 24576
L2CacheSpeed: None
Revision: None

Versions of relevant libraries:
[pip3] numpy==2.2.2
[pip3] onnx==1.17.0
[pip3] onnxruntime-gpu==1.20.1
[pip3] onnxscript==0.2.0
[pip3] onnxsim==0.4.36
[pip3] pytorch-lightning==2.5.0.post0
[pip3] torch==2.6.0+cu126
[pip3] torchmetrics==1.6.1
[pip3] torchvision==0.21.0+cu126
[conda] Could not collect
