Torchvision Faster R-CNN ONNX export with dynamic batch size fails during inference #8937

### 🐛 Describe the bug

```python
import io

import torch
from torchvision import models

import onnxruntime as ort

frcnn = models.detection.fasterrcnn_resnet50_fpn_v2(pretrained=True)

# Export with a sample batch of 4 images; batch, height and width are declared dynamic.
x = torch.rand(4, 3, 224, 224)
with io.BytesIO() as f:
    torch.onnx.export(
        frcnn,
        x,
        f,
        export_params=True,
        opset_version=20,
        do_constant_folding=True,
        keep_initializers_as_inputs=None,
        custom_opsets={"moka": 20},
        input_names=["images"],
        output_names=["output"],
        dynamic_axes={
            "images": {0: "batch_size", 2: "height", 3: "width"},
            "output": {0: "batch_size"},
        },
        dynamo=False,
    )
    onnx_model = f.getvalue()

providers = ["CUDAExecutionProvider"] if torch.cuda.is_available() else ["CPUExecutionProvider"]
ort_session = ort.InferenceSession(onnx_model, providers=providers)

# Run inference with a different batch size (2) than the sample input x (4).
ort_inputs = {
    ort_session.get_inputs()[0].name: torch.rand(2, 3, 448, 224).numpy(),
}
ort_outputs = ort_session.run(None, ort_inputs)  # raises the Split node error below
```

`Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Split node. Name:'/Split' Status Message: Cannot split using values in 'split' attribute. Axis=0 Input shape={2,3,448,224} NumOutputs=4 Num entries in 'split' (must equal number of outputs) was 4 Sum of sizes in 'split' (must equal size of selected axis) was 4`

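The snippet above is a minimal example that reproduces the failure. Inference succeeds when the input has the same batch size as the sample input used for export (4), and fails when the batch size differs (here 2 instead of 4), even though axis 0 is declared in `dynamic_axes`.
What causes this error?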
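
To make the numbers in the error message concrete: it reports a `Split` along axis 0 with four outputs and split sizes summing to 4, which matches the export batch size, while the inference input has only 2 entries on that axis. The sketch below re-creates that consistency check in plain Python. `check_split` is a hypothetical helper written for illustration, not ONNX Runtime code, and the split sizes `[1, 1, 1, 1]` are an assumption inferred from the message (four entries summing to 4).

```python
def check_split(input_shape, axis, split_sizes, num_outputs):
    """Return a list of problems with a Split configuration.

    Hypothetical helper that mirrors the two conditions named in the
    error message; it is not taken from ONNX Runtime.
    """
    problems = []
    if len(split_sizes) != num_outputs:
        problems.append(
            f"{len(split_sizes)} entries in 'split' but {num_outputs} outputs"
        )
    if sum(split_sizes) != input_shape[axis]:
        problems.append(
            f"sum of 'split' is {sum(split_sizes)} but axis {axis} has size {input_shape[axis]}"
        )
    return problems


# Assumption: sizes fixed at export time, one slice per image in the batch of 4.
traced_split = [1, 1, 1, 1]

# Same batch size as the export sample: both conditions hold.
print(check_split((4, 3, 224, 224), 0, traced_split, 4))

# Batch size 2 at inference: the sizes no longer sum to the axis length.
print(check_split((2, 3, 448, 224), 0, traced_split, 4))
```

If this reading is right, the `dynamic_axes` entry only relabels the input dimension, while the `Split` node inside the graph still carries sizes tied to the batch of 4 seen at export time.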

### Versions

PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro (10.0.22631 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.31.5
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.8.61
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4080
Nvidia driver version: 571.96
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Name: 13th Gen Intel(R) Core(TM) i7-13700KF
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3400
MaxClockSpeed: 3400
L2CacheSize: 24576
L2CacheSpeed: None
Revision: None

Versions of relevant libraries:
[pip3] numpy==2.2.2
[pip3] onnx==1.17.0
[pip3] onnxruntime-gpu==1.20.1
[pip3] onnxscript==0.2.0
[pip3] onnxsim==0.4.36
[pip3] pytorch-lightning==2.5.0.post0
[pip3] torch==2.6.0+cu126
[pip3] torchmetrics==1.6.1
[pip3] torchvision==0.21.0+cu126
[conda] Could not collect


🐛 Describe the bug

Versions

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Torchvision Faster R-CNN onnx export with dynamic batch size fails during inference #8937

Description

🐛 Describe the bug

Versions

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions