[Bug] Sporadic empty detections and crashes on ONNX converted Mask-RCNN instance segmentation models #2889

Open
@UrJoaoAvelino

Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

When running inference with an ONNX-converted Mask R-CNN model (looping over the same image every time), we sporadically get zeroed bboxes ([0, 0, 0, 0]) with implausible labels, for instance:

label: 286766112 | score: 0.29926052689552307 | bbox: 0, 0, 0, 0

Additionally, after some time the program crashes with the following error:

[2025-03-24 16:20:34.295] [mmdeploy] [error] [instance_segmentation.cpp:78] OpenCV(4.2.0) ../modules/core/src/matrix.cpp:235: error: (-215:Assertion failed) s >= 0 in function 'setSize'

terminate called after throwing an instance of 'system_error2::status_error<mmdeploy::StatusDomain>'
  what():  unknown (6) @ /root/workspace/mmdeploy/csrc/mmdeploy/codebase/mmdet/instance_segmentation.cpp:79
Aborted

We have only observed this issue when running on CPU. It occurs with both the Python and C# wrappers. We did not test the other wrappers, but we assume the root cause is in the C++ source.

As an example, we ran the following code:

from mmdeploy_runtime import Detector
import cv2
import sys


if len(sys.argv) < 2:
    print("Usage: python test_deploy.py device")
    sys.exit(1)

device = sys.argv[1]
print(f"Running for: {device}")

# Create a detector
detector = Detector(model_path='/root/workspace/tests/workdir', device_name=device, device_id=0)


empty_bbox_events = 0
all_ok_events = 0
for x in range(0, 500):
    img = cv2.imread('/root/workspace/mmdeploy/demo/resources/cityscapes.png')

    # Perform inference
    bboxes, labels, masks = detector(img)

    found_empty_bbox = False

    for bbox, label_id in zip(bboxes, labels):
        # Each bbox row is [left, top, right, bottom, score]
        left, top, right, bottom = bbox[0:4].astype(int)

        if left == 0 and top == 0 and right == 0 and bottom == 0:
            found_empty_bbox = True
            print(f'[Iteration: {x}]: Empty bbox detected. Label: {label_id} | Bbox: {bbox}')
            break
    
    if found_empty_bbox:
        empty_bbox_events += 1
    else:
        all_ok_events += 1
        print(f'[Iteration: {x}]: No empty bboxes detected')


print(f'Iterations with empty bboxes: {empty_bbox_events}')
print(f'Iterations without empty bboxes: {all_ok_events}')

and obtained the following output:

[Iteration: 47]: No empty bboxes detected
[Iteration: 48]: No empty bboxes detected
[Iteration: 49]: No empty bboxes detected
[Iteration: 50]: No empty bboxes detected
[Iteration: 51]: No empty bboxes detected
[Iteration: 52]: Empty bbox detected. Label: 48 | Bbox: [0.0000000e+00 0.0000000e+00 9.0481046e-38 0.0000000e+00 3.6893488e+19]
[Iteration: 53]: No empty bboxes detected
[Iteration: 54]: No empty bboxes detected
[Iteration: 55]: No empty bboxes detected
[Iteration: 56]: No empty bboxes detected
[Iteration: 57]: No empty bboxes detected
[Iteration: 58]: No empty bboxes detected
[Iteration: 59]: No empty bboxes detected
[Iteration: 60]: No empty bboxes detected
[Iteration: 61]: No empty bboxes detected
[Iteration: 62]: No empty bboxes detected
[Iteration: 63]: No empty bboxes detected
[Iteration: 64]: No empty bboxes detected
[Iteration: 65]: No empty bboxes detected
[Iteration: 66]: No empty bboxes detected
[Iteration: 67]: No empty bboxes detected
[Iteration: 68]: No empty bboxes detected
[Iteration: 69]: No empty bboxes detected
[Iteration: 70]: No empty bboxes detected
[Iteration: 71]: No empty bboxes detected
[Iteration: 72]: No empty bboxes detected
[Iteration: 73]: No empty bboxes detected
[Iteration: 74]: No empty bboxes detected
[Iteration: 75]: No empty bboxes detected
[Iteration: 76]: No empty bboxes detected
[Iteration: 77]: Empty bbox detected. Label: 48 | Bbox: [8.547921e-44 0.000000e+00 9.496038e-38 0.000000e+00 3.689349e+19]
[Iteration: 78]: No empty bboxes detected
[Iteration: 79]: No empty bboxes detected
[Iteration: 80]: No empty bboxes detected
[Iteration: 81]: No empty bboxes detected
[Iteration: 82]: No empty bboxes detected
[Iteration: 83]: No empty bboxes detected
[Iteration: 84]: No empty bboxes detected
[Iteration: 85]: No empty bboxes detected
[Iteration: 86]: No empty bboxes detected
[Iteration: 87]: No empty bboxes detected
[Iteration: 88]: No empty bboxes detected
[Iteration: 89]: No empty bboxes detected
[2025-03-24 16:20:34.295] [mmdeploy] [error] [instance_segmentation.cpp:78] OpenCV(4.2.0) ../modules/core/src/matrix.cpp:235: error: (-215:Assertion failed) s >= 0 in function 'setSize'

terminate called after throwing an instance of 'system_error2::status_error<mmdeploy::StatusDomain>'
  what():  unknown (6) @ /root/workspace/mmdeploy/csrc/mmdeploy/codebase/mmdet/instance_segmentation.cpp:79
Aborted
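
The corrupted rows in the output above have values that no real detection should have (a score of about 3.69e+19, denormal coordinates, and in the earlier example a label of 286766112). Until the root cause is found, a caller could screen such rows out before using them. The following is only a minimal sketch of that kind of sanity check, not a fix: the helper name is_valid_detection is hypothetical, num_classes=80 is an assumption (COCO-style labels) that must be adjusted to the actual model, and min_size is an arbitrary threshold. It cannot prevent the crash, which happens inside the C++ code before results are returned.

```python
import math


def is_valid_detection(bbox, label, num_classes=80, min_size=1.0):
    """Return True if a [left, top, right, bottom, score] row looks plausible.

    Hypothetical helper: num_classes and min_size are assumptions and
    should be set to match the deployed model.
    """
    left, top, right, bottom, score = (float(v) for v in bbox[:5])

    # Reject NaN / inf anywhere in the row.
    if not all(math.isfinite(v) for v in (left, top, right, bottom, score)):
        return False

    # Detection scores are expected to lie in [0, 1].
    if not 0.0 <= score <= 1.0:
        return False

    # Reject zero-area (or near zero-area) boxes.
    if right - left < min_size or bottom - top < min_size:
        return False

    # Reject labels outside the model's class range.
    return 0 <= int(label) < num_classes


# Rows modelled on the values reported in this issue.
good = [10.0, 20.0, 110.0, 220.0, 0.9]
garbage_score = [0.0, 0.0, 9.0481046e-38, 0.0, 3.6893488e+19]
garbage_label = [0.0, 0.0, 0.0, 0.0, 0.29926052689552307]

print(is_valid_detection(good, 2))
print(is_valid_detection(garbage_score, 48))
print(is_valid_detection(garbage_label, 286766112))
```

In the test loop above, the same check could be applied per row of bboxes and labels (NumPy rows work as well as plain lists, since each value is converted with float()).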

Reproduction

To make the issue fully and easily reproducible, we are including a zip with scripts that pull the docker image, install mmdetection, convert the model to ONNX, and run inference. We included versions for Windows (.bat) and Linux (.sh), as well as for each device (CPU or CUDA). To the best of our knowledge, the issue only happens when running inference on CPU, but we will continue testing on CUDA.

To test run:

prepare_container_and_convert_model_{cpu or cuda}.{sh or bat}
test_python_{cpu or cuda}.{sh or bat}

MMDeploy Debug.zip

Environment

To make it easy to reproduce, we are using the mmdeploy docker image openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy1.3.1, in which we installed mmdetection. However, we first encountered the issue in an environment without CUDA support (CPU only).

root@93466def0e75:~/workspace/mmdeploy# python3 tools/check_env.py 
03/24 16:48:47 - mmengine - INFO - 

03/24 16:48:47 - mmengine - INFO - **********Environmental information**********
03/24 16:48:48 - mmengine - INFO - sys.platform: linux
03/24 16:48:48 - mmengine - INFO - Python: 3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0]
03/24 16:48:48 - mmengine - INFO - CUDA available: True
03/24 16:48:48 - mmengine - INFO - MUSA available: False
03/24 16:48:48 - mmengine - INFO - numpy_random_seed: 2147483648
03/24 16:48:48 - mmengine - INFO - GPU 0: NVIDIA GeForce RTX 3050 Laptop GPU
03/24 16:48:48 - mmengine - INFO - CUDA_HOME: /usr/local/cuda
03/24 16:48:48 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.8, V11.8.89
03/24 16:48:48 - mmengine - INFO - GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
03/24 16:48:48 - mmengine - INFO - PyTorch: 2.0.0+cu118
03/24 16:48:48 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

03/24 16:48:48 - mmengine - INFO - TorchVision: 0.15.0+cu118
03/24 16:48:48 - mmengine - INFO - OpenCV: 4.5.4
03/24 16:48:48 - mmengine - INFO - MMEngine: 0.10.3
03/24 16:48:48 - mmengine - INFO - MMCV: 2.1.0
03/24 16:48:48 - mmengine - INFO - MMCV Compiler: GCC 9.3
03/24 16:48:48 - mmengine - INFO - MMCV CUDA Compiler: 11.8
03/24 16:48:48 - mmengine - INFO - MMDeploy: 1.3.1+bc75c9d
03/24 16:48:48 - mmengine - INFO - 

03/24 16:48:48 - mmengine - INFO - **********Backend information**********
03/24 16:48:48 - mmengine - INFO - tensorrt:    8.6.1
03/24 16:48:48 - mmengine - INFO - tensorrt custom ops: Available
03/24 16:48:48 - mmengine - INFO - ONNXRuntime: None
03/24 16:48:48 - mmengine - INFO - ONNXRuntime-gpu:     1.15.1
03/24 16:48:48 - mmengine - INFO - ONNXRuntime custom ops:      Available
03/24 16:48:48 - mmengine - INFO - pplnn:       0.8.1
03/24 16:48:48 - mmengine - INFO - ncnn:        1.0.20230905
03/24 16:48:48 - mmengine - INFO - ncnn custom ops:     Available
03/24 16:48:48 - mmengine - INFO - snpe:        None
03/24 16:48:48 - mmengine - INFO - openvino:    2023.0.2
03/24 16:48:48 - mmengine - INFO - torchscript: 2.0.0+cu118
03/24 16:48:48 - mmengine - INFO - torchscript custom ops:      Available
03/24 16:48:48 - mmengine - INFO - rknn-toolkit:        None
03/24 16:48:48 - mmengine - INFO - rknn-toolkit2:       None
03/24 16:48:48 - mmengine - INFO - ascend:      None
03/24 16:48:48 - mmengine - INFO - coreml:      None
03/24 16:48:48 - mmengine - INFO - tvm: None
03/24 16:48:48 - mmengine - INFO - vacc:        None
03/24 16:48:48 - mmengine - INFO - 

03/24 16:48:48 - mmengine - INFO - **********Codebase information**********
03/24 16:48:48 - mmengine - INFO - mmdet:       3.3.0
03/24 16:48:48 - mmengine - INFO - mmseg:       None
03/24 16:48:48 - mmengine - INFO - mmpretrain:  None
03/24 16:48:48 - mmengine - INFO - mmocr:       None
03/24 16:48:48 - mmengine - INFO - mmagic:      None
03/24 16:48:48 - mmengine - INFO - mmdet3d:     None
03/24 16:48:48 - mmengine - INFO - mmpose:      None
03/24 16:48:48 - mmengine - INFO - mmrotate:    None
03/24 16:48:48 - mmengine - INFO - mmaction:    None
03/24 16:48:48 - mmengine - INFO - mmrazor:     None
03/24 16:48:48 - mmengine - INFO - mmyolo:      None

Error traceback

[2025-03-24 16:20:34.295] [mmdeploy] [error] [instance_segmentation.cpp:78] OpenCV(4.2.0) ../modules/core/src/matrix.cpp:235: error: (-215:Assertion failed) s >= 0 in function 'setSize'

terminate called after throwing an instance of 'system_error2::status_error<mmdeploy::StatusDomain>'
  what():  unknown (6) @ /root/workspace/mmdeploy/csrc/mmdeploy/codebase/mmdet/instance_segmentation.cpp:79
Aborted
