MLPerf Inference: Errors across ONNX Runtime, PyTorch, and TensorRT Backends #661

@esp-vt

Description

I'm trying to run the MLPerf Inference benchmark suite (v5.0-dev) for the RetinaNet model with three framework backends (ONNX Runtime, PyTorch, TensorRT), and each one fails with a different error.

Below are the details for each backend, with their respective stack traces and my understanding so far.

I'd really appreciate any help, hints, or confirmation on what might be wrong 🙏

ONNX

Command:

mlcr run-mlperf,inference,_full,_r5.0-dev \
   --model=retinanet \
   --implementation=reference \
   --framework=onnxruntime \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=valid \
   --device=cuda

Error:

3.13/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:121: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(

My Thoughts:
The onnxruntime build that gets imported apparently lacks CUDA support. I installed via pip, but the warning persists. As the system info below shows, both onnxruntime and onnxruntime-gpu 1.22.0 are installed in the same site-packages, and since both wheels unpack into the same onnxruntime module path, the CPU-only build may be shadowing the GPU one.
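
A quick way to confirm which build is actually being imported (a minimal sketch; "retinanet.onnx" is a placeholder path, not the file the benchmark resolves):

import onnxruntime as ort

# A CPU-only wheel prints only ['AzureExecutionProvider', 'CPUExecutionProvider'],
# matching the warning above; a working GPU build also lists 'CUDAExecutionProvider'.
print(ort.get_available_providers())

# Request CUDA explicitly, with CPU as a fallback.
sess = ort.InferenceSession(
    "retinanet.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # the providers the session actually enabled

If only the CPU providers show up, uninstalling both wheels and reinstalling only onnxruntime-gpu into the virtualenv the benchmark uses should fix the import.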

PyTorch

Command:

mlcr run-mlperf,inference,_full,_r5.0-dev \
   --model=retinanet \
   --implementation=reference \
   --framework=pytorch \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=valid \
   --device=cuda

Error:

Traceback (most recent call last):
  File "/home/esp/CM/repos/local/cache/e53f5bcba86f4a9e/inference/vision/classification_and_detection/python/main.py", line 624, in <module>
    main()
    ~~~~^^
  File "/home/esp/CM/repos/local/cache/e53f5bcba86f4a9e/inference/vision/classification_and_detection/python/main.py", line 503, in main
    model = backend.load(args.model, inputs=args.inputs, outputs=args.outputs)
  File "/home/esp/CM/repos/local/cache/e53f5bcba86f4a9e/inference/vision/classification_and_detection/python/backend_pytorch_native.py", line 27, in load
    self.model = torch.load(model_path)
                 ~~~~~~~~~~^^^^^^^^^^^^
  File "/home/esp/CM/repos/local/cache/0c7c8e7dc1564794/mlperf/lib/python3.13/site-packages/torch/serialization.py", line 1495, in load
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: Cannot use ``weights_only=True`` with TorchScript archives passed to ``torch.load``. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.

CM error: Portable CM script failed (name = benchmark-program, return code = 256)
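
The error message itself points at the fix: PyTorch 2.6 flipped the default of weights_only in torch.load from False to True, and the reference backend (backend_pytorch_native.py line 27 in the traceback) calls torch.load(model_path) with no override. A minimal sketch of the two workarounds, assuming the checkpoint is trusted; model_path is a placeholder here:

import torch

model_path = "retinanet_model.pt"  # placeholder; the benchmark resolves the real path

# Option 1: the error says the file is a TorchScript archive, so torch.jit.load
# is the loader intended for it and avoids the weights_only pickle path entirely.
model = torch.jit.load(model_path)

# Option 2: restore the pre-2.6 behaviour. This can execute arbitrary code while
# unpickling, so only use it for checkpoints from a trusted source.
model = torch.load(model_path, weights_only=False)

I haven't checked whether a newer commit of the reference implementation already patches this.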

TensorRT

Command:

mlcr run-mlperf,inference,_full,_r5.0-dev \
   --model=retinanet \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=valid \
   --device=cuda

Error:

tensorrt 5.0
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 167, in build_engine
    builder.build_engines()
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/nvidia/builder.py", line 579, in build_engines
    self.mitten_builder.run(self.legacy_scratch, None)
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/debug/debug_manager.py", line 258, in _wrapper
    raise exc_info[1]
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/debug/debug_manager.py", line 245, in _wrapper
    retval = obj(*args, **kwargs)
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/retinanet/tensorrt/Retinanet.py", line 379, in run
    network = self.create_network(self.builder, subnetwork_name=subnet_name)
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/debug/debug_manager.py", line 258, in _wrapper
    raise exc_info[1]
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/debug/debug_manager.py", line 245, in _wrapper
    retval = obj(*args, **kwargs)
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/retinanet/tensorrt/Retinanet.py", line 239, in create_network
    self.apply_subnetwork_io_types(network, subnetwork_name)
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/debug/debug_manager.py", line 258, in _wrapper
    raise exc_info[1]
  File "/usr/local/lib/python3.8/dist-packages/nvmitten/debug/debug_manager.py", line 245, in _wrapper
    retval = obj(*args, **kwargs)
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/retinanet/tensorrt/Retinanet.py", line 289, in apply_subnetwork_io_types
    self._set_tensor_format(tensor_in, use_dla=self.dla_enabled)
  File "/root/MLC/repos/local/cache/get-git-repo_mlperf-inferenc_6db69b29/repo/closed/NVIDIA/code/retinanet/tensorrt/Retinanet.py", line 356, in _set_tensor_format
    tensor.allowed_formats = 1 << int(tensor_format)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

My Thoughts:
tensor_format is None here, so int(None) fails at Retinanet.py:356. That most likely means get_tensor_format() (or whatever resolves the format for this subnetwork input) returned None — note that _set_tensor_format is called with use_dla=self.dla_enabled, and the H100 has no DLA, so a DLA misconfiguration or an unsupported tensor type seems plausible.
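
For what it's worth, a guarded version of the failing line (Retinanet.py line 356 in the traceback) would at least turn this into a readable error. This is only a sketch of the guard, not NVIDIA's actual code:

import tensorrt as trt

def set_tensor_format(tensor: "trt.ITensor", tensor_format: "trt.TensorFormat") -> None:
    # allowed_formats is a bitmask over trt.TensorFormat enum values, so
    # tensor_format must be an actual enum member before it can be shifted.
    if tensor_format is None:
        raise RuntimeError(
            f"No tensor format resolved for '{tensor.name}'; "
            "check whether DLA is being enabled on a GPU that has none."
        )
    tensor.allowed_formats = 1 << int(tensor_format)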

System Info

  • OS: Ubuntu 22.04
  • Python: 3.10
  • CUDA: 12.4
  • GPU: H100
  • MLPerf Inference: v5.0-dev
  • Framework:
Name: onnxruntime-gpu
Version: 1.22.0
Summary: ONNX Runtime is a runtime accelerator for Machine Learning models
Home-page: https://onnxruntime.ai
Author: Microsoft Corporation
Author-email: [email protected]
License: MIT License
Location: /home/esp/mlc/lib/python3.10/site-packages
Requires: coloredlogs, flatbuffers, numpy, packaging, protobuf, sympy
Required-by: 
Name: onnxruntime
Version: 1.22.0
Summary: ONNX Runtime is a runtime accelerator for Machine Learning models
Home-page: https://onnxruntime.ai
Author: Microsoft Corporation
Author-email: [email protected]
License: MIT License
Location: /home/esp/mlc/lib/python3.10/site-packages
Requires: coloredlogs, flatbuffers, numpy, packaging, protobuf, sympy
Required-by: 
>>> print(torch.__version__)
2.5.1+cu124
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
12.4
