
ONNX preloaded dlls are incompatible with CUDNN torch version #24266

@lorenzomammana

Description


Describe the issue

Hi,
I'm trying to do something I'm starting to think may not be possible: using a different cuDNN version for torch and for ONNX Runtime.

In the past I was using torch 2.1.2+cu121 and onnxruntime 1.20.0. With this configuration, since torch shipped with cuDNN 8, I had to manually install cuDNN 9 to use ONNX Runtime.
While doing this I realized that cuDNN 9.1 makes a few models much slower than cuDNN 9.6, so I upgraded to 9.6 and everything worked fine.

Now we have decided to upgrade torch to 2.4.1+cu121, which ships with cuDNN 9.1. That cuDNN 9.1 gets picked up by onnxruntime, effectively making a few of my models slower.

So I'm trying to use the new preload_dlls API to load cuDNN 9.6 before importing torch, but when I import torch I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn_cnn64_9.dll" or one of its dependencies.

This is the list of preloaded DLLs before importing torch:

C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_adv64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_precompiled64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cufft64_11.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublas64_12.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublasLt64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_ops64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_heuristic64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_runtime_compiled64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_graph64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cudart64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn64_9.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140_1.dll
C:\Windows\System32\msvcp140.dll
C:\Windows\System32\msvcp140_1.dll
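
For reference, the DLL lists in this report were dumped with a small helper roughly like the one below (a sketch based on psutil; the function name and keyword filter are just for illustration):

import psutil

def list_loaded_dlls(keywords=("cudnn", "cublas", "cudart", "cufft", "nvrtc", "msvcp", "vcruntime")):
    # Sketch: collect the DLLs currently mapped into this process and keep the CUDA/cuDNN-related ones.
    paths = {m.path for m in psutil.Process().memory_maps() if m.path.lower().endswith(".dll")}
    return sorted(p for p in paths if any(k in p.lower() for k in keywords))

for path in list_loaded_dlls():
    print(path)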

And this is the list after importing torch:

List of loaded DLLs:
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn_adv64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cublasLt64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cublas64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\nvrtc64_120_0.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_adv64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_precompiled64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cufft64_11.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublas64_12.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublasLt64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_ops64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_heuristic64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_runtime_compiled64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_graph64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudart64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cudart64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\caffe2_nvrtc.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140.dll
C:\Windows\System32\msvcp140.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140_1.dll
C:\Windows\System32\msvcp140_1.dll

Am I correct in assuming that cudnn_cnn64_9.dll is causing the error because it is incompatible? Any clue why the other libraries are loaded correctly from torch?

I've tried preloading both CUDA and cuDNN, and also preloading only cuDNN while using torch's CUDA, but the issue is the same (see the second snippet under "To reproduce" below).

To reproduce

On Windows, with CUDA 12.6 and cuDNN 9.6.0 installed:

import onnxruntime

# CUDA_DIRECTORY and CUDNN_DIRECTORY point to the system CUDA 12.6 and cuDNN 9.6 bin folders
onnxruntime.preload_dlls(cuda=True, cudnn=False, directory=CUDA_DIRECTORY)
onnxruntime.preload_dlls(cuda=False, cudnn=True, directory=CUDNN_DIRECTORY)
import torch
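
The variant mentioned above, preloading only cuDNN 9.6 and letting torch supply its own CUDA DLLs, looks like this (same CUDNN_DIRECTORY as before) and fails with the same error:

import onnxruntime

# Preload only the system cuDNN 9.6; torch then brings its own CUDA 12.1 runtime DLLs.
onnxruntime.preload_dlls(cuda=False, cudnn=True, directory=CUDNN_DIRECTORY)
import torch  # raises the same OSError: [WinError 127]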

Urgency

It heavily slows down (roughly 2x slower) a few of the models we use.

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.21.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.6 installed system-wide on Windows, CUDA 12.1 shipped with torch

Labels

ep:CUDA (issues related to the CUDA execution provider), stale (issues that have not been addressed in a while; categorized by a bot)
