Description
Describe the issue
Hi,
I'm trying to do something that I'm starting to think is not possible: using a cuDNN version with onnxruntime that differs from the one torch ships with.
Previously I was using torch 2.1.2+cu121 and onnxruntime 1.20.0. Because that torch build ships with cuDNN 8, I had to install cuDNN 9 manually to use onnxruntime.
While doing this I realized that cuDNN 9.1 makes a few models much slower than cuDNN 9.6, so I upgraded to 9.6 and everything worked fine.
Now we have decided to upgrade torch to 2.4.1+cu121, which ships with cuDNN 9.1. onnxruntime then uses that cuDNN, which makes a few of my models slower.
So I'm trying to use the new preload_dlls API to load cuDNN 9.6 before importing torch, but when I import torch I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn_cnn64_9.dll" or one of its dependencies.
This is the list of DLLs preloaded before importing torch:
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_adv64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_precompiled64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cufft64_11.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublas64_12.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublasLt64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_ops64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_heuristic64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_runtime_compiled64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_graph64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cudart64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn64_9.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140_1.dll
C:\Windows\System32\msvcp140.dll
C:\Windows\System32\msvcp140_1.dll
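One detail worth checking in the list above is whether every cuDNN sub-library was actually preloaded. The sketch below is a minimal, hedged check: it assumes that a cuDNN 9 install on Windows consists of the eight DLLs named in `EXPECTED_CUDNN9_DLLS` (an assumption, not something confirmed in this report), and the helper `missing_cudnn_dlls` is a hypothetical name introduced only for illustration. It reports which of those names do not appear among the preloaded paths.

```python
from pathlib import PureWindowsPath

# Assumption: the DLLs that make up a cuDNN 9 install on Windows.
EXPECTED_CUDNN9_DLLS = {
    "cudnn64_9.dll",
    "cudnn_adv64_9.dll",
    "cudnn_cnn64_9.dll",
    "cudnn_engines_precompiled64_9.dll",
    "cudnn_engines_runtime_compiled64_9.dll",
    "cudnn_graph64_9.dll",
    "cudnn_heuristic64_9.dll",
    "cudnn_ops64_9.dll",
}


def missing_cudnn_dlls(loaded_paths):
    """Return the expected cuDNN 9 DLL names that are absent from loaded_paths."""
    # Windows file names are case-insensitive, so compare in lower case.
    loaded_names = {PureWindowsPath(p).name.lower() for p in loaded_paths}
    return sorted(EXPECTED_CUDNN9_DLLS - loaded_names)


# The cuDNN entries copied from the preloaded list above.
CUDNN_BIN = r"C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6"
preloaded = [
    CUDNN_BIN + r"\cudnn_adv64_9.dll",
    CUDNN_BIN + r"\cudnn_engines_precompiled64_9.dll",
    CUDNN_BIN + r"\cudnn_ops64_9.dll",
    CUDNN_BIN + r"\cudnn_heuristic64_9.dll",
    CUDNN_BIN + r"\cudnn_engines_runtime_compiled64_9.dll",
    CUDNN_BIN + r"\cudnn_graph64_9.dll",
    CUDNN_BIN + r"\cudnn64_9.dll",
]

print(missing_cudnn_dlls(preloaded))  # ['cudnn_cnn64_9.dll']
```

If the assumption about the DLL set holds, the only missing name is cudnn_cnn64_9.dll, which is also the file named in the error. Whether that gap is the actual cause of the failure is not established here.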
And after importing torch:
List of loaded DLLs:
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn_adv64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cublasLt64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cublas64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\nvrtc64_120_0.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_adv64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_precompiled64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cufft64_11.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublas64_12.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublasLt64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_ops64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_heuristic64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_runtime_compiled64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_graph64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudart64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cudart64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\caffe2_nvrtc.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140.dll
C:\Windows\System32\msvcp140.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140_1.dll
C:\Windows\System32\msvcp140_1.dll
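The list above contains several DLL names twice, once from torch\lib and once from the system CUDA or cuDNN install. A minimal sketch for spotting such duplicates in a list of paths follows. The helper name `find_duplicate_dlls` is hypothetical, and the `loaded` list is only an excerpt of the paths shown above.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def find_duplicate_dlls(loaded_paths):
    """Map each DLL name to its directories when it appears in more than one."""
    dirs_by_name = defaultdict(set)
    for p in loaded_paths:
        path = PureWindowsPath(p)
        # Windows file names are case-insensitive, so key on the lower-case name.
        dirs_by_name[path.name.lower()].add(str(path.parent))
    return {
        name: sorted(dirs)
        for name, dirs in dirs_by_name.items()
        if len(dirs) > 1
    }


TORCH_LIB = r"C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib"
CUDNN_BIN = r"C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6"
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin"

# Excerpt of the DLL list reported after importing torch.
loaded = [
    TORCH_LIB + r"\cudnn_adv64_9.dll",
    TORCH_LIB + r"\cublasLt64_12.dll",
    TORCH_LIB + r"\cublas64_12.dll",
    TORCH_LIB + r"\nvrtc64_120_0.dll",
    CUDNN_BIN + r"\cudnn_adv64_9.dll",
    CUDA_BIN + r"\cufft64_11.dll",
    CUDA_BIN + r"\cublas64_12.dll",
    CUDA_BIN + r"\cublasLt64_12.dll",
    CUDNN_BIN + r"\cudnn_ops64_9.dll",
    TORCH_LIB + r"\cudart64_12.dll",
    TORCH_LIB + r"\cudnn64_9.dll",
    CUDA_BIN + r"\cudart64_12.dll",
    CUDNN_BIN + r"\cudnn64_9.dll",
]

for name, dirs in sorted(find_duplicate_dlls(loaded).items()):
    print(name)
    for d in dirs:
        print("   ", d)
```

On this excerpt the duplicated names are the cuBLAS, cudart, cudnn64_9 and cudnn_adv64_9 DLLs, so the process holds copies from both installs at the same time.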
Am I correct to assume that cudnn_cnn64_9.dll is causing the error because it is incompatible? Any clue as to why other libraries are loaded correctly from torch?
I've tried both preloading CUDA and cuDNN together and preloading only cuDNN while using torch's CUDA, but the issue is the same.
To reproduce
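On Windows with CUDA 12.6 and cuDNN 9.6.0 installed:
import onnxruntime
# CUDA_DIRECTORY and CUDNN_DIRECTORY are placeholders for the bin folders of the CUDA 12.6 and cuDNN 9.6 installs
onnxruntime.preload_dlls(cuda=True, cudnn=False, directory=CUDA_DIRECTORY)
onnxruntime.preload_dlls(cuda=False, cudnn=True, directory=CUDNN_DIRECTORY)
import torch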
Urgency
It makes a few of the models we use considerably slower (about 2x).
Platform
Windows
OS Version
Windows 10
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.21.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.6 installed on Windows, CUDA 12.1 bundled with torch