
ONNX preloaded dlls are incompatible with CUDNN torch version #24266

@lorenzomammana

Description


Describe the issue

Hi,
I'm trying to do something I'm starting to think may not be possible: using a different cuDNN version for torch and for ONNX Runtime.

In the past I was using torch 2.1.2+cu121 and onnxruntime 1.20.0. With this configuration, since torch shipped with cuDNN 8, I had to manually install cuDNN 9 to use ONNX Runtime.
While doing this I realized that cuDNN 9.1 makes a few models much slower than cuDNN 9.6, so I upgraded to 9.6 and everything worked fine.

Now we have decided to upgrade torch to 2.4.1+cu121, which ships with cuDNN 9.1. That cuDNN 9.1 gets picked up by onnxruntime, effectively making a few of my models slower.

So I'm trying to use the new preload_dlls API to load cuDNN 9.6 before importing torch, but when I import torch I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\__init__.py", line 137, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn_cnn64_9.dll" or one of its dependencies.

This is the list of preloaded DLLs before importing torch:

C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_adv64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_precompiled64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cufft64_11.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublas64_12.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublasLt64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_ops64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_heuristic64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_runtime_compiled64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_graph64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cudart64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn64_9.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140_1.dll
C:\Windows\System32\msvcp140.dll
C:\Windows\System32\msvcp140_1.dll
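
For reference, the DLL lists in this report were dumped with a small helper roughly like the one below (a sketch based on psutil; the function name and keyword filter are just for illustration):

import psutil

def list_loaded_dlls(keywords=("cudnn", "cublas", "cudart", "cufft", "nvrtc", "msvcp", "vcruntime")):
    # Sketch: collect the DLLs currently mapped into this process and keep the CUDA/cuDNN-related ones.
    paths = {m.path for m in psutil.Process().memory_maps() if m.path.lower().endswith(".dll")}
    return sorted(p for p in paths if any(k in p.lower() for k in keywords))

for path in list_loaded_dlls():
    print(path)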

And this is the list after importing torch:

List of loaded DLLs:
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn_adv64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cublasLt64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cublas64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\nvrtc64_120_0.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_adv64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_precompiled64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cufft64_11.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublas64_12.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cublasLt64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_ops64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_heuristic64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_engines_runtime_compiled64_9.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn_graph64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudart64_12.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\cudnn64_9.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\cudart64_12.dll
C:\Program Files\NVIDIA\CUDNN\v9.6\bin\12.6\cudnn64_9.dll
C:\Users\orobix\Desktop\Axon-0.32.0\python\axon-default-ns\torch\lib\caffe2_nvrtc.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140.dll
C:\Windows\System32\msvcp140.dll
C:\Users\orobix\AppData\Local\Programs\Python\Python310\vcruntime140_1.dll
C:\Windows\System32\msvcp140_1.dll

Am I correct in assuming that cudnn_cnn64_9.dll is causing the error because it is incompatible? Any clue why the other libraries are loaded correctly from torch?

I've tried preloading both CUDA and cuDNN, and also preloading only cuDNN while using torch's CUDA, but the issue is the same (see the second snippet under "To reproduce" below).

To reproduce

On Windows, with CUDA 12.6 and cuDNN 9.6.0 installed:

import onnxruntime

# CUDA_DIRECTORY and CUDNN_DIRECTORY point to the system CUDA 12.6 and cuDNN 9.6 bin folders
onnxruntime.preload_dlls(cuda=True, cudnn=False, directory=CUDA_DIRECTORY)
onnxruntime.preload_dlls(cuda=False, cudnn=True, directory=CUDNN_DIRECTORY)
import torch
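
The variant mentioned above, preloading only cuDNN 9.6 and letting torch supply its own CUDA DLLs, looks like this (same CUDNN_DIRECTORY as before) and fails with the same error:

import onnxruntime

# Preload only the system cuDNN 9.6; torch then brings its own CUDA 12.1 runtime DLLs.
onnxruntime.preload_dlls(cuda=False, cudnn=True, directory=CUDNN_DIRECTORY)
import torch  # raises the same OSError: [WinError 127]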

Urgency

It heavily slows down (roughly 2x slower) a few of the models we use.

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.21.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.6 installed system-wide on Windows, CUDA 12.1 shipped with torch

Labels

ep:CUDA (issues related to the CUDA execution provider), stale (issues that have not been addressed in a while; categorized by a bot)
