Description
Whisper model openai-whisper-large-v3-turbo-cuda-gpu:2 loads successfully but fails at inference time with CUDNN_BACKEND_API_FAILED because cudnn_engines_precompiled64_9.dll is missing from the CUDA execution provider bundle.
Environment
- Foundry Local: latest (installed via winget)
- GPU: NVIDIA GeForce RTX 2060 (6 GB VRAM)
- Driver: 595.97, CUDA 13.2
- OS: Windows 11 Pro
- CUDA EP version: onnxruntime-foundry-win-x64-cuda-deps 12.8.2
Reproduction
```
foundry model download whisper-large-v3-turbo
foundry model load whisper-large-v3-turbo
```

Then transcribe any WAV file via the C# SDK:

```csharp
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
var model = await catalog.GetModelAsync("whisper-large-v3-turbo");
await model.DownloadAsync();
await model.LoadAsync();
var client = await model.GetAudioClientAsync();
var ct = CancellationToken.None; // any cancellation token reproduces the issue
await foreach (var chunk in client.TranscribeAudioStreamingAsync("test.wav", ct))
    Console.Write(chunk.Text);
```
Observed behavior
Console warning:

```
Could not locate cudnn_engines_precompiled64_9.dll. Please make sure it is in your library path!
```

Followed by the exception:

```
Microsoft.ML.OnnxRuntimeGenAI.OnnxRuntimeGenAIException: Non-zero status code returned while running Conv node.
Name:'/encoder/encoder/conv1/Conv'
Status Message: Failed to initialize CUDNN Frontend
CUDNN_FE failure 11: CUDNN_BACKEND_API_FAILED
```
cuDNN frontend JSON shows FP16 convolution failing to build its operation graph.
Root cause
The CUDA EP directory (~/.iaw/ep/cuda-ep/) contains:
- ✅ cudnn64_9.dll
- ✅ cudnn_graph64_9.dll
- ✅ cudnn_ops64_9.dll
- ❌ cudnn_engines_precompiled64_9.dll (missing)
The version.json shows cuda_binaries source is onnxruntime-foundry-win-x64-cuda-deps version 12.8.2. This bundle does not include the precompiled engines DLL that cuDNN 9.8.0's frontend API requires to build FP16 convolution operation graphs.
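The presence check above can be scripted. A minimal sketch, assuming the EP directory path reported here (~/.iaw/ep/cuda-ep/) and a POSIX shell (e.g. Git Bash on Windows); override EP_DIR if your install differs:

```shell
# List which of the cuDNN 9 DLLs from this report exist in the CUDA EP directory.
# EP_DIR is an assumption taken from the report; adjust it for your install.
EP_DIR="${EP_DIR:-$HOME/.iaw/ep/cuda-ep}"
missing=""
for dll in cudnn64_9.dll cudnn_graph64_9.dll cudnn_ops64_9.dll cudnn_engines_precompiled64_9.dll; do
  if [ -f "$EP_DIR/$dll" ]; then
    echo "present: $dll"
  else
    echo "MISSING: $dll"
    missing="$missing $dll"
  fi
done
```

On an affected machine this should print `MISSING: cudnn_engines_precompiled64_9.dll` while the other three show as present.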
Expected behavior
Either:
- Include cudnn_engines_precompiled64_9.dll in the CUDA EP bundle, or
- Fall back to a non-cuDNN-frontend code path when the DLL is missing
Workaround
Use the CPU variant (openai-whisper-large-v3-turbo-generic-cpu:2) instead.