fix: CUDA→CPU whisper fallback via Foundry variant API #18

Merged
LeftTwixWand merged 4 commits into master from onyx-runtime-fix on Mar 31, 2026

Conversation

LeftTwixWand (Contributor) commented on Mar 31, 2026

Summary

  • Fix broken CUDA→CPU fallback for Whisper transcription when GPU inference fails due to missing cudnn_engines_precompiled64_9.dll in Foundry Local's CUDA EP bundle
  • Use correct Foundry Local API: model.Variants + catalog.GetModelVariantAsync() instead of string-manipulated IDs with catalog.GetModelAsync() (which always resolves to the GPU variant)
  • Add runtime CUDA error detection via exception filter (IsCudaError) — catches cuDNN/CUDA/OnnxRuntimeGenAI errors on first transcription attempt and transparently retries on CPU
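The retry-on-CPU behaviour in the last bullet can be sketched as below. This is a minimal illustration, not the PR's code: the marker substrings inside `IsCudaError`, the `CudaFallback` class, and the `TranscribeWithFallbackAsync` helper with its two delegates are assumptions made for the example, and no Foundry Local types are used.

```csharp
using System;
using System.Threading.Tasks;

// Demo: the first (GPU) attempt throws a cuDNN-style error, so the CPU delegate runs.
string text = await CudaFallback.TranscribeWithFallbackAsync(
    gpu: () => throw new InvalidOperationException("CUDNN_BACKEND_API_FAILED in Conv node"),
    cpu: () => Task.FromResult("hello from CPU"));
Console.WriteLine(text); // prints "hello from CPU"

public static class CudaFallback
{
    // Hypothetical marker list: the PR only says cuDNN/CUDA/OnnxRuntimeGenAI
    // errors are detected, not which substrings are matched.
    private static readonly string[] Markers = { "cudnn", "cuda", "onnxruntimegenai" };

    // True if the exception, or any inner exception, mentions one of the markers.
    public static bool IsCudaError(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            foreach (var marker in Markers)
            {
                if (e.Message.Contains(marker, StringComparison.OrdinalIgnoreCase))
                    return true;
            }
        }
        return false;
    }

    // Tries the GPU path once; only CUDA-looking failures trigger the CPU retry.
    public static async Task<string> TranscribeWithFallbackAsync(
        Func<Task<string>> gpu, Func<Task<string>> cpu)
    {
        try
        {
            return await gpu();
        }
        catch (Exception ex) when (IsCudaError(ex))
        {
            // The filter runs before the stack unwinds; unrelated errors propagate untouched.
            return await cpu();
        }
    }
}
```

Using an exception filter (`when`) rather than catching and rethrowing keeps unrelated failures, such as a missing audio file, out of the fallback path.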

Context

Foundry Local's CUDA EP bundle (v12.8.2) is missing cudnn_engines_precompiled64_9.dll, causing all Whisper CUDA variants to fail at inference with CUDNN_BACKEND_API_FAILED on the encoder's FP16 Conv node. Filed upstream as microsoft/Foundry-Local#567.

The previous fallback used catalog.GetModelAsync(cpuId) with string-replaced model IDs — this API only resolves aliases and always returns the GPU variant, so the fallback could never find the CPU model.
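The difference between guessing a CPU ID by string replacement and choosing from the model's declared variants can be sketched as follows. The `VariantInfo` record, the `VariantPicker` helper, the `DeviceType` values, and the variant IDs are all hypothetical stand-ins; the real SDK exposes `model.Variants` and `catalog.GetModelVariantAsync()`, but their exact shapes are not shown in this PR, so the sketch stops at selecting the variant.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant list, standing in for what model.Variants might contain.
var variants = new List<VariantInfo>
{
    new("whisper-example-cuda-gpu", "GPU"),
    new("whisper-example-cpu", "CPU"),
};

var cpu = VariantPicker.PickCpuVariant(variants);
Console.WriteLine(cpu?.Id ?? "no CPU variant"); // prints "whisper-example-cpu"

// Stand-in for a catalog variant entry; not the Foundry Local SDK's own type.
public sealed record VariantInfo(string Id, string DeviceType);

public static class VariantPicker
{
    // Selects by declared device type instead of rewriting the ID string,
    // so the result is always a variant that actually exists in the list.
    public static VariantInfo? PickCpuVariant(IEnumerable<VariantInfo> variants) =>
        variants.FirstOrDefault(v =>
            string.Equals(v.DeviceType, "CPU", StringComparison.OrdinalIgnoreCase));
}
```

In the PR's approach, the selected variant would then be resolved through `catalog.GetModelVariantAsync()` rather than `catalog.GetModelAsync()`, which according to the description above only resolves aliases.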

Fixes #17

Test plan

  • Verified GPU inference fails reproducibly (all model sizes, same cuDNN error)
  • Verified CPU inference works via catalog.GetModelVariantAsync() + ModelVariant.DownloadAsync/LoadAsync/GetAudioClientAsync
  • aspire start — all resources healthy, Telegram service running with fix
  • Send a voice message via Telegram to confirm end-to-end CPU fallback

🤖 Generated with Claude Code

LeftTwixWand and others added 4 commits March 31, 2026 12:25
The previous fallback used string manipulation on model IDs and
catalog.GetModelAsync, which always resolves to the GPU variant.
Use model.Variants to find the CPU variant and
catalog.GetModelVariantAsync to load it — the correct Foundry Local API
for variant-specific resolution.

Fixes #17
See also microsoft/Foundry-Local#567

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LeftTwixWand self-assigned this Mar 31, 2026
LeftTwixWand merged commit a27ebdd into master Mar 31, 2026
1 check passed
LeftTwixWand deleted the onyx-runtime-fix branch March 31, 2026 12:21
Development

Successfully merging this pull request may close these issues.

Whisper CUDA inference fails — missing cuDNN DLL in Foundry Local, fallback broken for base model IDs