fix: CUDA→CPU whisper fallback via Foundry variant API by LeftTwixWand · Pull Request #18 · InteractiveAgents/IAW

LeftTwixWand · 2026-03-31T11:44:05Z

Summary

Fix broken CUDA→CPU fallback for Whisper transcription when GPU inference fails due to missing cudnn_engines_precompiled64_9.dll in Foundry Local's CUDA EP bundle
Use correct Foundry Local API: model.Variants + catalog.GetModelVariantAsync() instead of string-manipulated IDs with catalog.GetModelAsync() (which always resolves to the GPU variant)
Add runtime CUDA error detection via exception filter (IsCudaError) — catches cuDNN/CUDA/OnnxRuntimeGenAI errors on first transcription attempt and transparently retries on CPU
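
The exception-filter retry described in the last bullet could look roughly like the sketch below. It is not the code from this PR: the marker strings, the class name `CudaFallback`, and the two delegate parameters are assumptions made for illustration, and only the name `IsCudaError` and the "catch on first attempt, retry on CPU" behaviour come from the summary.

```csharp
using System;
using System.Threading.Tasks;

public static class CudaFallback
{
    // Assumed marker list; the PR only states that cuDNN/CUDA/OnnxRuntimeGenAI errors are matched.
    private static readonly string[] Markers = { "cudnn", "cuda", "onnxruntimegenai" };

    // Walks the exception chain and checks type names and messages for a marker.
    public static bool IsCudaError(Exception ex)
    {
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            foreach (var marker in Markers)
            {
                if (e.Message.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    e.GetType().Name.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                {
                    return true;
                }
            }
        }
        return false;
    }

    // Tries the GPU path first; only CUDA-related failures trigger the CPU retry.
    // Any other exception propagates unchanged because the filter rejects it.
    public static async Task<string> TranscribeWithFallbackAsync(
        Func<Task<string>> transcribeOnGpu,
        Func<Task<string>> transcribeOnCpu)
    {
        try
        {
            return await transcribeOnGpu();
        }
        catch (Exception ex) when (IsCudaError(ex))
        {
            return await transcribeOnCpu();
        }
    }
}
```

Using a `when` filter rather than catching and rethrowing keeps unrelated failures (for example, a missing audio file) on their original stack, so they are not mistaken for GPU problems.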

Context

Foundry Local's CUDA EP bundle (v12.8.2) is missing cudnn_engines_precompiled64_9.dll, causing all Whisper CUDA variants to fail at inference with CUDNN_BACKEND_API_FAILED on the encoder's FP16 Conv node. Filed upstream as microsoft/Foundry-Local#567.

The previous fallback used catalog.GetModelAsync(cpuId) with string-replaced model IDs — this API only resolves aliases and always returns the GPU variant, so the fallback could never find the CPU model.
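
As a rough illustration of why picking from the advertised variants is more robust than rewriting an ID string, the sketch below selects a CPU variant from a list. `VariantStub`, its `Id` and `Device` properties, and `VariantPicker` are hypothetical stand-ins, not Foundry Local SDK types; the real `model.Variants` entries may expose different members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one entry of model.Variants; not the SDK type.
public sealed record VariantStub(string Id, string Device);

public static class VariantPicker
{
    // Returns the first variant whose device is CPU, or null if the model
    // advertises none. No ID string is guessed or rewritten.
    public static VariantStub FindCpuVariant(IEnumerable<VariantStub> variants) =>
        variants.FirstOrDefault(v =>
            string.Equals(v.Device, "CPU", StringComparison.OrdinalIgnoreCase));
}
```

In the real service, the selected variant would then be passed to catalog.GetModelVariantAsync() as the PR describes; a null result means no CPU fallback is available and the original error should surface.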

Fixes #17

Test plan

Verified GPU inference fails reproducibly (all model sizes, same cuDNN error)
Verified CPU inference works via catalog.GetModelVariantAsync() + ModelVariant.DownloadAsync/LoadAsync/GetAudioClientAsync
aspire start — all resources healthy, Telegram service running with fix
Send a voice message via Telegram to confirm end-to-end CPU fallback

🤖 Generated with Claude Code

The previous fallback used string manipulation on model IDs and catalog.GetModelAsync, which always resolves to the GPU variant. Use model.Variants to find the CPU variant and catalog.GetModelVariantAsync to load it, which is the correct Foundry Local API for variant-specific resolution.

Fixes #17
See also microsoft/Foundry-Local#567

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

LeftTwixWand and others added 4 commits March 31, 2026 12:25

Update settings.local.json

96324c9

Update FoundryLocalTranscriptionService.cs

2e48efd

Update AppHost.cs

aac49a6

LeftTwixWand self-assigned this Mar 31, 2026

LeftTwixWand merged commit a27ebdd into master Mar 31, 2026
1 check passed

LeftTwixWand deleted the onyx-runtime-fix branch March 31, 2026 12:21

LeftTwixWand merged 4 commits into master from onyx-runtime-fix
