
Prefer DirectML for Windows ONNX transcription models#985

Draft
ferologics wants to merge 2 commits into cjpais:main from ferologics:feat/windows-onnx-directml

Conversation

@ferologics (Contributor) commented Mar 9, 2026

Summary

  • patch Handy's transcribe-rs dependency to a forked git revision with Windows DirectML support for ONNX models
  • prefer DirectMLExecutionProvider on Windows, with explicit CPU fallback if provider registration fails
  • log whether DirectML registration succeeded or fell back to CPU
  • clean a few existing Rust warnings touched during validation
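The provider-preference logic in the first two bullets can be sketched as follows. This is an illustrative model only: `Provider` and `try_register` are stand-ins for the real ONNX Runtime execution-provider registration in the forked transcribe-rs, whose exact API is not reproduced here.

```rust
// Illustrative sketch of "prefer DirectML on Windows, explicit CPU fallback".
// `try_register` stands in for real ONNX Runtime provider registration,
// which can fail at runtime (missing DirectML.dll, unsupported GPU, etc.).

#[derive(Debug, PartialEq)]
enum Provider {
    DirectMl,
    Cpu,
}

fn select_provider(
    prefer_directml: bool,
    try_register: impl Fn(&Provider) -> Result<(), String>,
) -> Provider {
    if prefer_directml {
        match try_register(&Provider::DirectMl) {
            Ok(()) => {
                // Mirrors the "registration succeeded" log mentioned in the PR.
                eprintln!("registered DirectMLExecutionProvider (CPU fallback enabled)");
                return Provider::DirectMl;
            }
            Err(e) => eprintln!("DirectML registration failed ({e}); falling back to CPU"),
        }
    }
    Provider::Cpu
}

fn main() {
    // Simulate a machine where DirectML registration fails.
    let chosen = select_provider(true, |_| Err("DirectML.dll not found".into()));
    println!("chosen provider: {chosen:?}");
}
```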

Validation

  • cargo check
  • cargo check --release
  • launched the built Handy app locally and confirmed handy.log shows successful DirectML registration for the Parakeet ONNX sessions
  • ran a local dev-build transcription successfully after the change
  • benchmarked the patched Parakeet path on a real Handy recording (244.38s audio): 35.368s before vs 6.99s after (~5.1x faster, ~35x realtime)
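As a sanity check on the benchmark bullet, the quoted ratios follow directly from the three timings (numbers copied from above):

```rust
fn main() {
    // Timings from the benchmark above: 244.38 s of audio,
    // 35.368 s to transcribe before the patch, 6.99 s after.
    let audio_s = 244.38_f64;
    let before_s = 35.368_f64;
    let after_s = 6.99_f64;

    let speedup = before_s / after_s; // ~5.1x faster
    let realtime = audio_s / after_s; // ~35x realtime
    println!("speedup: {speedup:.2}x, realtime factor: {realtime:.1}x");
}
```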

Dependency patch

ferologics force-pushed the feat/windows-onnx-directml branch 2 times, most recently from 3b27bd4 to 32b0150, on March 9, 2026 at 23:40
@github-actions

🧪 Test Build Ready

Build artifacts for PR #985 are available for testing.

Download artifacts from workflow run

Artifacts expire after 30 days.

@cjpais (Owner) commented Mar 10, 2026

@ferologics can you see if this still helps inference speed for you? I'd be quite curious whether it just goes back to CPU or works out of the box. Since it's DirectML, I kind of think it might just work on Win 11; I'd be curious about Win 10 too.

@ferologics (Contributor, Author)

Tested the CI-built Windows artifact locally and it looks good.

What I checked:

  • downloaded handy-pr-985-x86_64-pc-windows-msvc from run 22882036165
  • extracted the MSI payload and confirmed the packaged app includes DirectML.dll
  • launched the packaged handy.exe
  • triggered a real start/stop transcription cycle against my normal Handy setup

Result:

  • handy.log shows successful DirectML registration in the packaged build:
    • ONNX Runtime session registered DirectMLExecutionProvider on Windows (device 0) with CPU fallback enabled
  • I saw that log line for the Parakeet encoder, decoder, and nemo sessions
  • I did not see the CPU fallback warning

So on my Windows 11 machine the CI-built artifact is still taking the intended GPU path, not silently dropping back to CPU.
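The log check described above can be automated by scanning handy.log for the registration message. A minimal sketch, assuming the message text quoted in this comment (the fallback-warning text is not shown in the thread, so only the success path is matched here):

```rust
// Sketch: detect whether a Handy log shows DirectML registration.
// The match substring is taken from the log line quoted in this PR comment.
fn directml_active(log: &str) -> bool {
    log.lines()
        .any(|line| line.contains("registered DirectMLExecutionProvider"))
}

fn main() {
    let sample = "ONNX Runtime session registered DirectMLExecutionProvider \
                  on Windows (device 0) with CPU fallback enabled";
    println!("DirectML active: {}", directml_active(sample));
}
```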

@cjpais (Owner) commented Mar 10, 2026

Solid. This is amazing news. I will test on my Windows machine when I can and see how it goes as well. I'm curious how this will play with integrated GPUs.

I'm also slightly wondering whether we will need to provide an option to disable this, just in case CPU is faster for someone. I know another PR in transcribe-rs had something like this; it might be worth considering.
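If an opt-out is added, it could be as small as one settings flag. This is a hypothetical sketch: the struct and the field name `use_directml` are invented here for illustration and are not Handy's actual settings.

```rust
// Hypothetical opt-out flag for DirectML, as suggested above.
// `use_directml` is an invented name; Handy's real settings may differ.
struct TranscriptionSettings {
    use_directml: bool,
}

impl Default for TranscriptionSettings {
    fn default() -> Self {
        // Default on only where DirectML exists at all (Windows).
        Self { use_directml: cfg!(windows) }
    }
}

fn wants_directml(s: &TranscriptionSettings) -> bool {
    // DirectML is Windows-only, so gate on the platform regardless of the flag.
    s.use_directml && cfg!(windows)
}

fn main() {
    let s = TranscriptionSettings::default();
    println!("prefer DirectML: {}", wants_directml(&s));
}
```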

