Add NPU support for whisper.cpp on Lemonade #956
Conversation
@ramkrishna2910 Great feedback on catching partial downloads and model search naming. There is one pending test that is never triggered, because I split the old test into separate CPU and NPU tests.
@iswaryaalex The test name is encoded in the GitHub settings, not in the git codebase itself. When we are about to merge this PR, I will change the GitHub settings to point to your new tests instead.
I am getting this error during download of the rai cache.
Can you check your server logs for me? This is how it should look. Most likely the cache didn't download? If that was the case, it should be logged.
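As a debugging aid for cases like this, here is a minimal sketch of a completeness check for a downloaded cache file. The function name, file name, and expected-size lookup are hypothetical, not Lemonade's actual implementation; a partial download usually shows up as a missing file or one smaller than the size the server advertised:

```python
import os
import tempfile

def cache_download_complete(path: str, expected_size: int) -> bool:
    """Return True if the cache file exists and matches the expected size.

    A partial download typically appears as a missing file or a file
    smaller than the size advertised by the server (e.g. Content-Length).
    """
    return os.path.isfile(path) and os.path.getsize(path) == expected_size

# Demonstration with a throwaway file (hypothetical cache name):
tmp = os.path.join(tempfile.mkdtemp(), "model.raicache")
with open(tmp, "wb") as f:
    f.write(b"\x00" * 1024)

print(cache_download_complete(tmp, 1024))  # → True (complete download)
print(cache_download_complete(tmp, 4096))  # → False (looks partial)
```

Logging the result of a check like this on the server side would make the "cache didn't download" case easy to spot.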
ramkrishna2910 left a comment
Tested! Works well!


Core Update

This PR adds whisper.cpp NPU support to Lemonade Server, leveraging key updates from Ryzen AI 1.7:

- The NPU backend uses lemonade-sdk/whisper.cpp-npu, the CPU backend uses ggml-org/whisper.cpp, and so on. Extendable to other backends.
- Downloads raicache files from the HuggingFace AMD repo https://huggingface.co/collections/amd/ryzen-ai-17-whisper-npu-optimized-onnx-models and places them alongside the model checkpoints for the NPU runtime.
- Adds backend_versions.json to support per-backend versioning.

To Test this PR
1. Start Lemonade server with the whispercpp NPU backend:

   ```
   .\build\Release\lemonade-server.exe serve
   ```

2. Download a sample .wav:

   ```
   curl -o test.wav "https://raw.githubusercontent.com/lemonade-sdk/assets/main/audio/test_speech.wav"
   ```

3. Load the NPU Whisper model:

   ```
   curl -X POST http://localhost:8000/api/v1/load -H "Content-Type: application/json" -d "{\"model_name\": \"Whisper-Tiny\", \"whispercpp_backend\": \"npu\"}"
   ```

4. Test transcription with Whisper-Tiny:

   ```
   curl -X POST http://localhost:8000/api/v1/audio/transcriptions -F "[email protected]" -F "model=Whisper-Tiny"
   ```

You can also test with other Whisper models from here:
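The load step above can also be scripted instead of using curl. A minimal Python sketch using only the standard library; the endpoint path and JSON payload mirror the curl command, the helper names are my own, and a running Lemonade server is required to actually send the request:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # assumes the default server address

def build_load_request(model_name: str, backend: str) -> urllib.request.Request:
    """Build the POST /api/v1/load request shown in the curl example."""
    payload = json.dumps({"model_name": model_name, "whispercpp_backend": backend})
    return urllib.request.Request(
        f"{BASE_URL}/load",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def load_model(model_name: str, backend: str = "npu") -> dict:
    """Send the load request; requires a running Lemonade server."""
    with urllib.request.urlopen(build_load_request(model_name, backend)) as resp:
        return json.loads(resp.read())

# Inspect the request without sending it (no server needed):
req = build_load_request("Whisper-Tiny", "npu")
print(req.full_url)       # → http://localhost:8000/api/v1/load
print(req.data.decode())  # → {"model_name": "Whisper-Tiny", "whispercpp_backend": "npu"}
```

With the server running, `load_model("Whisper-Tiny")` would perform the same call as step 3; the multipart transcription upload in step 4 is easier to keep as curl.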