Summary
IBM's Granite-4.0-1B-Speech (~2B params, Apache 2.0) achieves state-of-the-art English ASR with 5.52 WER on the OpenASR leaderboard, roughly 2 points better than Whisper Large V3 (~7.4 WER). It supports 7 languages, including Portuguese, and runs at ~280x real time on GPU.
This model is worth evaluating as a future alternative or complement to our current Whisper.cpp-based transcription pipeline.
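For context on the WER figures above: word error rate is the word-level edit distance (substitutions + insertions + deletions) between the reference transcript and the hypothesis, divided by the number of reference words, reported as a percentage — so 5.52 means roughly one error every 18 words. A minimal sketch of the metric (not the leaderboard's exact normalization, which also lowercases and strips punctuation):

```python
# Word Error Rate (WER): word-level Levenshtein distance between the
# reference and hypothesis, divided by the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown box"))  # 0.25 (1 sub / 4 words)
```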
Current Blockers
The following requirements must be met before integration is practical:
- No native C/C++ inference runtime exists; the nearest option is the `ort` Rust crate (ONNX Runtime bindings), which is significantly more complex than our current whisper-rs setup.
- No streaming/chunked inference support, unlike Whisper.cpp.
When to Revisit
This issue should be revisited if any of the following occur:
- IBM or the community releases a lightweight C/C++ inference runtime
- Streaming/chunked inference support is added to the model
- A Rust crate wrapping Granite Speech inference becomes available
- Meetily's architecture changes to support a server-side transcription backend (where Python/vLLM would be acceptable)
Key Comparisons
| Factor | Granite 4.0 1B Speech | Whisper.cpp (current) |
| --- | --- | --- |
| English WER | 5.52 | ~7.4 (Large V3) |
| Languages | 7 | 99+ |
| Native C/C++ runtime | None | Yes |
| Streaming support | No | Yes |
| Memory (fp16) | ~4 GB | ~1.5 GB (Large V3) |
| License | Apache 2.0 | MIT |
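One way to read the 280x real-time figure in the summary: the model processes 280 seconds of audio per second of wall-clock GPU time, so a one-hour recording would transcribe in under 13 seconds. A quick sketch of that arithmetic:

```python
# Real-time factor (RTF here as "Nx real time"): seconds of audio
# processed per second of compute. Wall-clock time = duration / factor.
def transcription_time(audio_seconds: float, speedup: float) -> float:
    return audio_seconds / speedup

one_hour = 3600.0
print(transcription_time(one_hour, 280.0))  # ≈ 12.86 s for an hour of audio
print(transcription_time(one_hour, 1.0))    # 3600 s at exactly real time
```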
References