With this release, you can now use newer versions of swift-transformers without forking to override the pinned version when another dependency (such as mlx-swift-examples) also imports it. We will continue to default to 0.1.8 until we can target Hub and Tokenizers specifically rather than the entire library (work in progress).
This release also contains improvements to word-level timestamps, which previously did not always filter out special tokens and were slightly out of parity with the original OpenAI implementation. There are also some simple heuristics that extend zero-duration word timings when there is room to shift a word's start time back without overlapping the previous word. The "forced" prefill token scheme was also changed to let the model predict the first timestamp rather than always starting at 0.00, for cases where speech begins after a pause at the start of the audio.
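The zero-duration heuristic could be sketched roughly like this. Note this is a minimal illustration, not the actual WhisperKit implementation: the `WordTiming` struct, `extendZeroDurationWords` function, and `minDuration` parameter are all hypothetical names chosen for the example.

```swift
// Hypothetical sketch of the zero-duration extension heuristic:
// give a zero-length word some duration by moving its start time
// earlier, but never past the end of the previous word.

struct WordTiming {
    var word: String
    var start: Float
    var end: Float
}

func extendZeroDurationWords(_ timings: [WordTiming], minDuration: Float = 0.1) -> [WordTiming] {
    var result = timings
    for i in result.indices where result[i].end - result[i].start <= 0 {
        // Earliest allowed start: the previous word's end (or 0 at the beginning)
        let earliest = i > 0 ? result[i - 1].end : 0
        // Shift the start back by up to minDuration without overlapping
        result[i].start = max(earliest, result[i].end - minDuration)
    }
    return result
}
```

A word timed as 1.00–1.00 with the previous word ending at 0.5 would be extended back to start at 0.90, while a word packed tightly against its predecessor would only be extended into whatever gap is available.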
There are also several bugfixes and quality-of-life improvements throughout, so please test it out and let us know how it goes either here or in the Discord. 🚀
## What's Changed
- Ensure that rounded capacity is non-zero in resampleBuffer. by @drewmccormack in #295
- Improve SegmentSeeker word alignment by @ZachNagengast in #305
## New Contributors
- @drewmccormack made their first contribution in #295
**Full Changelog**: v0.10.2...v0.11.0