Releases: Blaizzy/mlx-audio
v0.2.5
v0.2.4
What's Changed
- move sentence splitting into a separate utility class and add unit tests by @smdesai in #183
 - Add BigVGAN neural audio codec by @senstella in #186
 - Fix OuteTTS model loading error "Speaker file not found" by @zysam in #189
 - Fix deprecated save in MLX-LM by @Blaizzy in #194
 - Implementation of Misaki G2P tokenizer by @smdesai in #193
 - Add IndexTTS by @senstella in #187
 - Load both lexicon files us_gold and us_silver with words in us_gold taking precedence by @smdesai in #195
 - Add S3 semantic tokenizer / neural audio codec by @lucasnewman in #204
 - add lexicon files for British sounds, gb_gold and gb_silver by @smdesai in #197
 - Fix Mimi codec by @lucasnewman in #209
 - Add ability to use a custom URL to load Kokoro safetensors by @adrgrondin in #185
 - Handle transformers-style config for Sesame CSM models by @lucasnewman in #211
 - Add Xcode build troubleshooting documentation by @kinkadius in #210
 - Multi model support by @ivanfioravanti in #213
 - Add Voxtral by @Blaizzy in #214
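
The lexicon change in #195 (us_gold entries taking precedence over us_silver) can be illustrated with a minimal sketch. This is not the project's code: the function name `merge_lexicons` and the sample entries are hypothetical, and the real lexicon files may use a different structure.

```python
def merge_lexicons(gold: dict, silver: dict) -> dict:
    """Merge two word -> pronunciation maps; gold wins on conflicts."""
    merged = dict(silver)   # start from the lower-priority entries
    merged.update(gold)     # gold entries overwrite any silver duplicates
    return merged


# Hypothetical sample data, for illustration only.
us_gold = {"read": "GOLD_PRONUNCIATION"}
us_silver = {"read": "SILVER_PRONUNCIATION", "tomato": "SILVER_ONLY"}

lexicon = merge_lexicons(us_gold, us_silver)
print(lexicon["read"])    # taken from us_gold
print(lexicon["tomato"])  # only present in us_silver
```

Copying the silver map first and then updating with gold keeps both inputs unmodified and makes the precedence order explicit.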
 
New Contributors
- @zysam made their first contribution in #189
 - @adrgrondin made their first contribution in #185
 - @kinkadius made their first contribution in #210
 
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's Changed
- Add custom Voice Cloning OuteTTS by @Blaizzy in #172
 - Fix OuteTTS long generation by @Blaizzy in #174
 - Adding streaming support for Kokoro and an iOS target app by @smdesai in #173
 - Audio player output buffering for streaming mode by @lucasnewman in #170
 - Update Swift Package paths by @rudrankriyam in #179
 - Add Orpheus to MLX-Audio-Swift by @BenLumenDigital in #182
 - Make KokoroTTSModel properties and methods public for better accessibility by @rudrankriyam in #180
 - Add streaming support to OuteTTS by @lucasnewman in #169
 
New Contributors
- @smdesai made their first contribution in #173
 - @rudrankriyam made their first contribution in #179
 
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Fixes to Wav2vec2 model by @lucasnewman in #135
 - Fix Spark mel spec by @lucasnewman in #136
 - More fixes for the Spark BiCodec module by @lucasnewman in #137
 - Add compatibility with MLX Swift 0.25.2 and add Swift Package by @yarshure in #138
 - Remove model not found and fix swift GH actions by @Blaizzy in #146
 - Fix deps by @Blaizzy in #147
 - Add pull request template by @nlauchande in #150
 - Add auto sampling rate by @Blaizzy in #148
 - Add audio utilities, use them where possible by @lucasnewman in #161
 - Fix join audio sample rate by @Blaizzy in #162
 - Fix Spark voice matching by @lucasnewman in #163
 - Fix Python memory spikes by @Blaizzy in #164
 - Fix Kokoro ISTFT by @lucasnewman in #166
 - Improvements to Sesame + Mimi streaming playback by @lucasnewman in #167
 - Add OuteTTS v1.0.0 by @Blaizzy in #168
 - Fix Swift memory spikes by @Blaizzy in #165
 
New Contributors
- @yarshure made their first contribution in #138
 - @nlauchande made their first contribution in #150
 
Full Changelog: v0.2.1...v0.2.2
v0.2.1
v0.2.0
What's Changed
- Revert utils load and fix deprecate API by @Blaizzy in #98
 - Remove all remaining torch calls by @lucasnewman in #95
 - Dia: Split long inputs into individual two-speaker segments by @lucasnewman in #100
 - Dia: Avoid extra allocations in kv cache by @lucasnewman in #103
 - Add local version of Whisper (STT) by @lucasnewman in #105
 - Add streaming support for Sesame CSM by @lucasnewman in #107
 - Add default voices for Sesame by @lucasnewman in #109
 - Fix Sesame loading and add mixed_3_4 quantisation by @Blaizzy in #113
 - Add basic Modular Speech-To-Speech pipeline (CLI) by @lucasnewman in #111
 - Add Spark-TTS by @Blaizzy in #92
 - Fix SparkTTS Quant by @Blaizzy in #120
 - Update spark.py by @Blaizzy in #121
 - Add Parakeet (STT) by @Blaizzy in #122
 - Improve Parakeet token merging by @senstella in #129
 - Add MLX Swift Support and examples by @BenLumenDigital in #84
 - Add wav2vec2 model (STT) for Spark by @lucasnewman in #131
 
New Contributors
- @senstella made their first contribution in #129
 - @BenLumenDigital made their first contribution in #84
 
Full Changelog: v0.1.0...v0.2.0
v0.1.0
What's Changed
- Add support for OuteTTS 1.0 (v3) model by @lucasnewman in #86
 - Feat: Add quantization and mixed quantization by @Blaizzy in #88
 - Dia TTS model with voice cloning by @lucasnewman in #91 #93
 - Handle strict argument in load_weights for better model compatibility by @kamillobinski in #94
 
New Contributors
- @kamillobinski made their first contribution in #94
 
Full Changelog: v0.0.4...v0.1.0
v0.0.4
What's Changed
- Add CSM (Conversational Speech Model) section to README.md by @ivanfioravanti in #56
 - Use MLX-based SNAC vocoder for Orpheus by @lucasnewman in #62
 - Add Descript neural audio codec by @lucasnewman in #57
 - Vectorize overlap-add operation in istft by @lucasnewman in #66
 - Align Orpheus sampling parameters with the reference implementation by @lucasnewman in #68
 - Improve Kokoro generation performance by @lucasnewman in #73
 - Add Speech to Speech Tab to the UI by @freddyaboulton in #79
 - Add support for fp16 variant of Sesame by @lucasnewman in #78
 - Add (partially-working) voice matching support for Orpheus by @lucasnewman in #75
 - Bible Audiobook example by @andrepadez in #43
 - Remove push notification to ntfy by @andrepadez in #81
 - Bump version by @Blaizzy in #82
 
New Contributors
- @freddyaboulton made their first contribution in #79
 
Full Changelog: v0.0.3...v0.0.4
v0.0.3
What's Changed
- Add verbose logging and model selection support by @ivanfioravanti in #22
 - Pulsating effect by @ivanfioravanti in #23
 - Compile the decoder for Kokoro by @lucasnewman in #24
 - Play audio segments as they are generated by @lucasnewman in #26
 - Evaluate the computation graph before returning results by @ivanfioravanti in #35
 - Add Mimi neural audio codec by @lucasnewman in #34
 - Add model for Sesame TTS by @lucasnewman in #36
 - Sphere speed up during audio generation by @ivanfioravanti in #40
 - Added more Voices by @andrepadez in #37
 - Feature: External API for Audiobook Generation by @sergenes in #19
 - Add EnCodec neural audio codec by @lucasnewman in #46
 - Add Suno Bark by @Blaizzy in #45
 - Update README.md to fix lang_code error by @zboyles in #49
 - fix model config by @Blaizzy in #50
 - Add Vocos neural audio codec by @lucasnewman in #48
 - Fix Kokoro audio generation by @lucasnewman in #52
 - Add Orpheus by @Blaizzy in #47
 - Resample and Transcribe by @chigkim in #51
 - Fix vocos config loading by @Blaizzy in #53
 
New Contributors
- @andrepadez made their first contribution in #37
 - @sergenes made their first contribution in #19
 - @zboyles made their first contribution in #49
 - @chigkim made their first contribution in #51
 
Full Changelog: v0.0.2...v0.0.3
v0.0.2
What's Changed
- fix workflows and readme by @Blaizzy in #5
 - Add soundfile to requirements and Quick Start in README by @ivanfioravanti in #8
 - Remove librosa dependency by @lucasnewman in #11
 - Add support for command-line playback with the --play argument by @lucasnewman in #10
 - Allow receiving text input from stdin or an entry prompt by @lucasnewman in #12
 - Add web server and improve audio player by @ivanfioravanti in #14
 - Use phonemizer-fork to avoid espeak errors by @rampadc in #17
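
The stdin-or-prompt behaviour described in #12 might look roughly like the sketch below. It is an assumption about the general pattern, not the project's implementation: the function name `read_text`, its parameters, and the prompt string are hypothetical.

```python
import sys


def read_text(stream=None, prompt="Enter text: "):
    """Return piped input if there is any, otherwise ask interactively."""
    stream = sys.stdin if stream is None else stream
    if not stream.isatty():
        # Input was piped or redirected: consume all of it.
        return stream.read().strip()
    # Attached to a terminal: fall back to an entry prompt.
    return input(prompt).strip()
```

Checking `isatty()` lets the same command work both in a pipeline, where text arrives on stdin, and in an interactive terminal session.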
 
New Contributors
- @Blaizzy made their first contribution in #5
 - @ivanfioravanti made their first contribution in #8
 - @lucasnewman made their first contribution in #11
 - @rampadc made their first contribution in #17
 
Full Changelog: v0.0.1...v0.0.2