Releases · Blaizzy/mlx-audio

26 Aug 18:06

Blaizzy

v0.2.5 (Latest)

cc6bdb4

What's Changed

Use indeterminate progress for CSM models by @lucasnewman in #216
Bump version to 0.2.5 by @Blaizzy in #219

Full Changelog: v0.2.4...v0.2.5

Contributors

Blaizzy and lucasnewman

18 Aug 13:29

Blaizzy

v0.2.4

d494987

What's Changed

move sentence splitting into a separate utility class and add unit tests by @smdesai in #183
Add BigVGAN neural audio codec by @senstella in #186
fix: (outetts) loading model: Speaker file not found. by @zysam in #189
Fix deprecated save in MLX-LM by @Blaizzy in #194
Implementation of Misaki G2P tokenizer by @smdesai in #193
Add IndexTTS by @senstella in #187
Load both lexicon files us_gold and us_silver with words in us_gold taking precedence by @smdesai in #195
Add S3 semantic tokenizer / neural audio codec by @lucasnewman in #204
add lexicon files for British sounds, gb_gold and gb_silver by @smdesai in #197
Fix Mimi codec by @lucasnewman in #209
Add ability to use a custom URL to load Kokoro safetensors by @adrgrondin in #185
Handle transformers-style config for Sesame CSM models by @lucasnewman in #211
Add Xcode build troubleshooting documentation by @kinkadius in #210
Multi model support by @ivanfioravanti in #213
Add voxtral by @Blaizzy in #214

New Contributors

@zysam made their first contribution in #189
@adrgrondin made their first contribution in #185
@kinkadius made their first contribution in #210

Full Changelog: v0.2.3...v0.2.4

Contributors

ivanfioravanti, smdesai, and 6 other contributors

24 May 15:23

Blaizzy

v0.2.3

1eb879e

What's Changed

Add custom Voice Cloning OuteTTS by @Blaizzy in #172
Fix Outetts long generation by @Blaizzy in #174
Adding streaming support for Kokoro and an iOS target app by @smdesai in #173
Audio player output buffering for streaming mode by @lucasnewman in #170
Update Swift Package paths by @rudrankriyam in #179
Add Orpheus to MLX-Audio-Swift by @BenLumenDigital in #182
Make KokoroTTSModel properties and methods public for better accessibility by @rudrankriyam in #180
Add streaming support to OuteTTS by @lucasnewman in #169

New Contributors

@smdesai made their first contribution in #173
@rudrankriyam made their first contribution in #179

Full Changelog: v0.2.2...v0.2.3

Contributors

smdesai, Blaizzy, and 3 other contributors

19 May 22:26

Blaizzy

v0.2.2

026e354

What's Changed

Fixes to Wav2vec2 model by @lucasnewman in #135
Fix Spark mel spec by @lucasnewman in #136
More fixes for the Spark BiCodec module by @lucasnewman in #137
Compatibility with MLX Swift 0.25.2, add Swift Package by @yarshure in #138
Remove model not found and fix swift GH actions by @Blaizzy in #146
Fix deps by @Blaizzy in #147
Add pull request template by @nlauchande in #150
Add auto sampling rate by @Blaizzy in #148
Add audio utilities, use them where possible by @lucasnewman in #161
Fix join audio sample rate by @Blaizzy in #162
Fix Spark voice matching by @lucasnewman in #163
Fix Python memory spikes by @Blaizzy in #164
Fix Kokoro ISTFT by @lucasnewman in #166
Improvements to Sesame + Mimi streaming playback by @lucasnewman in #167
Add OuteTTS v1.0.0 by @Blaizzy in #168
Fix Swift memory spikes by @Blaizzy in #165

New Contributors

@yarshure made their first contribution in #138
@nlauchande made their first contribution in #150

Full Changelog: v0.2.1...v0.2.2

Contributors

yarshure, nlauchande, and 2 other contributors

11 May 07:24

Blaizzy

v0.2.1

f068e80

What's Changed

Fix SparkTTS Detokenize (TTS) by @Blaizzy in #132

Full Changelog: v0.2.0...v0.2.1

Contributors

Blaizzy

10 May 21:03

Blaizzy

v0.2.0

c685a8e

What's Changed

Revert utils load and fix deprecate API by @Blaizzy in #98
Remove all remaining torch calls by @lucasnewman in #95
Dia: Split long inputs into individual two-speaker segments by @lucasnewman in #100
Dia: Avoid extra allocations in kv cache by @lucasnewman in #103
Add local version of Whisper for (STT) by @lucasnewman in #105
Add streaming support for Sesame CSM by @lucasnewman in #107
Add default voices for Sesame by @lucasnewman in #109
Fix sesame loading and add mixed_3_4 quantisation by @Blaizzy in #113
Add basic Modular Speech-To-Speech pipeline (CLI) by @lucasnewman in #111
Add Spark-TTS by @Blaizzy in #92
Fix SparkTTS Quant by @Blaizzy in #120
Update spark.py by @Blaizzy in #121
Add Parakeet (STT) by @Blaizzy in #122
Improve Parakeet token merging by @senstella in #129
Add MLX Swift Support and examples by @BenLumenDigital in #84
Add wav2vec2 model (STT) for Spark by @lucasnewman in #131

New Contributors

@senstella made their first contribution in #129
@BenLumenDigital made their first contribution in #84

Full Changelog: v0.1.0...v0.2.0

Contributors

Blaizzy, lucasnewman, and 2 other contributors

26 Apr 12:27

Blaizzy

v0.1.0

77aaefa

What's Changed

Add support for OuteTTS 1.0 (v3) model by @lucasnewman in #86
Feat: Add quantization and mixed quantization by @Blaizzy in #88
Dia TTS model with voice cloning by @lucasnewman in #91 #93
Handle strict argument in load_weights for better model compatibility by @kamillobinski in #94

New Contributors

@kamillobinski made their first contribution in #94

Full Changelog: v0.0.4...v0.1.0

Contributors

Blaizzy, lucasnewman, and kamillobinski

11 Apr 22:07

Blaizzy

v0.0.4

8669012

What's Changed

Add CSM (Conversational Speech Model) section to README.md by @ivanfioravanti in #56
Use MLX-based SNAC vocoder for Orpheus by @lucasnewman in #62
Add Descript neural audio codec by @lucasnewman in #57
Vectorize overlap-add operation in istft by @lucasnewman in #66
Align Orpheus sampling parameters with the reference implementation by @lucasnewman in #68
Improve Kokoro generation performance by @lucasnewman in #73
Add Speech to Speech Tab to the UI by @freddyaboulton in #79
Add support for fp16 variant of Sesame by @lucasnewman in #78
Add (partially-working) voice matching support for Orpheus by @lucasnewman in #75
Bible Audiobook example by @andrepadez in #43
removed push notification to ntfy by @andrepadez in #81
Bump version by @Blaizzy in #82

New Contributors

@freddyaboulton made their first contribution in #79

Full Changelog: v0.0.3...v0.0.4

Contributors

andrepadez, ivanfioravanti, and 3 other contributors

21 Mar 23:02

Blaizzy

v0.0.3

bec84ab

What's Changed

Add verbose logging and model selection support by @ivanfioravanti in #22
Pulsating effect by @ivanfioravanti in #23
Compile the decoder for Kokoro by @lucasnewman in #24
Play audio segments as they are generated by @lucasnewman in #26
Evaluate the computation graph before returning results by @ivanfioravanti in #35
Add Mimi neural audio codec by @lucasnewman in #34
Add model for Sesame TTS by @lucasnewman in #36
Sphere speed up during audio generation by @ivanfioravanti in #40
Added more Voices by @andrepadez in #37
Feature: External API for Audiobook Generation by @sergenes in #19
Add EnCodec neural audio codec by @lucasnewman in #46
Add Suno bark by @Blaizzy in #45
Update README.md to fix lang_code error by @zboyles in #49
fix model config by @Blaizzy in #50
Add Vocos neural audio codec by @lucasnewman in #48
Fix Kokoro audio generation by @lucasnewman in #52
Add orpheus by @Blaizzy in #47
Resample and Transcribe by @chigkim in #51
Fix vocos config loading by @Blaizzy in #53

New Contributors

@andrepadez made their first contribution in #37
@sergenes made their first contribution in #19
@zboyles made their first contribution in #49
@chigkim made their first contribution in #51

Full Changelog: v0.0.2...v0.0.3

Contributors

andrepadez, ivanfioravanti, and 5 other contributors

07 Mar 22:45

Blaizzy

v0.0.2

f24355d

What's Changed

fix workflows and readme by @Blaizzy in #5
Add soundfile to requirements and Quick Start in README by @ivanfioravanti in #8
Remove librosa dependency by @lucasnewman in #11
Add support for command-line playback with the --play argument by @lucasnewman in #10
Allow receiving text input from stdin or an entry prompt by @lucasnewman in #12
Add web server and improve audio player by @ivanfioravanti in #14
Use phonemizer-fork to avoid espeak errors by @rampadc in #17

New Contributors

@Blaizzy made their first contribution in #5
@ivanfioravanti made their first contribution in #8
@lucasnewman made their first contribution in #11
@rampadc made their first contribution in #17

Full Changelog: v0.0.1...v0.0.2

Contributors

ivanfioravanti, rampadc, and 2 other contributors