Releases: cubist38/mlx-openai-server

v1.7.1

04 Apr 07:37

What's Changed

  • docs: add "Frequently Encountered Problems" section to README by @cubist38 in #263
  • refactor: enhance on-demand model management in ModelRegistry by @cubist38 in #265
  • docs: add GLM-4.7-Flash-Abliterated-8bit model launch details to README by @cubist38 in #266
  • Fix/relax responses api validation for codex by @cubist38 in #268
  • feat: enhance inference worker to support cancellation of async tasks by @cubist38 in #270
  • feat: enhance handler process with improved garbage collection and pr… by @cubist38 in #271
  • Feat/gemma4 by @cubist38 in #272
  • refactor: update cache insertion method to use cache_type parameter by @cubist38 in #274
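
The cancellation work in #270 is about stopping in-flight async inference tasks. As a rough sketch of the general asyncio cancellation pattern involved (not the project's actual worker code; all names here are invented), a cancellable task looks like:

```python
import asyncio

async def fake_inference(steps):
    """Stand-in for a long-running generation loop; only shows the
    asyncio cancellation pattern, not the server's real worker."""
    produced = []
    try:
        for i in range(steps):
            await asyncio.sleep(0)  # yield point where cancellation can land
            produced.append(i)
    except asyncio.CancelledError:
        # A real worker would release caches / free memory here.
        raise
    return produced

async def main():
    task = asyncio.create_task(fake_inference(1_000_000))
    await asyncio.sleep(0)  # let the task start running
    task.cancel()           # request cancellation mid-flight
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"

outcome = asyncio.run(main())
```

The key detail is that cancellation is only delivered at an `await`, so a generation loop must yield periodically to be cancellable.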

Full Changelog: v1.7.0...v1.7.1

v1.7.0

22 Mar 08:43

What's Changed

Full Changelog: v1.6.3...v1.7.0

v1.6.3

08 Mar 10:21

What's Changed

Full Changelog: v1.6.2...v1.6.3

v1.6.2

06 Mar 17:10

What's Changed

  • Sync/mflux by @cubist38 in #212
  • feat: add Moonshot's partial mode extension by @blightbow in #213
  • fix(parsers): resolve test failures after parser refactor by @lyonsno in #214
  • fix: handle split reasoning/tool markers in streaming parsers by @lyonsno in #215
  • fix(stream): preserve tool_call id stability across deltas by @lyonsno in #216
  • feat: add DEFAULT_MIN_P environment variable support for chat completions by @lyonsno in #223
  • fix: persist prompt cache when streaming is cancelled by @lyonsno in #225
  • fix(parsers): harden function-parameter extraction and streaming opener-tail buffering by @lyonsno in #218
  • feat(parsers): add mixed-think reasoning handoff parser and stream re-entry wiring by @lyonsno in #219
  • Fix: Prevent reasoning re-injection across APIs by @lyonsno in #224
  • fix(parsers): restore step_35 implicit-open compatibility and harden non-stream tool fallback by @lyonsno in #220
  • refactor: enhance message handling in MLXLM and MLXVLM handlers by @cubist38 in #227
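
#223 adds a DEFAULT_MIN_P environment variable as a server-side default for the `min_p` sampling parameter. A minimal sketch of how such a fallback typically resolves (the function name, precedence order, and hard default are assumptions, not taken from the project):

```python
import os

def resolve_min_p(request_min_p=None, hard_default=0.0):
    """Per-request value wins; otherwise fall back to the
    DEFAULT_MIN_P environment variable, then to a hard default.
    (Sketch only -- name and precedence are assumptions.)"""
    if request_min_p is not None:
        return request_min_p
    env_value = os.environ.get("DEFAULT_MIN_P")
    return float(env_value) if env_value is not None else hard_default

os.environ["DEFAULT_MIN_P"] = "0.05"
min_p = resolve_min_p()              # falls back to the env var
min_p_explicit = resolve_min_p(0.2)  # explicit request value wins
```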

New Contributors

Full Changelog: v1.6.1...v1.6.2

v1.6.1

23 Feb 00:52

What's Changed

Full Changelog: v1.6.0...v1.6.1

v1.6.0

22 Feb 10:08

What's Changed

  • fix: preserve assistant messages with tool_calls when content is null by @loveqoo in #205
  • Server/OpenAI compatible api by @cubist38 in #207
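
#205 fixes the dropping of assistant messages whose `content` is null but which carry `tool_calls`. For reference, this is the message shape in question, following the OpenAI chat format (the function name and argument values are illustrative):

```python
import json

# An assistant turn with tool calls and null content -- the shape
# that must be preserved, not discarded, when building the prompt.
assistant_turn = {
    "role": "assistant",
    "content": None,  # no text content, only tool calls
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\": \"Paris\"}",
        },
    }],
}
serialized = json.dumps(assistant_turn)  # round-trips content: null
```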

New Contributors

Full Changelog: v1.5.3...v1.6.0

v1.5.3

12 Feb 02:29

What's Changed

Full Changelog: v1.5.2...v1.5.3

v1.5.2

07 Feb 14:02

What's Changed

  • fix(cache): deterministic random seeds and cache leaks by @blightbow in #162
  • Fix MiniMax M2 parser failing to parse multi-line HTML content in parameters by @jverkoey in #164
  • Remove deprecated parser files for various models including Harmony, … by @cubist38 in #165
  • feat(api): add XTC sampling and logit_bias parameters by @blightbow in #167
  • fix(handler): enhance unified parser handling in MLXLMHandler by @cubist38 in #170
  • Hotfix for embedding models by @icelaglace in #172
  • Feat/long cat flash lite by @cubist38 in #175
  • Log chat template loading results by @jverkoey in #182
  • Feat/kimi k2 by @cubist38 in #184
  • feat: make LRU prompt cache size configurable via CLI by @jverkoey in #187
  • Refactor/function calling by @cubist38 in #193
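
#167 adds XTC sampling and `logit_bias` to the API. A hedged example of a chat-completions request body using `logit_bias` (the model name, token id, and bias value are placeholders, not values from the release):

```python
import json

# Illustrative request body for an OpenAI-compatible
# /v1/chat/completions endpoint; an HTTP client would POST `body`.
payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Say hello."}],
    "logit_bias": {"1234": -100},  # strongly suppress token id 1234
    "max_tokens": 32,
}
body = json.dumps(payload)
```

Per the OpenAI convention, `logit_bias` maps token ids (as strings) to a bias in [-100, 100] added to the logits before sampling.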

New Contributors

Full Changelog: v1.5.1...v1.5.2

v1.5.1

27 Jan 03:32

What's Changed

New Contributors

Full Changelog: v1.5.0...v1.5.1

v1.5.0

15 Jan 03:43

What's Changed

New Contributors

Full Changelog: v1.4.2...v1.5.0