Releases · cubist38/mlx-openai-server · GitHub

04 Apr 07:37

cubist38

v1.7.1 Latest

What's Changed

docs: add "Frequently Encountered Problems" section to README by @cubist38 in #263
refactor: enhance on-demand model management in ModelRegistry by @cubist38 in #265
docs: add GLM-4.7-Flash-Abliterated-8bit model launch details to README by @cubist38 in #266
Fix/relax responses api validation for codex by @cubist38 in #268
feat: enhance inference worker to support cancellation of async tasks by @cubist38 in #270
feat: enhance handler process with improved garbage collection and pr… by @cubist38 in #271
Feat/gemma4 by @cubist38 in #272
refactor: update cache insertion method to use cache_type parameter by @cubist38 in #274

Full Changelog: v1.7.0...v1.7.1

Contributors

cubist38

22 Mar 08:43

cubist38

v1.7.0

What's Changed

refactor: optimize preprocessing pipeline for mlx-lm and mlx-vlm hand… by @cubist38 in #238
Per model sampling defaults in Yaml config by @lyonsno in #237
Implement pre-commit and linting GitHub action by @Snuffy2 in #111
Fix/pre commit by @cubist38 in #246
Cache/hybrid attention models by @cubist38 in #251
API compat fallback contract hardening by @lyonsno in #250
feat: cli defaults parity by @lyonsno in #252
feat: Streaming Cancellation by @lyonsno in #255
feat: update mflux installation instructions and add dependency by @cubist38 in #256
Feat/server config enhancements by @cubist38 in #257

Full Changelog: v1.6.3...v1.7.0

Contributors

lyonsno, Snuffy2, and cubist38

08 Mar 10:21

cubist38

v1.6.3

What's Changed

feat(parsers): add Qwen3 and Qwen3.5 reasoning parsers by @cubist38 in #229
Refactor/message converters by @cubist38 in #230
Refactor/versioning by @cubist38 in #231
Optimize/prompt cache by @cubist38 in #232
Mflux/dependencies by @cubist38 in #233

Full Changelog: v1.6.2...v1.6.3

Contributors

cubist38

06 Mar 17:10

cubist38

v1.6.2

What's Changed

Sync/mflux by @cubist38 in #212
feat: add Moonshot's partial mode extension by @blightbow in #213
fix(parsers): resolve test failures after parser refactor by @lyonsno in #214
fix: handle split reasoning/tool markers in streaming parsers by @lyonsno in #215
fix(stream): preserve tool_call id stability across deltas by @lyonsno in #216
feat: add DEFAULT_MIN_P environment variable support for chat completions by @lyonsno in #223
fix: persist prompt cache when streaming is cancelled by @lyonsno in #225
fix(parsers): harden function-parameter extraction and streaming opener-tail buffering by @lyonsno in #218
feat(parsers): add mixed-think reasoning handoff parser and stream re-entry wiring by @lyonsno in #219
Fix: Prevent reasoning re-injection across APIs by @lyonsno in #224
fix(parsers): restore step_35 implicit-open compatibility and harden non-stream tool fallback by @lyonsno in #220
refactor: enhance message handling in MLXLM and MLXVLM handlers by @cubist38 in #227

New Contributors

@lyonsno made their first contribution in #214

Full Changelog: v1.6.1...v1.6.2

Contributors

lyonsno, blightbow, and cubist38

23 Feb 00:52

cubist38

v1.6.1

What's Changed

Add openai dependency to pyproject.toml by @cubist38 in #210

Full Changelog: v1.6.0...v1.6.1

Contributors

cubist38

22 Feb 10:08

cubist38

v1.6.0

What's Changed

fix: preserve assistant messages with tool_calls when content is null by @loveqoo in #205
Server/OpenAI compatible api by @cubist38 in #207

New Contributors

@loveqoo made their first contribution in #205

Full Changelog: v1.5.3...v1.6.0

Contributors

loveqoo and cubist38

12 Feb 02:29

cubist38

v1.5.3

What's Changed

Server/non blocking io by @cubist38 in #194
Feat/speculative decoding by @cubist38 in #195
Server/multi handlers by @cubist38 in #197
Fix/mflux by @cubist38 in #199

Full Changelog: v1.5.2...v1.5.3

Contributors

cubist38

07 Feb 14:02

cubist38

v1.5.2

What's Changed

fix(cache): deterministic random seeds and cache leaks by @blightbow in #162
Fix MiniMax M2 parser failing to parse multi-line HTML content in parameters by @jverkoey in #164
Remove deprecated parser files for various models including Harmony, … by @cubist38 in #165
feat(api): add XTC sampling and logit_bias parameters by @blightbow in #167
fix(handler): enhance unified parser handling in MLXLMHandler by @cubist38 in #170
Hotfix for embedding models by @icelaglace in #172
Feat/long cat flash lite by @cubist38 in #175
Log chat template loading results by @jverkoey in #182
Feat/kimi k2 by @cubist38 in #184
feat: make LRU prompt cache size configurable via CLI by @jverkoey in #187
Refactor/function calling by @cubist38 in #193

New Contributors

@blightbow made their first contribution in #162
@jverkoey made their first contribution in #164

Full Changelog: v1.5.1...v1.5.2

Contributors

jverkoey, blightbow, and 2 other contributors

27 Jan 03:32

cubist38

v1.5.1

What's Changed

Feat/glm47 flash by @cubist38 in #151
Feat/mflux by @cubist38 in #154
Refactor MLXVLMHandler to remove prompt caching and update parser han… by @cubist38 in #155
Fix Harmony parser for Open WebUI tool calls by @icelaglace in #156
Enhance MLXLMHandler to include detailed prompt token usage tracking by @cubist38 in #157
Hotfix/gpt oss by @cubist38 in #158
Refactor README.md for clarity and conciseness by @cubist38 in #160

New Contributors

@icelaglace made their first contribution in #156

Full Changelog: v1.5.0...v1.5.1

Contributors

icelaglace and cubist38

15 Jan 03:43

cubist38

v1.5.0

What's Changed

Linting/Formatting of schemas folder by @Snuffy2 in #106
Linting/Formatting of middleware folder by @Snuffy2 in #107
Linting/Formatting of tests folder by @Snuffy2 in #110
Linting/Formatting of scripts folder by @Snuffy2 in #109
Add Nemotron3 Nano parsers and update parser registry by @cubist38 in #121
Refactor MLXFluxHandler to enhance image generation and editing funct… by @cubist38 in #123
(fix) parser: minimax - tool arguments json by @mialso in #124
Refactor/server by @cubist38 in #126
Refactor/server by @cubist38 in #128
Implement Nemotron3 Nano parsers for reasoning and tool calls by @cubist38 in #129
Refactor/server by @cubist38 in #130
Refactor MLX_LM and MLX_VLM to use prompt_cache directly by @cubist38 in #131
Refactor/server by @cubist38 in #133
(fix) Qwen3 Coder: tool parser and message converter by @mialso in #136
Server/enhancement by @cubist38 in #138
Server/mlx vlm cache by @cubist38 in #139
Server/enhance parsers by @cubist38 in #142
Server/enhance parsers by @cubist38 in #143
Refactor context length handling in model initialization by @cubist38 in #146

New Contributors

@mialso made their first contribution in #124

Full Changelog: v1.4.2...v1.5.0

Contributors

Snuffy2, mialso, and cubist38
