Releases: cubist38/mlx-openai-server
v1.7.1
What's Changed
- docs: add "Frequently Encountered Problems" section to README by @cubist38 in #263
- refactor: enhance on-demand model management in ModelRegistry by @cubist38 in #265
- docs: add GLM-4.7-Flash-Abliterated-8bit model launch details to README by @cubist38 in #266
- Fix/relax responses api validation for codex by @cubist38 in #268
- feat: enhance inference worker to support cancellation of async tasks by @cubist38 in #270
- feat: enhance handler process with improved garbage collection and pr… by @cubist38 in #271
- Feat/gemma4 by @cubist38 in #272
- refactor: update cache insertion method to use cache_type parameter by @cubist38 in #274
Full Changelog: v1.7.0...v1.7.1
v1.7.0
What's Changed
- refactor: optimize preprocessing pipeline for mlx-lm and mlx-vlm hand… by @cubist38 in #238
- Per-model sampling defaults in YAML config by @lyonsno in #237
- Implement pre-commit and linting GitHub action by @Snuffy2 in #111
- Fix/pre commit by @cubist38 in #246
- Cache/hybrid attention models by @cubist38 in #251
- API compat fallback contract hardening by @lyonsno in #250
- feat: cli defaults parity by @lyonsno in #252
- feat: Streaming Cancellation by @lyonsno in #255
- feat: update mflux installation instructions and add dependency by @cubist38 in #256
- Feat/server config enhancements by @cubist38 in #257
Full Changelog: v1.6.3...v1.7.0
v1.6.3
v1.6.2
What's Changed
- Sync/mflux by @cubist38 in #212
- feat: add Moonshot's partial mode extension by @blightbow in #213
- fix(parsers): resolve test failures after parser refactor by @lyonsno in #214
- fix: handle split reasoning/tool markers in streaming parsers by @lyonsno in #215
- fix(stream): preserve tool_call id stability across deltas by @lyonsno in #216
- feat: add DEFAULT_MIN_P environment variable support for chat completions by @lyonsno in #223
- fix: persist prompt cache when streaming is cancelled by @lyonsno in #225
- fix(parsers): harden function-parameter extraction and streaming opener-tail buffering by @lyonsno in #218
- feat(parsers): add mixed-think reasoning handoff parser and stream re-entry wiring by @lyonsno in #219
- Fix: Prevent reasoning re-injection across APIs by @lyonsno in #224
- fix(parsers): restore step_35 implicit-open compatibility and harden non-stream tool fallback by @lyonsno in #220
- refactor: enhance message handling in MLXLM and MLXVLM handlers by @cubist38 in #227
Full Changelog: v1.6.1...v1.6.2
v1.6.1
v1.6.0
v1.5.3
v1.5.2
What's Changed
- fix(cache): deterministic random seeds and cache leaks by @blightbow in #162
- Fix MiniMax M2 parser failing to parse multi-line HTML content in parameters by @jverkoey in #164
- Remove deprecated parser files for various models including Harmony, … by @cubist38 in #165
- feat(api): add XTC sampling and logit_bias parameters by @blightbow in #167
- fix(handler): enhance unified parser handling in MLXLMHandler by @cubist38 in #170
- Hotfix for embedding models by @icelaglace in #172
- Feat/long cat flash lite by @cubist38 in #175
- Log chat template loading results by @jverkoey in #182
- Feat/kimi k2 by @cubist38 in #184
- feat: make LRU prompt cache size configurable via CLI by @jverkoey in #187
- Refactor/function calling by @cubist38 in #193
New Contributors
- @blightbow made their first contribution in #162
- @jverkoey made their first contribution in #164
Full Changelog: v1.5.1...v1.5.2
v1.5.1
What's Changed
- Feat/glm47 flash by @cubist38 in #151
- Feat/mflux by @cubist38 in #154
- Refactor MLXVLMHandler to remove prompt caching and update parser han… by @cubist38 in #155
- Fix Harmony parser for Open WebUI tool calls by @icelaglace in #156
- Enhance MLXLMHandler to include detailed prompt token usage tracking by @cubist38 in #157
- Hotfix/gpt oss by @cubist38 in #158
- Refactor README.md for clarity and conciseness by @cubist38 in #160
New Contributors
- @icelaglace made their first contribution in #156
Full Changelog: v1.5.0...v1.5.1
v1.5.0
What's Changed
- Linting/Formatting of schemas folder by @Snuffy2 in #106
- Linting/Formatting of middleware folder by @Snuffy2 in #107
- Linting/Formatting of tests folder by @Snuffy2 in #110
- Linting/Formatting of scripts folder by @Snuffy2 in #109
- Add Nemotron3 Nano parsers and update parser registry by @cubist38 in #121
- Refactor MLXFluxHandler to enhance image generation and editing funct… by @cubist38 in #123
- (fix) parser: minimax - tool arguments json by @mialso in #124
- Refactor/server by @cubist38 in #126
- Refactor/server by @cubist38 in #128
- Implement Nemotron3 Nano parsers for reasoning and tool calls by @cubist38 in #129
- Refactor/server by @cubist38 in #130
- Refactor MLX_LM and MLX_VLM to use prompt_cache directly by @cubist38 in #131
- Refactor/server by @cubist38 in #133
- (fix) Qwen3 Coder: tool parser and message converter by @mialso in #136
- Server/enhancement by @cubist38 in #138
- Server/mlx vlm cache by @cubist38 in #139
- Server/enhance parsers by @cubist38 in #142
- Server/enhance parsers by @cubist38 in #143
- Refactor context length handling in model initialization by @cubist38 in #146
Full Changelog: v1.4.2...v1.5.0