v0.2.5
What's New
Features
- Presence penalty & min_p sampling: added presence_penalty and min_p as new sampling parameters for finer control over generation behavior. Configurable per model from the admin panel's model settings. (#94)
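
For readers unfamiliar with the two parameters, here is a minimal sketch of what they conventionally do. It assumes the common definitions (a flat logit penalty for any token that has already appeared, and a probability cutoff scaled by the top token's probability); it is not oMLX's actual sampling code, and the function names are hypothetical.

```python
import math


def apply_presence_penalty(logits, generated_token_ids, presence_penalty):
    """Subtract a flat penalty once from every token that has already appeared."""
    penalized = list(logits)
    for token_id in set(generated_token_ids):  # set(): repeats are not penalized twice
        penalized[token_id] -= presence_penalty
    return penalized


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Illustrative values only: a 4-token vocabulary where tokens 0 and 1 were already generated.
logits = [2.0, 1.0, 0.0, -3.0]
penalized = apply_presence_penalty(logits, [0, 0, 1], presence_penalty=0.5)
filtered = min_p_filter(softmax(penalized), min_p=0.1)
```

In this example the penalty lowers the logits of tokens 0 and 1 by the same amount regardless of how often they appeared, and the min_p cutoff removes the unlikely token 3 while leaving the other three candidates available for sampling.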
Bug Fixes
- Metal crash on concurrent add_request: serialized add_request calls through the MLX executor to prevent Metal GPU crashes under concurrent request submission. (#95)
- HuggingFace model search broken: removed deprecated direction parameter from huggingface_hub.list_models() that was silently breaking model search results.
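
The add_request fix follows a general pattern: route every call through one worker so the calls never overlap. A minimal sketch of that pattern is below. It assumes a single-worker ThreadPoolExecutor as the serialization point; FakeEngine and SerializedEngine are hypothetical stand-ins, not oMLX classes, and the real MLX executor may be structured differently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Hypothetical engine whose add_request must not run concurrently."""

    def __init__(self):
        self.active = 0       # calls currently inside add_request
        self.max_active = 0   # highest overlap observed
        self.requests = []
        self._lock = threading.Lock()  # guards the demo counters only

    def add_request(self, request_id):
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.001)  # widen the window in which overlap could occur
        self.requests.append(request_id)
        with self._lock:
            self.active -= 1


class SerializedEngine:
    """Funnels add_request through a single worker thread."""

    def __init__(self, engine):
        self._engine = engine
        self._executor = ThreadPoolExecutor(max_workers=1)

    def add_request(self, request_id):
        # submit() queues the call; result() blocks the caller until it has run.
        return self._executor.submit(self._engine.add_request, request_id).result()

    def shutdown(self):
        self._executor.shutdown(wait=True)


def run_demo(num_callers=8):
    """Submit requests from several threads at once and return the engine."""
    engine = FakeEngine()
    serialized = SerializedEngine(engine)
    threads = [
        threading.Thread(target=serialized.add_request, args=(i,))
        for i in range(num_callers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    serialized.shutdown()
    return engine
```

With one worker, callers on any thread can submit at the same time, but the underlying add_request runs strictly one call at a time, so max_active never exceeds 1.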
Dependencies
- mlx-vlm updated to 348466f: adds support for new VLM model types (MiniCPM-O, Phi-4-reasoning-vision, Phi-4-Multimodal) and includes various bug fixes. oMLX's model discovery and vision input pipeline were updated accordingly.
Thanks to @rsnow for reporting the Metal crash issue!