v0.2.3.post2

jundot released this 05 Mar 18:41


be81f22

Hotfix

Bug fixes

- Fix VLM multi-request blocking: second request now starts immediately instead of waiting for the first to finish
  - Reverted vision encoding to use _mlx_executor instead of asyncio.to_thread() to avoid Metal GPU thread contention (#80, #81)
  - Changed prefill_batch_size default to prevent continuous batching from being disabled when it equaled completion_batch_size
- Fix segfault when sending concurrent VLM image requests by ensuring all scheduler steps run on the MLX executor thread (#81)
- Fix missing mcp package crash on server start
- Fix memory limit UI showing incorrect label when set to 0
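
The executor-related fixes above rely on a general pattern: route all GPU-bound work through one dedicated worker thread rather than asyncio.to_thread(), which uses the event loop's default thread pool and may run each call on a different thread. The sketch below illustrates that pattern only. It is not the project's code: gpu_executor, encode_image, and handle_request are hypothetical names, and the real _mlx_executor may be configured differently.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dedicated GPU executor: a single worker thread,
# so every submitted job runs on the same OS thread, one at a time.
gpu_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu")


def encode_image(request_id: int) -> str:
    # Placeholder for GPU-bound work (assumption: the real work is vision
    # encoding). Returns the name of the thread the job ran on.
    return threading.current_thread().name


async def handle_request(request_id: int) -> str:
    loop = asyncio.get_running_loop()
    # Submit to the dedicated executor instead of asyncio.to_thread(),
    # keeping the event loop free while the job runs.
    return await loop.run_in_executor(gpu_executor, encode_image, request_id)


async def main() -> set[str]:
    # Four concurrent requests, all funneled through the same worker thread.
    names = await asyncio.gather(*(handle_request(i) for i in range(4)))
    return set(names)


thread_names = asyncio.run(main())
print(thread_names)
```

Because the executor has a single worker, the concurrent requests are accepted immediately by the event loop but their GPU-bound sections never overlap across threads.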
