v0.2.3.post2

@jundot released this 05 Mar 18:41
· 553 commits to main since this release

Hotfix

Bug fixes

  • Fix VLM multi-request blocking: second request now starts immediately instead of waiting for the first to finish
    • Reverted vision encoding to use _mlx_executor instead of asyncio.to_thread() to avoid Metal GPU thread contention (#80, #81)
    • Changed the prefill_batch_size default so it no longer equals completion_batch_size, a combination that disabled continuous batching
  • Fix segfault when sending concurrent VLM image requests by ensuring all scheduler steps run on the MLX executor thread (#81)
  • Fix crash on server start when the mcp package is not installed
  • Fix memory limit UI showing incorrect label when set to 0
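
The executor-related fixes above rely on a general pattern: route all blocking GPU work through one dedicated worker thread instead of asyncio.to_thread(), which draws from a shared multi-thread pool. The sketch below illustrates that pattern only and is not the project's code. The names gpu_executor, encode_image, and handle_request are hypothetical, and the real _mlx_executor may be set up differently.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a dedicated executor such as _mlx_executor.
# max_workers=1 means every submitted job runs on the same thread.
gpu_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gpu")


def encode_image(request_id: int) -> str:
    """Placeholder for blocking GPU work; reports which thread ran it."""
    return threading.current_thread().name


async def handle_request(request_id: int) -> str:
    loop = asyncio.get_running_loop()
    # Unlike asyncio.to_thread(), this pins the call to the single worker,
    # so two requests never touch the GPU from different threads at once.
    return await loop.run_in_executor(gpu_executor, encode_image, request_id)


async def main() -> set[str]:
    # Four concurrent requests are accepted by the event loop immediately;
    # their GPU sections are serialized on the one worker thread.
    names = await asyncio.gather(*(handle_request(i) for i in range(4)))
    return set(names)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With this arrangement the event loop stays free to accept new requests while the GPU-bound sections queue on a single thread, which is the property the fixes above describe.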