waybarrios / vllm-mlx Public

Pull requests: waybarrios/vllm-mlx

46 Open 105 Closed

Pull requests list

fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching

#227 opened Mar 25, 2026 by krystophny

test: make Python 3.13 async suite pass and cover it in CI

#226 opened Mar 25, 2026 by krystophny

fix: MLLM hybrid batching + message normalization

#224 opened Mar 25, 2026 by Thump604

feat: MTP per-request routing in BatchedEngine

#223 opened Mar 24, 2026 by Thump604

2 of 3 tasks

simple-engine: keep tool chat on the streaming execution path

#222 opened Mar 24, 2026 by krystophny

scheduler: preserve prompt checkpoints in chunked prefill resume path

#221 opened Mar 24, 2026 by krystophny

engine: keep SimpleEngine serialized across cancellation

#220 opened Mar 24, 2026 by krystophny

chat: forward chat_template_kwargs on simple-engine paths

#218 opened Mar 24, 2026 by krystophny

prefix_cache: preserve hybrid recurrent state across blocks

#217 opened Mar 24, 2026 by krystophny

cli: expose harmony and gpt-oss tool parsers

#216 opened Mar 24, 2026 by krystophny

tokenizer: return successful mlx-lm load result

#215 opened Mar 24, 2026 by krystophny

server: add OpenAI-compatible /v1/responses endpoint

#214 opened Mar 24, 2026 by krystophny

feat: full sampling parameter support (top_k, min_p, presence_penalty, repetition_penalty)

#213 opened Mar 23, 2026 by Thump604

5 tasks done

fix: respect tool_choice="none" by excluding tools from template

#210 opened Mar 23, 2026 by awanawana

fix: Don’t truncate base64 images before hashing

#206 opened Mar 22, 2026 by BelieveDiffusion

feat: add lifecycle-managed residency for the default server model

#205 opened Mar 22, 2026 by lyonsno

fix: skip RNN snapshots in MTP optimistic mode to prevent memory leak

#196 opened Mar 21, 2026 by Thump604

4 tasks

fix: streaming detokenizer for UTF-8-safe incremental decode

#195 opened Mar 21, 2026 by Thump604

5 tasks

Fix MLLM cache stats in /v1/status

#193 opened Mar 21, 2026 by janhilgard

4 tasks

fix: rename platform.py to vllm_platform.py to avoid stdlib shadowing

#185 opened Mar 20, 2026 by dan-j-cooper

fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)

#183 opened Mar 20, 2026 by hkstrongside

fix: parse tool calls in streaming reasoning branch

#177 opened Mar 18, 2026 by Thump604

fix: honor tool_choice=none by stripping tools and suppressing parsing

#173 opened Mar 17, 2026 by Thump604

fix: MLLM continuous batching for hybrid models

#165 opened Mar 16, 2026 by Thump604

fix: pass size to ArraysCache in BatchMambaCache for Qwen3.5 hybrid models

#160 opened Mar 14, 2026 by neomody77

4 tasks done
