Summary

Fixes #1044.

This adds OpenAI-compatible `stop` handling to the MLX-VLM server for `/chat/completions`, plus matching mlx-vlm compatibility support for `/responses`, so both server endpoints honor the same caller intent. The implementation normalizes `stop` as either a single string or a list of one to four non-empty strings, then incrementally filters decoded text so that stop sequences are trimmed even when they span token/chunk boundaries. It is wired through the continuous batching, speculative decoding, and non-batching fallback paths. When a server-side stop sequence is matched, generation is cancelled and removed from the active batch, and the response reports `finish_reason: "stop"`. When a requested stop sequence is never emitted before generation exhausts the token limit, chat completions report `finish_reason: "length"`.

I also fixed a related Responses streaming consistency issue discovered while testing: `response.completed.response.output_text` now matches the final trimmed `response.output_text.done` text.
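The boundary-spanning trim described above can be sketched as follows. This is illustrative code of mine, not the PR's actual implementation: the idea is to hold back any tail of the decoded text that is a prefix of some stop sequence until a later chunk either completes the stop or rules it out.

```python
class StopFilter:
    """Incrementally filter decoded text, trimming stop sequences even
    when they span chunk boundaries (illustrative sketch, not mlx-vlm code)."""

    def __init__(self, stops):
        self.stops = stops    # normalized list of non-empty stop strings
        self.buf = ""         # held-back text that might begin a stop
        self.stopped = False

    def feed(self, chunk):
        """Return the text that is safe to emit after seeing this chunk."""
        if self.stopped:
            return ""
        self.buf += chunk
        # A stop sequence is fully present: emit everything before the
        # earliest match and discard the rest.
        hits = [i for s in self.stops if (i := self.buf.find(s)) != -1]
        if hits:
            self.stopped = True
            out, self.buf = self.buf[: min(hits)], ""
            return out
        # Otherwise hold back the longest tail that is a proper prefix
        # of some stop sequence; it may be completed by the next chunk.
        hold = 0
        for s in self.stops:
            for k in range(1, len(s)):
                if self.buf.endswith(s[:k]):
                    hold = max(hold, k)
        out = self.buf[: len(self.buf) - hold] if hold else self.buf
        self.buf = self.buf[len(self.buf) - hold:] if hold else ""
        return out

    def flush(self):
        """Emit any held-back text once generation ends without a stop."""
        out, self.buf = self.buf, ""
        return out
```

With `stops=["END"]`, feeding `"ld! EN"` emits only `"ld! "` and holds `"EN"`; a following `"D"` then completes the stop, so the returned text never contains the stop sequence.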
OpenAI spec validation

Validated against the current official OpenAI documented OpenAPI spec at https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml:

- `stop` via `StopConfiguration`: a nullable string or an array with `minItems: 1` and `maxItems: 4`; returned text must not contain the stop sequence.
- `finish_reason` includes `stop` for a natural stop or a provided stop sequence, and `length` when the token limit is reached.
- The final streamed chat chunk is one whose `delta` is empty and whose `finish_reason` is `stop`.
- The `CreateResponse` schema does not define a top-level `stop` parameter, so `/responses` handling here is intentionally treated as mlx-vlm compatibility behavior rather than a claim that OpenAI Responses currently accepts top-level `stop`.
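The `StopConfiguration` shape maps to a small normalizer. The sketch below is my own illustration of those constraints (null, a string, or one to four non-empty strings), not the server's actual helper:

```python
from typing import List, Optional, Union

def normalize_stop(stop: Union[str, List[str], None]) -> List[str]:
    """Normalize the OpenAI `stop` parameter per StopConfiguration:
    null, a non-empty string, or an array of 1-4 non-empty strings.
    Illustrative sketch; mlx-vlm's real validation may differ."""
    if stop is None:
        return []
    if isinstance(stop, str):
        stop = [stop]
    if not isinstance(stop, list) or not 1 <= len(stop) <= 4:
        raise ValueError("stop must be a string or a list of 1-4 strings")
    if not all(isinstance(s, str) and s for s in stop):
        raise ValueError("each stop sequence must be a non-empty string")
    return list(stop)
```

Returning an always-list form lets the downstream filtering code treat the single-string and array cases uniformly.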
Validation

Automated tests:

```
uv run --with pytest python -m pytest mlx_vlm/tests/test_server.py  # 59 passed, 3 warnings
```

Additional checks:
Live worktree integration validation against `mlx-community/Qwen3.6-35B-A3B-nvfp4` on the local MLX-VLM server: the `length` case passed. Integration sweep result: 10 passed, 0 failed.

Broader repo note: `pytest -q` still fails during collection on an unrelated pre-existing mismatch in `mlx_vlm/tests/test_utils.py`, which imports `get_class_predicate` from `mlx_vlm.utils` even though that symbol is not present.
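For anyone reproducing the sweep, minimal `/chat/completions` request bodies that exercise the two finish reasons might look like this. The prompt, stop values, and token limits are illustrative, not the exact payloads used in the integration run:

```python
import json

# Expect finish_reason "stop": the stop sequence should appear early in
# the completion, and the returned text must not contain it.
stop_request = {
    "model": "mlx-community/Qwen3.6-35B-A3B-nvfp4",
    "messages": [{"role": "user", "content": "Count upward: 1, 2, 3, ..."}],
    "stop": [", 4"],
    "max_tokens": 64,
}

# Expect finish_reason "length": a stop string the model will never emit,
# combined with a tight token budget, exhausts generation first.
length_request = {**stop_request, "stop": "<<never-emitted>>", "max_tokens": 8}

body = json.dumps(stop_request)  # JSON body for POST /chat/completions
```

Both bodies are valid against the `StopConfiguration` shape (a single string or a list of one to four non-empty strings).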