
feat: add stop sequences support for both endpoints#991

Open
eloe wants to merge 3 commits into Blaizzy:main from eloe:upstream/stop-sequences

Conversation


@eloe eloe commented Apr 9, 2026

Summary

Adds stop parameter support to /v1/chat/completions and /responses, per the OpenAI API spec. Up to 4 stop sequences can be specified as strings; they are passed directly to the generation pipeline as eos_tokens.

  • Accepts stop as a single string or list of strings
  • Validates maximum of 4 sequences (OpenAI limit)
  • Passes strings directly to the eos_tokens kwarg in generate() / stream_generate()
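The normalization described above can be sketched as follows. This is a minimal sketch, not the PR's actual implementation; in particular, whether the helper truncates to 4 sequences or rejects longer lists is an assumption based on the test name test_resolve_stop_sequences_limits_to_four.

```python
from typing import List, Optional, Union

def resolve_stop_sequences(
    stop: Optional[Union[str, List[str]]]
) -> Optional[List[str]]:
    """Normalize an OpenAI-style `stop` field into a list of stop strings.

    Accepts a single string or a list of strings; returns at most 4
    sequences (the OpenAI limit), or None when no stop was given.
    """
    if stop is None:
        return None
    if isinstance(stop, str):
        stop = [stop]
    # Cap at 4 sequences. Truncation (rather than raising) is assumed here;
    # the PR body only says the maximum of 4 is validated.
    return stop[:4]
```

The returned list would then be forwarded unchanged as the `eos_tokens` kwarg to generate() / stream_generate().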

Example

{
  "model": "...",
  "messages": [{"role": "user", "content": "List 3 items"}],
  "stop": ["\n\n", "4."]
}

Tests

  • test_chat_completions_stop_passed_as_eos_tokens
  • test_chat_completions_no_stop_no_eos_tokens
  • test_responses_stop_passed_as_eos_tokens
  • test_resolve_stop_sequences_single_string
  • test_resolve_stop_sequences_list
  • test_resolve_stop_sequences_none
  • test_resolve_stop_sequences_limits_to_four
python -m pytest mlx_vlm/tests/test_server.py -k "stop" -v

eloe and others added 3 commits April 8, 2026 20:31
Accept `stop` parameter (string or list of up to 4 strings) in both
/v1/responses and /v1/chat/completions. Stop strings are tokenized
and passed as `eos_tokens` to the generation pipeline.

New helper: resolve_stop_tokens() converts stop strings to token IDs
using the model's tokenizer.

Adds 7 tests: endpoint integration (responses + chat/completions),
unit tests for resolve_stop_tokens (single, list, None, limit to 4).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

add_eos_token_ids() accepts strings and tokenizes internally.
Renamed resolve_stop_tokens → resolve_stop_sequences, returns
string list directly. Fixes "can only concatenate str (not int)"
error when stop sequences were used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix comment "token IDs" → "strings" (eos_tokens takes strings)
- Tighten resolve_stop_sequences type hints to List[str]
- Remove unused fake_processor/fake_encode from tests
- Rename misleading test names to drop "string" from name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>