Context
In `SGLangServer.tokenize()`, when handling a `TokenizeChatRequest`, the method delegates to `_render_chat_prompt()`, which hardcodes `add_generation_prompt=True` in the `apply_chat_template` call.
The vLLM `TokenizeChatRequest` exposes `add_generation_prompt` as a user-configurable field, but reusing `_render_chat_prompt` silently ignores this setting. For tokenization (unlike chat completions), users often need to control whether the generation prompt is included in the token count.
Relevant code
- `tokenize()` at `python/ray/llm/examples/sglang/modules/sglang_engine.py` (lines 392-394)
- `_render_chat_prompt()` in the same file (lines 140-145)
Suggested fix
Pass add_generation_prompt from the request to _render_chat_prompt (or call apply_chat_template directly in tokenize() with the request's field value) instead of hardcoding True.
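A minimal sketch of the suggested fix, threading the request's flag through instead of hardcoding `True`. The class and method signatures below are illustrative stand-ins (including the stub tokenizer), not the actual Ray/SGLang code:

```python
# Sketch: forward add_generation_prompt from the request to apply_chat_template.
# TokenizeChatRequest, StubTokenizer, and Engine are simplified stand-ins for
# illustration only.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TokenizeChatRequest:
    messages: List[Dict[str, str]]
    add_generation_prompt: bool = True  # user-configurable, per the vLLM API


class StubTokenizer:
    """Minimal stand-in for a HF tokenizer's apply_chat_template."""

    def apply_chat_template(self, messages, tokenize=False,
                            add_generation_prompt=True):
        prompt = "".join(f"<|{m['role']}|>{m['content']}" for m in messages)
        if add_generation_prompt:
            # The trailing assistant header changes the token count.
            prompt += "<|assistant|>"
        return prompt


class Engine:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def _render_chat_prompt(self, messages, add_generation_prompt=True):
        # Before the fix this kwarg was hardcoded to True; now callers decide.
        return self.tokenizer.apply_chat_template(
            messages, tokenize=False,
            add_generation_prompt=add_generation_prompt,
        )

    def tokenize(self, request: TokenizeChatRequest):
        # Forward the request's field instead of silently ignoring it.
        return self._render_chat_prompt(
            request.messages,
            add_generation_prompt=request.add_generation_prompt,
        )


engine = Engine(StubTokenizer())
msgs = [{"role": "user", "content": "hi"}]
with_prompt = engine.tokenize(
    TokenizeChatRequest(msgs, add_generation_prompt=True))
without_prompt = engine.tokenize(
    TokenizeChatRequest(msgs, add_generation_prompt=False))
```

With this change, `with_prompt` ends in the assistant header while `without_prompt` does not, so the two requests tokenize to different lengths, as users expect.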
References
- Found by Cursor Bugbot review on PR [serve][llm] Add tokenize/detokenize to SGLang example engine #61446
- Reviewer @eicherseiji requested issue creation: [serve][llm] Add tokenize/detokenize to SGLang example engine #61446 (comment)