[serve][llm][sglang] Chat tokenize ignores user's add_generation_prompt setting #61687

@eureka928

Description

@eureka928

Context

In SGLangServer.tokenize(), when handling a TokenizeChatRequest, the method delegates to _render_chat_prompt() which hardcodes add_generation_prompt=True in the apply_chat_template call.

The vLLM TokenizeChatRequest exposes add_generation_prompt as a user-configurable field, but because tokenize() reuses _render_chat_prompt(), that setting is silently discarded. For tokenization (unlike chat completions), users often need to control whether the generation prompt is included in the token count.
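To illustrate why the flag matters for token counts, here is a minimal, hypothetical chat-template renderer (the function and special tokens are illustrative only, not the Ray/SGLang or HF API): with add_generation_prompt=True, the assistant header is appended and its tokens are counted; with False, they are not.

```python
# Hypothetical minimal chat-template renderer (illustrative names only).
def render_chat(messages, add_generation_prompt):
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>{m['content']}<|end|>")
    if add_generation_prompt:
        # Appends the assistant header that cues the model to generate.
        # These extra tokens inflate the token count if the user only
        # wanted to count the conversation itself.
        parts.append("<|assistant|>")
    return "".join(parts)

msgs = [{"role": "user", "content": "Hi"}]
with_prompt = render_chat(msgs, add_generation_prompt=True)
without_prompt = render_chat(msgs, add_generation_prompt=False)
```

Hardcoding the flag to True means the two cases above always produce the longer rendering, regardless of what the request asked for.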

Relevant code

  • tokenize() at python/ray/llm/examples/sglang/modules/sglang_engine.py (lines 392-394)
  • _render_chat_prompt() at the same file (lines 140-145)

Suggested fix

Pass the request's add_generation_prompt value through to _render_chat_prompt() (or call apply_chat_template directly in tokenize() with that value) instead of hardcoding True.
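A sketch of the first option, forwarding the flag through _render_chat_prompt(). The class and method names follow the issue's description; the tokenizer stub and request shape below are stand-ins for illustration, not the real SGLang/HF tokenizer or the vLLM request class.

```python
# Stand-in tokenizer so the sketch is self-contained (not the real API).
class StubTokenizer:
    GEN_HEADER = "<|assistant|>"

    def apply_chat_template(self, messages, tokenize=False,
                            add_generation_prompt=True):
        text = "".join(f"<|{m['role']}|>{m['content']}" for m in messages)
        if add_generation_prompt:
            text += self.GEN_HEADER
        return text

    def encode(self, text):
        return text.split("|")  # crude stand-in for real token ids


class SGLangServer:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def _render_chat_prompt(self, messages, add_generation_prompt=True):
        # Fix: forward the flag instead of hardcoding True.
        return self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=add_generation_prompt,
        )

    def tokenize(self, request):
        # Honor the user's setting from the TokenizeChatRequest,
        # defaulting to True to preserve the current behavior.
        prompt = self._render_chat_prompt(
            request["messages"],
            add_generation_prompt=request.get("add_generation_prompt", True),
        )
        return self.tokenizer.encode(prompt)
```

Defaulting the parameter to True keeps chat-completion callers of _render_chat_prompt() unchanged, so only the tokenize path picks up the new behavior.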
