Skip to content

[TITO] model-support: add DeepSeek V4 TITO support #1065

Open
zyzshishui wants to merge 3 commits intoradixark:mainfrom
zyzshishui:v4-tito
Open

[TITO] model-support: add DeepSeek V4 TITO support #1065
zyzshishui wants to merge 3 commits intoradixark:mainfrom
zyzshishui:v4-tito

Conversation

@zyzshishui
Copy link
Copy Markdown
Contributor

Summary

Adds TITO support for DeepSeek V4 (tested on Flash due to lack of gpu, but should also work for V4 pro).

Registers a new deepseekv4 TITO family and wires it to SGLang’s DeepSeek V4 encoder path instead of the regular HF/Jinja chat template path.

Note: A temporary compatibility guard is included in miles/utils/dumper_utils.py because the current SGLang V4 support branch is not upstreamed and its dumper API is too old. We would remove it after #1045 get merged.

Test Plan

1. DeepSeek V4 TITO tokenizer verifier

Verifies that the registered deepseekv4 TITO tokenizer can incrementally merge appended tool messages and decode back to the same text as a full prompt render.

python scripts/tools/verify_chat_template.py \
  --model deepseek-ai/DeepSeek-V4-Flash \
  --tito-model deepseekv4 \
  --tito-allowed-append-roles tool \
  --thinking both

2. Fast tokenizer regression suite

Verifies the shared chat-template / TITO tokenizer utilities, including DeepSeek V4 prompt-id alignment against SGLang’s DSv4 encoder path.

pytest tests/fast/utils/chat_template_utils

3. DeepSeek V4 session-server e2e

Runs the real Miles session-server TITO path with SGLang inference. This verifies that Miles-owned input_ids, DeepSeek V4 prompt encoding, tool-call parsing, rollback, and accumulated token mismatch checks work together.

python scripts/tools/verify_session_tito_tokenizer.py \
  --hf-checkpoint deepseek-ai/DeepSeek-V4-Flash \
  --tito-model deepseekv4 \
  --tito-allowed-append-roles tool \
  --reasoning-parser deepseek-v4 \
  --tool-call-parser deepseekv4 \
  --tp-size 4 \
  --num-gpus 4

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for DeepSeek V4 within the TITO tokenization framework and shifts the responsibility for prompt tokenization from SGLang to the Miles session server. Key implementation details include the addition of a specialized DeepSeekV4TITOTokenizer, custom rendering logic for DSv4 in apply_chat_template, and middleware updates to handle input_ids injection. The changes also include compatibility workarounds for SGLang v4 and significant enhancements to the chat template verification tools. Feedback highlights an incomplete suffix removal logic for DSv4 and the need for more robust tool formatting in fallback scenarios.

Comment thread miles/utils/chat_template_utils/template.py
Comment thread miles/utils/chat_template_utils/template.py Outdated
Comment thread miles/utils/chat_template_utils/template.py Outdated
@zyzshishui zyzshishui force-pushed the v4-tito branch 2 times, most recently from 17f10a8 to 6fc716c Compare May 4, 2026 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant