fix incorrect sft loss mask for qwen3 thinking series models. #330
zhuzilin merged 3 commits into THUDM:main
Conversation
Thank you so much for this! Is it possible to merge the
Hi, I have merged the Besides, I have one more question regarding
oh... this is a mistake...
Description
We found that when performing SFT training with the Qwen3 Thinking series models (e.g., Qwen3-4B-Thinking-2507), the SFT loss mask is computed from incorrect token IDs after the chat template is applied. Specifically, the `reasoning_content` field is ignored, and only the `content` field is preserved.
The related code is in `slime.utils.mask_utils.MultiTurnLossMaskGenerator.gen_multi_turn_loss_mask_qwen`.
Example case
[
  {"role": "system", "content": "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"python\", \"description\": \"Executes Python code in a stateless sandbox. The code must be a complete script with all necessary imports, and results should be explicitly output using print().\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The Python script to execute. It must include all required imports and use print() to display any results.\"}}, \"required\": [\"code\"]}}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>"},
  {"role": "user", "content": "Calculate the square root of 16"},
  {"role": "assistant", "reasoning_content": "Call tool.", "tool_calls": [{"type": "function", "function": {"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}}]},
  {"role": "tool", "content": "4.0"},
  {"role": "assistant", "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.", "content": "The answer is 4.0."}
]
When processing the message:
{"role": "assistant", "reasoning_content": "Call tool.", "tool_calls": [{"type": "function", "function": {"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}}]}
the result of `self.tokenizer.apply_chat_template([message], tokenize=False)` is
"<|im_start|>assistant\n<tool_call>\n{"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}\n</tool_call><|im_end|>\n"
which omits the `reasoning_content` "Call tool." and therefore leads to incorrect `message_ids`.
Similarly, when processing:
{"role": "assistant", "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.", "content": "The answer is 4.0."}
the result is "<|im_start|>assistant\nThe answer is 4.0.<|im_end|>\n", which also ignores the `reasoning_content` "Tool responses 4.0, thus the answer is 4.0.".
Root Cause
This issue is caused by the Qwen3 Thinking models’ chat template. The template only emits the `<think>` tag and appends `reasoning_content` when the message index is greater than the index of the last user query message.
However, in the current implementation, each message is rendered individually, as a single-element list passed to `apply_chat_template`.
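The effect of rendering each message on its own can be illustrated with a minimal Python sketch. It does not use the real Jinja chat template or the tokenizer; `last_user_index` and `render` are hypothetical helpers that only mirror the index check described above, under the assumption that the last-query index falls back to the final message index when the list contains no user message.

```python
# Simplified, hypothetical emulation of the template rule described above.
# It is NOT the real Qwen3 chat template; it only mirrors the index check.

def last_user_index(messages):
    """Index of the last user message; falls back to the final index if none exists."""
    last = len(messages) - 1
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "user":
            last = i
            break
    return last


def render(messages):
    """Render messages, keeping reasoning only for messages after the last user query."""
    last_query = last_user_index(messages)
    out = []
    for i, msg in enumerate(messages):
        body = msg.get("content", "")
        reasoning = msg.get("reasoning_content", "")
        if msg["role"] == "assistant" and reasoning and i > last_query:
            body = f"<think>\n{reasoning}\n</think>\n\n{body}"
        out.append(f"<|im_start|>{msg['role']}\n{body}<|im_end|>\n")
    return "".join(out)


conversation = [
    {"role": "user", "content": "Calculate the square root of 16"},
    {
        "role": "assistant",
        "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.",
        "content": "The answer is 4.0.",
    },
]

# Rendered alone: the message has index 0 and last_query is also 0,
# so the check `0 > 0` fails and the reasoning is dropped.
alone = render([conversation[1]])

# Rendered in context: the message has index 1 and last_query is 0,
# so the check `1 > 0` passes and the reasoning is kept.
in_context = render(conversation)

print("<think>" in alone)       # False
print("<think>" in in_context)  # True
```

In this sketch the same assistant message produces different text depending on whether it is rendered alone or together with the preceding user turn, which is the mismatch reported in the example case.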
This means the condition message index > last user query index is never satisfied, so the `reasoning_content` is always ignored.
The affected chat template for Qwen3 Thinking series models is as follows: