Skip to content

fix incorrect sft loss mask for qwen3 thinking series models.#330

Merged
zhuzilin merged 3 commits intoTHUDM:mainfrom
luppx:fix-qwen3-thinking-sft-loss-mask
Sep 18, 2025
Merged

fix incorrect sft loss mask for qwen3 thinking series models.#330
zhuzilin merged 3 commits intoTHUDM:mainfrom
luppx:fix-qwen3-thinking-sft-loss-mask

Conversation

@luppx
Copy link
Contributor

@luppx luppx commented Sep 12, 2025

Descirption

We found that when performing SFT training with the Qwen3 Thinking series models (e.g., Qwen3-4B-Thinking-2507), the computation of the SFT loss mask produces incorrect token IDs after applying the chat template. Specifically, the reasoning_content field is ignored, and only the content field is preserved.

The related code is in slime.utils.mask_utils.MultiTurnLossMaskGenerator.gen_multi_turn_loss_mask_qwen:

def gen_multi_turn_loss_mask_qwen(self, messages: List[Dict]) -> Tuple[List[int], List[int]]:
        all_loss_masks = []
        all_token_ids = []

        for i, message in enumerate(messages):
            message_ids = self.tokenizer.apply_chat_template([message], tokenize=True)
...

Example case

[
    {"role": "system", "content": "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"python\", \"description\": \"Executes Python code in a stateless sandbox. The code must be a complete script with all necessary imports, and results should be explicitly output using print().\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The Python script to execute. It must include all required imports and use print() to display any results.\"}}, \"required\": [\"code\"]}}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>"},
    {"role": "user", "content": "Calculate the square root of 16"},
    {"role": "assistant", "reasoning_content": "Call tool.", "tool_calls": [{"type": "function", "function": {"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}}]},
    {"role": "tool", "content": "4.0"},
    {"role": "assistant", "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.", "content": "The answer is 4.0."}
]

When processing the message:

{"role": "assistant", "reasoning_content": "Call tool.", "tool_calls": [{"type": "function", "function": {"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}}]}

the result of self.tokenizer.apply_chat_template([message], tokenize=False) is "<|im_start|>assistant\n<tool_call>\n{"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}\n</tool_call><|im_end|>\n", which omits the reasoning_content "Call tool.", leading to incorrect message_ids.

Similarly, when processing:

{"role": "assistant", "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.", "content": "The answer is 4.0."}

the result is "<|im_start|>assistant\nThe answer is 4.0.<|im_end|>\n", which also ignores the reasoning_content "Tool responses 4.0, thus the answer is 4.0.".

Root Cause

This issue is caused by the Qwen3 Thinking models’ chat template. The template only includes the <think> tag and appends reasoning_content when the message index > last user query message index.

However, in the current implementation, each message is rendered individually:

message_ids = self.tokenizer.apply_chat_template([message], tokenize=True)

This means the condition message index > last user query index is never satisfied, so the reasoning_content is always ignored.

The affected chat template for Qwen3 Thinking series models is as follows:

...
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
...

@zhuzilin
Copy link
Contributor

Thank you so much for this! Is it possible to merge the Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator? We can create a folder for registering mask generator if that is necessary.

@luppx
Copy link
Contributor Author

luppx commented Sep 15, 2025

Thank you so much for this! Is it possible to merge the Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator? We can create a folder for registering mask generator if that is necessary.

Hi, I have merged the Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator. Could you please review it?

Besides, I have one more question regarding MultiTurnLossMaskGenerator.
The return types of gen_multi_turn_loss_mask_qwen and gen_multi_turn_loss_mask_distill_qwen are Tuple[List[int], List[int]], whereas the return type of get_loss_mask, which calls these functions, is List[int].
Is this intentional, or could it be a mistake?

@zhuzilin zhuzilin merged commit db0dcf5 into THUDM:main Sep 18, 2025
3 of 4 checks passed
@zhuzilin
Copy link
Contributor

whereas the return type of get_loss_mask, which calls these functions, is List[int].
Is this intentional, or could it be a mistake?

oh... this is a mistake...

llltttwww pushed a commit to llltttwww/slime that referenced this pull request Nov 30, 2025
)

* fix incorrect sft loss mask for qwen3 thinking series models.

* Merge Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026
)

* fix incorrect sft loss mask for qwen3 thinking series models.

* Merge Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants