fix incorrect sft loss mask for qwen3 thinking series models. by luppx · Pull Request #330 · THUDM/slime

luppx · 2025-09-12T07:23:31Z

Description

We found that SFT training with the Qwen3 Thinking series models (e.g., Qwen3-4B-Thinking-2507) computes the SFT loss mask over incorrect token IDs once the chat template has been applied. Specifically, the reasoning_content field is dropped and only the content field is kept.

The related code is in slime.utils.mask_utils.MultiTurnLossMaskGenerator.gen_multi_turn_loss_mask_qwen:

def gen_multi_turn_loss_mask_qwen(self, messages: List[Dict]) -> Tuple[List[int], List[int]]:
    all_loss_masks = []
    all_token_ids = []

    for i, message in enumerate(messages):
        # Each message is rendered on its own, without the rest of the conversation.
        message_ids = self.tokenizer.apply_chat_template([message], tokenize=True)
...

Example case

[
    {"role": "system", "content": "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"type\": \"function\", \"function\": {\"name\": \"python\", \"description\": \"Executes Python code in a stateless sandbox. The code must be a complete script with all necessary imports, and results should be explicitly output using print().\", \"parameters\": {\"type\": \"object\", \"properties\": {\"code\": {\"type\": \"string\", \"description\": \"The Python script to execute. It must include all required imports and use print() to display any results.\"}}, \"required\": [\"code\"]}}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call>"},
    {"role": "user", "content": "Calculate the square root of 16"},
    {"role": "assistant", "reasoning_content": "Call tool.", "tool_calls": [{"type": "function", "function": {"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}}]},
    {"role": "tool", "content": "4.0"},
    {"role": "assistant", "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.", "content": "The answer is 4.0."}
]

When processing the message:

{"role": "assistant", "reasoning_content": "Call tool.", "tool_calls": [{"type": "function", "function": {"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}}]}

the result of self.tokenizer.apply_chat_template([message], tokenize=False) is "<|im_start|>assistant\n<tool_call>\n{"name": "python", "arguments": {"code": "import math\nprint(math.sqrt(16))"}}\n</tool_call><|im_end|>\n", which omits the reasoning_content "Call tool.", leading to incorrect message_ids.

Similarly, when processing:

{"role": "assistant", "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.", "content": "The answer is 4.0."}

the result is "<|im_start|>assistant\nThe answer is 4.0.<|im_end|>\n", which also ignores the reasoning_content "Tool responses 4.0, thus the answer is 4.0.".
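A check along the following lines could flag this kind of omission before training. It is only a sketch: find_dropped_reasoning and fake_render are hypothetical names that are not part of slime, and fake_render merely imitates the behaviour described above rather than calling the real chat template.

```python
from typing import Callable, Dict, List


def find_dropped_reasoning(
    messages: List[Dict],
    render: Callable[[List[Dict]], str],
) -> List[int]:
    """Return indices of messages whose reasoning_content is missing
    from the text produced when the message is rendered on its own."""
    dropped = []
    for i, message in enumerate(messages):
        reasoning = message.get("reasoning_content")
        if reasoning and reasoning not in render([message]):
            dropped.append(i)
    return dropped


def fake_render(msgs: List[Dict]) -> str:
    # Hypothetical stand-in for the reported behaviour: only `content`
    # is rendered, reasoning_content is never emitted.
    return "".join(
        f"<|im_start|>{m['role']}\n{m.get('content', '')}<|im_end|>\n" for m in msgs
    )


messages = [
    {"role": "user", "content": "Calculate the square root of 16"},
    {
        "role": "assistant",
        "reasoning_content": "Tool responses 4.0, thus the answer is 4.0.",
        "content": "The answer is 4.0.",
    },
]
print(find_dropped_reasoning(messages, fake_render))  # [1]
```

With a real tokenizer, render could be something like lambda msgs: tokenizer.apply_chat_template(msgs, tokenize=False).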

Root Cause

This issue is caused by the Qwen3 Thinking models’ chat template. The template emits the <think> tag and appends reasoning_content only when the message index is greater than the index of the last user query message.

However, in the current implementation, each message is rendered individually:

message_ids = self.tokenizer.apply_chat_template([message], tokenize=True)

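With a single-message list, the message index is 0 and the last user query index is also 0 (last_query_index starts at messages|length - 1), so the condition message index > last user query index is never satisfied and the reasoning_content is always ignored.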

The affected chat template for Qwen3 Thinking series models is as follows:

...
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
{%- endfor %}
...
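To make the effect of this excerpt concrete, here is a rough Python port of the loop. It is a sketch for illustration only: last_query_index and keeps_reasoning are hypothetical helper names, and keeps_reasoning encodes just the condition described in this PR (message index > last user query index), not the full template.

```python
from typing import Dict, List


def last_query_index(messages: List[Dict]) -> int:
    """Port of the template excerpt: index of the last real user query."""
    multi_step_tool = True
    last_index = len(messages) - 1  # template default: messages|length - 1
    for index in range(len(messages) - 1, -1, -1):  # walks messages[::-1]
        message = messages[index]
        content = message.get("content")
        if (
            multi_step_tool
            and message["role"] == "user"
            and isinstance(content, str)
            and not (
                content.startswith("<tool_response>")
                and content.endswith("</tool_response>")
            )
        ):
            multi_step_tool = False
            last_index = index
    return last_index


def keeps_reasoning(messages: List[Dict], i: int) -> bool:
    # Condition as described in this PR, not the complete template logic.
    return i > last_query_index(messages)


conversation = [
    {"role": "system", "content": "# Tools"},
    {"role": "user", "content": "Calculate the square root of 16"},
    {"role": "assistant", "reasoning_content": "Call tool.", "content": ""},
    {"role": "tool", "content": "4.0"},
    {"role": "assistant", "reasoning_content": "Done.", "content": "The answer is 4.0."},
]

print(last_query_index(conversation))           # 1
print(keeps_reasoning(conversation, 2))         # True
print(keeps_reasoning([conversation[2]], 0))    # False
```

Rendered in context, both assistant turns sit after the last user query (index 1). Rendered alone, the message has index 0 and last_query_index is also 0, so the condition fails.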

zhuzilin · 2025-09-15T10:54:25Z

Thank you so much for this! Is it possible to merge the Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator? We can create a folder for registering mask generator if that is necessary.


luppx · 2025-09-15T16:19:17Z

> Thank you so much for this! Is it possible to merge the Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator? We can create a folder for registering mask generator if that is necessary.

Hi, I have merged the Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator. Could you please review it?

Besides, I have one more question regarding MultiTurnLossMaskGenerator.
The return types of gen_multi_turn_loss_mask_qwen and gen_multi_turn_loss_mask_distill_qwen are Tuple[List[int], List[int]], whereas the return type of get_loss_mask, which calls these functions, is List[int].
Is this intentional, or could it be a mistake?
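The mismatch can be illustrated with simplified stand-ins. This is a hypothetical sketch, not the slime code: the function bodies, the dummy values, and the order of the tuple elements are assumptions. The point is only that a wrapper returning the helper's result unchanged would need the same Tuple annotation.

```python
from typing import Dict, List, Tuple, get_type_hints


def gen_multi_turn_loss_mask(messages: List[Dict]) -> Tuple[List[int], List[int]]:
    # Hypothetical stand-in: returns two parallel lists (order is assumed).
    token_ids = [101, 102, 103]
    loss_mask = [0, 1, 1]
    return token_ids, loss_mask


def get_loss_mask(messages: List[Dict]) -> Tuple[List[int], List[int]]:
    # Annotated with the tuple type because the helper's result is passed through;
    # List[int] here would not match what is actually returned.
    return gen_multi_turn_loss_mask(messages)


result = get_loss_mask([])
print(isinstance(result, tuple), len(result))  # True 2
```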

zhuzilin · 2025-09-18T03:45:20Z

> whereas the return type of get_loss_mask, which calls these functions, is List[int].
> Is this intentional, or could it be a mistake?

oh... this is a mistake...


fix incorrect sft loss mask for qwen3 thinking series models. (95bbc67)

luppx added 2 commits September 16, 2025 00:03

Merge Qwen3MultiTurnLossMaskGenerator into MultiTurnLossMaskGenerator (4f77c11)

Merge branch 'main' of github.com:luppx/slime into fix-qwen3-thinking…-sft-loss-mask (1a84d28)

zhuzilin merged commit db0dcf5 into THUDM:main Sep 18, 2025
3 of 4 checks passed

zhuzilin merged 3 commits into THUDM:main from luppx:fix-qwen3-thinking-sft-loss-mask
