[GRPO] Fix re-tokenization bug in tool-calling loop by concatenating token IDs#5242
qgallouedec wants to merge 100 commits into main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3375aeac6c
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 367a79ebc6

Context
Part of the series to fix the re-tokenization bug in GRPO multi-turn tool calling (see #5224).
closes #5224
closes #5144
When the model generates a completion in a tool-calling loop, the decoded text is re-tokenized via `apply_chat_template`, which can produce different token IDs due to BPE merge ambiguities. To fix this, we need a token-in / token-out pipeline: tokenize once, then pass raw token IDs through every subsequent generation call, never decoding and re-tokenizing.

Prior PRs in the series:
- `prompts` in vLLM client and server #5225
- `rollout_func` from `_generate_single_turn` to `_generate` #5232
- `_generate_single_turn` #5239
- `_generate_single_turn` #5240

This is the final PR in the series. It eliminates the re-tokenization in the tool-calling loop, the actual source of the bug.
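The failure mode can be reproduced with a toy greedy longest-match tokenizer (an illustrative stand-in, not a real BPE implementation): a completion generated as two pieces decodes to text that re-encodes as a single merged token, so the ID sequence silently changes.

```python
# Toy illustration of the re-tokenization bug (not the real tokenizer):
# greedy longest-match encoding shows why decode -> re-encode can change IDs.
VOCAB = {"hello": 2, "he": 0, "llo": 1, " world": 3}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(text):
    """Greedy longest-match tokenization over VOCAB."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"untokenizable text at {i}: {text[i:]!r}")
    return ids

def decode(ids):
    return "".join(ID_TO_TOKEN[i] for i in ids)

# Suppose the model *generated* "hello" as two pieces: "he" + "llo".
generated_ids = [0, 1]
retokenized = encode(decode(generated_ids))  # merges into one token
assert retokenized != generated_ids          # [2] != [0, 1]
```

The model was trained on (and scored against) `generated_ids`, but the loop would feed `retokenized` back in, which is exactly the drift this PR removes.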
Changes
- `_get_tool_suffix_ids(tool_messages)` method: tokenizes only the tool-result portion by diffing a minimal dummy conversation (2 messages vs. 3 messages). This avoids re-tokenizing the full conversation history.
- `_tool_call_loop`: instead of re-tokenizing `prompt + completion + tool_results` via `apply_chat_template`, builds the token sequence by concatenation: `prompt_ids + completion_ids + tool_suffix_ids`. The original prompt and completion token IDs are preserved exactly as they were; only the new tool-result tokens are freshly tokenized.
- Removes the `_tokenize_prompts` call in the tool loop.

The bug and the fix
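The prefix-diff idea behind `_get_tool_suffix_ids` can be sketched as follows. This is a hedged sketch, not TRL's actual code: `ToyTokenizer`, its character-level template, and the exact dummy messages are illustrative assumptions.

```python
# Sketch of the prefix-diff trick: tokenize a minimal dummy conversation
# without and with the tool messages; the extra tokens are the tool suffix.

class ToyTokenizer:
    """Character-level stand-in for a chat tokenizer (illustrative only)."""
    def apply_chat_template(self, messages):
        text = "".join(f"<{m['role']}>{m['content']}</{m['role']}>" for m in messages)
        return [ord(c) for c in text]  # one "token" per character

def get_tool_suffix_ids(tokenizer, tool_messages):
    dummy = [{"role": "user", "content": "x"},
             {"role": "assistant", "content": "y"}]
    base_ids = tokenizer.apply_chat_template(dummy)
    full_ids = tokenizer.apply_chat_template(dummy + tool_messages)
    # The diff is only valid if the template is prefix-stable.
    assert full_ids[:len(base_ids)] == base_ids
    return full_ids[len(base_ids):]

tok = ToyTokenizer()
suffix = get_tool_suffix_ids(tok, [{"role": "tool", "content": "42"}])
# suffix covers exactly "<tool>42</tool>" in this toy setup
```

Because the dummy conversation is tiny and fixed, the cost is constant per tool call, independent of how long the real conversation history has grown.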
Previously, after a tool call:
- `prompt + assistant + tool_results` was re-tokenized via `apply_chat_template`

Now:
- `prompt_ids` and `completion_ids` are kept as-is (never decoded and re-tokenized)
- the next input is built by concatenation: `prompt_ids + completion_ids + suffix_ids`

Backward compatibility
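Put together, the fixed loop only ever appends IDs. A minimal sketch (the function name, `generate` callback, and signatures here are hypothetical, not TRL's API):

```python
def tool_call_loop(generate, get_suffix_ids, prompt_ids, max_turns=4):
    """Token-in / token-out: IDs are concatenated, never decoded and re-tokenized."""
    input_ids = list(prompt_ids)
    for _ in range(max_turns):
        # generate() returns the new completion IDs and any parsed tool messages.
        completion_ids, tool_messages = generate(input_ids)
        input_ids = input_ids + completion_ids   # kept as-is, never re-tokenized
        if not tool_messages:                    # no tool call: conversation done
            return input_ids
        # Only the tool-result suffix is freshly tokenized each turn.
        input_ids = input_ids + get_suffix_ids(tool_messages)
    return input_ids
```

Each turn's input is the previous input plus new IDs, so the prompt and completion tokens the model actually produced are preserved bit-for-bit across turns.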
No user-facing API changes.
`_get_tool_suffix_ids` and `_tool_call_loop` are internal methods.

Note
Medium Risk
Medium risk because it changes GRPO multi-turn tool-calling generation by altering how prompts are rebuilt and tokenized, which can affect sequence lengths/masks and downstream training behavior. RLOO changes are mechanical return-value cleanups but share the same generation pathways.
Overview
Fixes a GRPO multi-turn tool-calling bug by removing decode/re-tokenize steps inside the tool loop and instead preserving the original `prompt_ids`/`completion_ids` while appending newly tokenized tool-result suffix IDs.

Adds `_get_tool_suffix_ids()` and rewires `_tool_call_loop` to build `prompt + completion + tool` inputs via token-ID concatenation (including image/multimodal subsetting), and updates generation helpers in both `GRPOTrainer` and `RLOOTrainer` to stop returning unused `prompt_ids` from `_generate_single_turn`/vLLM and to simplify prompt-length handling.

Written by Cursor Bugbot for commit 5147625. This will update automatically on new commits.