[RL][Cherry-Pick] Fix the out-of-bounds issue caused by int32 in the R3 kernel#7496
gongshaotian wants to merge 6 commits into PaddlePaddle:release/2.6 from
Conversation
…dle#6604) * Optimize delete batch and fused put * refine code * refine code * refine code * Support suspend r3
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 AI Code Review
2026-04-20 14:25 CST
📋 Review Summary
PR overview: fixes the out-of-bounds issue caused by int32 in the R3 Routing Replay Triton kernel, and adds a suspend mechanism plus corrected clear logic for the fused/non-fused paths.
Scope of changes: model_executor/layers/moe/, config.py, envs.py, worker/gpu_model_runner.py
Impact tags: OP RL
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | routing_indices_cache.py:164 | "Congfig" typo in a log message |
| 🟡 Suggestion | routing_indices_cache.py:130 | the shape assertion on seq_lens_decoder was removed, weakening the Python-level defense |
| 🟡 Suggestion | routing_indices_cache.py:192 | the environment variable is read in a way inconsistent with its registration in envs.py |
| ❓ Question | routing_indices_cache.py:870 | delete_prefix_batch → delete_batch changes the semantics; please confirm |
Overall assessment
The int32→int64 fix and the strengthened bounds checks in the Triton kernel are the core value of this PR, and the fix is correct. Splitting the clear logic between the fused and non-fused paths is also reasonable. A few minor issues are worth correcting.
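To illustrate the class of bug this PR fixes, here is a minimal plain-Python sketch (not the actual Triton kernel; the layer count and stride are made-up numbers) of how a flat offset into a large routing-replay table overflows when computed with 32-bit index arithmetic:

```python
# Illustrative sketch of int32 index overflow. Once the flat offset exceeds
# 2**31 - 1, 32-bit two's-complement arithmetic wraps it to a negative value,
# which then indexes out of bounds. Promoting the math to int64 is exact.

def to_int32(x: int) -> int:
    """Emulate two's-complement int32 wraparound."""
    return (x + 2**31) % 2**32 - 2**31

# Hypothetical sizes: layer 40 of a table with 60M elements per layer.
layer_idx, layer_stride = 40, 60_000_000

offset_i32 = to_int32(layer_idx * layer_stride)  # wraps to a negative offset
offset_i64 = layer_idx * layer_stride            # exact with 64-bit math

print(offset_i32, offset_i64)
```

The same wraparound happens silently inside a Triton kernel whose index tensors are int32, which is why the fix promotes the index computation to int64 and adds explicit bounds masks.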
self.num_moe_layers = fd_config.model_config.num_hidden_layers - fd_config.model_config.moe_layer_start_index
self.only_last_turn = fd_config.routing_replay_config.only_last_turn
self.use_fused_put = fd_config.routing_replay_config.use_fused_put
logger.info(f"[R3] Rollout Routing Replay Congfig: {fd_config.routing_replay_config}")
🟡 Suggestion: "Congfig" in the log message is a typo; it should be "Config".
- logger.info(f"[R3] Rollout Routing Replay Congfig: {fd_config.routing_replay_config}")
+ logger.info(f"[R3] Rollout Routing Replay Config: {fd_config.routing_replay_config}")
assert (
    topk_ids.shape[1] == routing_replay_table.shape[3]
), f"({topk_ids.shape[1]}, {routing_replay_table.shape[3]})"
assert batch_id_per_token.shape[0] == token_num, f"({batch_id_per_token.shape[0]}, {token_num})"
🟡 Suggestion: the original assertion assert seq_lens_decoder.shape[0] == max_num_seqs was removed. Although the kernel now guards against out-of-bounds access internally with batch_mask, a Python-level assertion catches tensor-shape mismatches before the kernel launch and produces a clearer error message. Consider keeping the assertion:

assert seq_lens_decoder.shape[0] >= max_num_seqs, f"({seq_lens_decoder.shape[0]}, {max_num_seqs})"

def update_suspend_routing_replay(self):
    """Allow RL to use R3 in different training rounds"""
    # TODO(gongshaotian): Delete this func
    suspend_routing_replay = os.environ.get("FD_SUSPEND_ROUTING_REPLAY", "0")
🟡 Suggestion: FD_SUSPEND_ROUTING_REPLAY is registered in envs.py, but it is read here directly via os.environ.get. Consider reading it through the envs module to keep environment-variable access consistent across the project:

from fastdeploy import envs
self.suspend_routing_replay = envs.FD_SUSPEND_ROUTING_REPLAY

If this call site needs to pick up environment-variable changes dynamically at runtime (bypassing the cached value), this suggestion can be ignored, but please explain the reason in a comment.
async def clear_prefix_batch(self, routing_prefix_key: str):
    time_before_clear = time.perf_counter()
-   result = await self.p2p_client.delete_prefix_batch([routing_prefix_key])
+   result = await self.p2p_client.delete_batch([routing_prefix_key])
❓ Question: this changes delete_prefix_batch to delete_batch, shifting the semantics from "batch delete by prefix" to "delete by exact key".
Combined with submit_clear_prefix_batch_task, where the non-fused path now passes layer_idx to build an exact key (e.g. {rollout_id}_{layer_idx}), per-layer deletion does work.
However, the method name clear_prefix_batch and the parameter name routing_prefix_key still imply prefix semantics, which no longer match the exact-key deletion behavior and may mislead future maintainers. Consider renaming them accordingly, e.g. clear_batch / routing_key.
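The semantic difference the question raises can be sketched with a hypothetical in-memory store (the functions below are illustrative, not the actual p2p_client API): prefix deletion removes every matching key, while exact-key deletion removes only the named entry.

```python
# Hypothetical key/value store illustrating prefix vs exact-key deletion.
store = {
    "rollout7_0": "layer 0 routing",
    "rollout7_1": "layer 1 routing",
    "rollout8_0": "another rollout",
}

def delete_prefix_batch(store, prefixes):
    """Remove every key that starts with any of the given prefixes."""
    for prefix in prefixes:
        for key in [k for k in store if k.startswith(prefix)]:
            del store[key]

def delete_batch(store, keys):
    """Remove only the exact keys given."""
    for key in keys:
        store.pop(key, None)

delete_batch(store, ["rollout7_0"])       # removes exactly one entry
delete_prefix_batch(store, ["rollout7"])  # removes all remaining rollout7_* keys

print(sorted(store))
```

Because the call site now builds exact {rollout_id}_{layer_idx} keys, the exact-key form is sufficient, but the names should say so.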
Motivation
Modifications
Routing Replay Triton Kernel
Usage or Command
Pass
Accuracy Tests
Pass
Checklist
PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a PR targeting a release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.