
[RL][Cherry-Pick] Fix the out-of-bounds issue caused by int32 in the R3 kernel#7496

Open
gongshaotian wants to merge 6 commits into PaddlePaddle:release/2.6 from gongshaotian:r3_fix_int32_bug_2.6

Conversation

@gongshaotian
Collaborator

Motivation

  1. Fix the out-of-bounds issue caused by int32 in the Triton kernel
  2. Cherry-pick of PR:

Modifications

Routing Replay Triton Kernel
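To show the kind of bug being fixed, the sketch below emulates 32-bit index arithmetic in plain Python. It assumes a hypothetical contiguous routing replay table laid out as [batch, layer, position, top_k] with made-up sizes; the real table layout, strides, and sizes used by the R3 kernel may differ. In Triton, the usual remedy is to promote the index to 64 bits (for example with `.to(tl.int64)`) before multiplying by strides. This sketch only models the arithmetic, not the kernel itself.

```python
import ctypes

INT32_MAX = 2**31 - 1


def wrap_int32(x: int) -> int:
    """Truncate x to a signed 32-bit integer (two's-complement wraparound).

    int32 addition and multiplication are arithmetic modulo 2**32, so wrapping
    once at the end gives the same result as wrapping after every step.
    """
    return ctypes.c_int32(x).value


def flat_offset(batch_id: int, layer_id: int, pos: int, k: int,
                num_layers: int, max_model_len: int, top_k: int,
                use_int64: bool) -> int:
    # Hypothetical contiguous layout [batch, layer, pos, top_k] (an assumption).
    stride_pos = top_k
    stride_layer = max_model_len * stride_pos
    stride_batch = num_layers * stride_layer
    offset = batch_id * stride_batch + layer_id * stride_layer + pos * stride_pos + k
    return offset if use_int64 else wrap_int32(offset)


# Made-up sizes: 64 MoE layers, 131072 positions per sequence, top_k = 8.
args = dict(layer_id=0, pos=0, k=0, num_layers=64, max_model_len=131072, top_k=8)
off64 = flat_offset(40, use_int64=True, **args)
off32 = flat_offset(40, use_int64=False, **args)
print(off64, off32)  # 2684354560 -1610612736
```

With these sizes the per-batch stride is 67,108,864, so any batch index of 32 or more overflows (32 × 67,108,864 = 2^31), and the wrapped negative offset points outside the table.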

Usage or Command

Pass

Accuracy Tests

Pass

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets a release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] tag in the PR title.

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@gongshaotian gongshaotian changed the title from "R3 fix int32 bug 2.6" to "[RL][Cherry-Pick] Fix the out-of-bounds issue caused by int32 in the R3 kernel" on Apr 20, 2026
PaddlePaddle-bot

This comment was marked as outdated.

PaddlePaddle-bot

This comment was marked as outdated.


@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-20 14:25 CST

📋 Review Summary

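PR overview: Fixes the out-of-bounds issue caused by int32 indexing in the R3 Routing Replay Triton kernel, and adds further improvements, including a suspend mechanism and corrected clear logic for the fused and non-fused paths.
Scope of changes: model_executor/layers/moe/, config.py, envs.py, worker/gpu_model_runner.py
Affected tags: OP, RL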

Issues

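Level | File | Summary
🟡 Suggestion | routing_indices_cache.py:164 | Typo "Congfig" in a log message
🟡 Suggestion | routing_indices_cache.py:130 | The shape assertion on seq_lens_decoder was removed, weakening Python-side validation
🟡 Suggestion | routing_indices_cache.py:192 | The environment variable is read in a way inconsistent with its registration in envs.py
❓ Question | routing_indices_cache.py:870 | Semantics change from delete_prefix_batch to delete_batch; please confirm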

Overall Assessment

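The int32→int64 fix and the strengthened bounds checking in the Triton kernel are the core value of this PR, and the fix is implemented correctly. Splitting the clear logic between the fused and non-fused paths is also reasonable. A few minor issues are worth correcting.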

self.num_moe_layers = fd_config.model_config.num_hidden_layers - fd_config.model_config.moe_layer_start_index
self.only_last_turn = fd_config.routing_replay_config.only_last_turn
self.use_fused_put = fd_config.routing_replay_config.use_fused_put
logger.info(f"[R3] Rollout Routing Replay Congfig: {fd_config.routing_replay_config}")

🟡 Suggestion: "Congfig" in the log message is a typo; it should be "Config".

Suggested change
- logger.info(f"[R3] Rollout Routing Replay Congfig: {fd_config.routing_replay_config}")
+ logger.info(f"[R3] Rollout Routing Replay Config: {fd_config.routing_replay_config}")

assert (
topk_ids.shape[1] == routing_replay_table.shape[3]
), f"({topk_ids.shape[1]}, {routing_replay_table.shape[3]})"
assert batch_id_per_token.shape[0] == token_num, f"({batch_id_per_token.shape[0]}, {token_num})"

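🟡 Suggestion: The original assertion assert seq_lens_decoder.shape[0] == max_num_seqs has been removed. Although the kernel now includes a batch_mask that guards against out-of-bounds access, a Python-side assertion catches tensor shape mismatches earlier, before the kernel is launched, and produces a clearer error message. I suggest keeping this assertion: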

assert seq_lens_decoder.shape[0] >= max_num_seqs, f"({seq_lens_decoder.shape[0]}, {max_num_seqs})"
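The trade-off described above can be sketched in plain Python: a kernel-side mask silently skips out-of-range rows, while a Python-side assertion fails loudly before launch. The function names, the list-based table, and the exact mask condition below are hypothetical stand-ins, not the actual kernel or routing_indices_cache.py code.

```python
from typing import List, Sequence


def masked_record(table: List[List[List[int]]],
                  batch_id_per_token: Sequence[int],
                  topk_ids: Sequence[Sequence[int]]) -> int:
    """Emulate a kernel-side batch_mask: tokens with an out-of-range batch id are dropped silently."""
    max_num_seqs = len(table)
    written = 0
    for token_idx, batch_id in enumerate(batch_id_per_token):
        batch_mask = 0 <= batch_id < max_num_seqs  # assumed mask condition
        if batch_mask:
            table[batch_id].append(list(topk_ids[token_idx]))
            written += 1
    return written


def check_seq_lens_decoder(seq_lens_decoder_len: int, max_num_seqs: int) -> None:
    """Python-side guard as suggested in the review: fail before any kernel launch."""
    assert seq_lens_decoder_len >= max_num_seqs, f"({seq_lens_decoder_len}, {max_num_seqs})"


table: List[List[List[int]]] = [[] for _ in range(2)]
print(masked_record(table, [0, 1, 5], [[1, 2], [3, 4], [5, 6]]))  # 2: the batch-5 token is dropped
```

The dropped token raises no error at all, which is why an explicit shape check still helps when diagnosing a mismatch.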

def update_suspend_routing_replay(self):
"""Allow RL to use R3 in different training rounds"""
# TODO(gongshaotian): Delete this func
suspend_routing_replay = os.environ.get("FD_SUSPEND_ROUTING_REPLAY", "0")

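🟡 Suggestion: FD_SUSPEND_ROUTING_REPLAY is already registered in envs.py, but here it is read directly via os.environ.get. For consistency with how environment variables are accessed elsewhere in the project, I suggest reading it through the envs module: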

from fastdeploy import envs
self.suspend_routing_replay = envs.FD_SUSPEND_ROUTING_REPLAY

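If this code needs to pick up changes to the environment variable at runtime (bypassing any cache), feel free to ignore this suggestion, but please explain the reason in a comment.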
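The cached-versus-live distinction can be sketched with a small lazy registry. This assumes an envs-style module that maps variable names to zero-argument getters and evaluates them on attribute access; whether FastDeploy's envs.py works exactly this way, and how it parses FD_SUSPEND_ROUTING_REPLAY, are assumptions here. If access is lazy like this, reading through envs would also see runtime changes, and what matters is when the value is read rather than where it is read from.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry; parsing "0"/"1" into a bool is an assumption.
_ENV_GETTERS: Dict[str, Callable[[], Any]] = {
    "FD_SUSPEND_ROUTING_REPLAY": lambda: bool(int(os.getenv("FD_SUSPEND_ROUTING_REPLAY", "0"))),
}


class LazyEnvs:
    """Attribute access calls the getter, so every read sees the current os.environ."""

    def __getattr__(self, name: str) -> Any:
        getter = _ENV_GETTERS.get(name)
        if getter is None:
            raise AttributeError(name)
        return getter()


envs = LazyEnvs()

os.environ["FD_SUSPEND_ROUTING_REPLAY"] = "0"
cached = envs.FD_SUSPEND_ROUTING_REPLAY        # snapshot taken once, e.g. in __init__
os.environ["FD_SUSPEND_ROUTING_REPLAY"] = "1"
print(cached, envs.FD_SUSPEND_ROUTING_REPLAY)  # False True
```

Storing the value on self once freezes it, while reading it inside update_suspend_routing_replay on each call would track changes, under the lazy-access assumption above.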

async def clear_prefix_batch(self, routing_prefix_key: str):
time_before_clear = time.perf_counter()
- result = await self.p2p_client.delete_prefix_batch([routing_prefix_key])
+ result = await self.p2p_client.delete_batch([routing_prefix_key])

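❓ Question: This changes delete_prefix_batch to delete_batch, which shifts the semantics from "batch delete by prefix" to "delete by exact key".

Since the non-fused path in submit_clear_prefix_batch_task now passes layer_idx to build an exact key (e.g. {rollout_id}_{layer_idx}), deleting layer by layer will work.

However, the method name clear_prefix_batch and the parameter name routing_prefix_key still imply prefix semantics, which does not match the actual exact-key deletion and may mislead future maintainers. I suggest renaming them accordingly, e.g. to clear_batch / routing_key.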
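The semantic difference can be sketched with an in-memory stand-in for the P2P client. The InMemoryStore class, its methods, and the {rollout_id}_{layer_idx} key format are illustrative assumptions (the real p2p_client methods are awaited in the PR and are not reproduced here). One consequence under this assumed key format: a prefix delete for layer 1 also matches layers 10 through 19, whereas an exact-key delete touches only the intended entry.

```python
from typing import Dict, Iterable


class InMemoryStore:
    """Toy key-value store standing in for the P2P client (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def delete_prefix_batch(self, prefixes: Iterable[str]) -> int:
        """Delete every key that starts with any of the given prefixes."""
        prefix_list = list(prefixes)
        doomed = [k for k in self.data if any(k.startswith(p) for p in prefix_list)]
        for k in doomed:
            del self.data[k]
        return len(doomed)

    def delete_batch(self, keys: Iterable[str]) -> int:
        """Delete only keys that match exactly."""
        removed = 0
        for k in keys:
            if self.data.pop(k, None) is not None:
                removed += 1
        return removed


def make_store(rollout_id: str, num_layers: int) -> InMemoryStore:
    store = InMemoryStore()
    for layer_idx in range(num_layers):
        store.data[f"{rollout_id}_{layer_idx}"] = b"routing"
    return store


def clear_batch(store: InMemoryStore, routing_key: str) -> int:
    """Exact-key clear, named to match its behavior as the review suggests (hypothetical)."""
    return store.delete_batch([routing_key])


prefix_store = make_store("r0", 20)
exact_store = make_store("r0", 20)
print(prefix_store.delete_prefix_batch(["r0_1"]))  # 11: r0_1 and r0_10 .. r0_19
print(clear_batch(exact_store, "r0_1"))            # 1: only r0_1
```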


Labels

None yet

Projects

None yet


2 participants