feature(xjy): Refine PriorZero Implementation#441

Open
xiongjyu wants to merge 68 commits into opendilab:dev-multitask-balance-clean-rft from xiongjyu:dev-multitask-balance-clean-rft

Conversation

@xiongjyu
Collaborator

@xiongjyu xiongjyu commented Nov 20, 2025

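This PR refines the PriorZero implementation and development workflow, fixes several critical issues that affected training correctness and stability, and systematically strengthens the training logic, loss computation, and data collection.

Work completed in this PR
• Fixed multiple critical bugs in the PriorZero training pipeline, including errors in game segment construction, loss computation, log-prob alignment, and action handling.
• Completed the REINFORCE / RFT-style policy optimization implementation: old_logprob is now correctly stored in the buffer and used during updates, so that policy updates are correct.
• Added and standardized the statistics logged during training, including KL divergence and policy entropy, to make the training state easier to monitor.
• Optimized the data flow between the Collector and the Replay Buffer to improve data consistency and sampling stability and to reduce silent errors.
• Introduced and verified a vLLM weight synchronization mechanism for the single-GPU setting.
• vLLM weight synchronization and stability verification in multi-GPU / multi-node settings.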
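
To make the role of old_logprob and the logged statistics concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: the function names `policy_stats` and `entropy`, the unclipped importance-weighted surrogate, and the choice of KL estimator are all assumptions made for illustration.

```python
import math


def policy_stats(new_logprobs, old_logprobs, advantages):
    """Importance-weighted REINFORCE surrogate plus an approximate KL.

    Hypothetical helper: `old_logprobs` are the log-probs recorded in the
    buffer at collection time, `new_logprobs` come from the current policy.
    """
    assert len(new_logprobs) == len(old_logprobs) == len(advantages)
    losses, kls = [], []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        log_ratio = new_lp - old_lp
        ratio = math.exp(log_ratio)  # pi_new(a|s) / pi_old(a|s)
        losses.append(-ratio * adv)  # minimizing this raises expected advantage
        # Non-negative sample estimate of KL(old || new): (r - 1) - log r
        kls.append((ratio - 1.0) - log_ratio)
    n = len(losses)
    return {"policy_loss": sum(losses) / n, "approx_kl": sum(kls) / n}


def entropy(probs):
    """Shannon entropy (in nats) of one categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# When the policy has not changed since collection, the ratio is 1,
# the KL estimate is 0, and the loss reduces to the negative mean advantage.
stats = policy_stats([-1.0, -2.0], [-1.0, -2.0], [1.0, -0.5])
uniform_entropy = entropy([0.5, 0.5])
```

If the stored old_logprob were misaligned with the sampled actions, the ratio would be wrong even for an unchanged policy and the KL estimate would be nonzero, which is one way such a bug can show up in the logged metrics.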

@xiongjyu xiongjyu deleted the branch opendilab:dev-multitask-balance-clean-rft November 24, 2025 14:28
@xiongjyu xiongjyu closed this Nov 24, 2025
@xiongjyu xiongjyu deleted the dev-multitask-balance-clean-rft branch November 24, 2025 14:28
@xiongjyu xiongjyu reopened this Nov 24, 2025
@puyuan1996 puyuan1996 added the research Research work in progress label Nov 28, 2025
for i in range(num_engines):
    bundle_indices = None
    if tensor_parallel_size > 1:
        bundle_indices = get_bundle_indices(shared_pg, i, tensor_parallel_size)
Collaborator

Is this part adapted from the official Ray implementation?

Collaborator Author

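This vllm_engine is essentially the same as the corresponding part of OpenRLHF. For now, however, we use only a single vLLM engine with tensor_parallel_size = 1, because GPU memory is sufficient.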
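
For readers unfamiliar with the loop under review, the sketch below shows why the current configuration never reaches the bundle-index branch. `contiguous_bundle_indices` is a hypothetical, simplified stand-in and not the real `get_bundle_indices`, which also takes the shared placement group and may order bundles differently. `plan_engines` is likewise an illustrative name.

```python
def contiguous_bundle_indices(engine_rank, tensor_parallel_size):
    """Hypothetical stand-in: give each engine a contiguous slice of bundles."""
    start = engine_rank * tensor_parallel_size
    return list(range(start, start + tensor_parallel_size))


def plan_engines(num_engines, tensor_parallel_size):
    """Mirror the reviewed loop, returning the bundle indices per engine."""
    plan = []
    for i in range(num_engines):
        bundle_indices = None
        if tensor_parallel_size > 1:
            # Only needed when one engine spans several GPUs.
            bundle_indices = contiguous_bundle_indices(i, tensor_parallel_size)
        plan.append(bundle_indices)
    return plan


# The setup described in the reply: one engine, no tensor parallelism.
single = plan_engines(num_engines=1, tensor_parallel_size=1)
# A setup that would exercise the branch: two engines, two GPUs each.
sharded = plan_engines(num_engines=2, tensor_parallel_size=2)
```

With one engine and tensor_parallel_size = 1, `bundle_indices` stays `None`, so the bundle selection logic only matters once the multi-GPU path is enabled.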

…ple for world-model training; train LLM only on latest trajectories
@xiongjyu xiongjyu force-pushed the dev-multitask-balance-clean-rft branch from 51292a4 to f88989b Compare January 14, 2026 13:36
@puyuan1996 puyuan1996 changed the base branch from dev-multitask-balance-clean-rft to main January 15, 2026 06:56
@puyuan1996 puyuan1996 changed the base branch from main to dev-multitask-balance-clean-rft January 15, 2026 06:57