
feat(tj): update unizero ppo implementation #473

Open
tAnGjIa520 wants to merge 11 commits into opendilab:main from tAnGjIa520:unizero-ppo-updates

Conversation

@tAnGjIa520
Contributor

Main Changes

  • Replace custom GAE implementation with ding library's gae function
  • Fix data duplication issue when concatenating Segment data
  • Fix length inconsistency between advantage_segment and reward_segment
  • Optimize Buffer sample logic to prevent out-of-range errors
  • Add online training config and implementation plan documentation
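The first bullet swaps the hand-rolled GAE for the library version. As a reference for what that computation does, here is a minimal NumPy sketch of the generalized advantage estimator; it is not ding's actual `gae` function from `ding.rl_utils` (whose exact signature and data layout differ), just the recurrence it implements:

```python
import numpy as np

def compute_gae(rewards, values, next_values, dones, gamma=0.99, lambda_=0.95):
    """Generalized Advantage Estimation over one trajectory segment.

    Backward recurrence:
        delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)
        A_t     = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    last_adv = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * nonterminal * next_values[t] - values[t]
        last_adv = delta + gamma * lambda_ * nonterminal * last_adv
        advantages[t] = last_adv
    return advantages
```

Note that the output has exactly the same length as the reward segment, which is the invariant behind the advantage_segment / reward_segment length fix above.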

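The buffer-sampling fix amounts to bounding the sampled start positions so a segment can never run past the end of the buffer. A hypothetical sketch of that guard (the helper name and signature are illustrative, not the PR's actual code):

```python
import numpy as np

def sample_segment_starts(buffer_len, segment_len, batch_size, rng=None):
    """Sample segment start indices that always stay in range.

    Clamps the highest valid start so that start + segment_len
    never exceeds buffer_len, preventing out-of-range reads.
    """
    rng = rng or np.random.default_rng()
    max_start = buffer_len - segment_len
    if max_start < 0:
        raise ValueError("buffer is shorter than one segment")
    # integers() upper bound is exclusive, so add 1 to include max_start
    return rng.integers(0, max_start + 1, size=batch_size)
```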

xiongjyu and others added 11 commits on August 27, 2025 at 13:11
- Replace manual GAE computation with ding.rl_utils.gae_data and gae
- Keep original implementation as _batch_compute_gae_for_pool_bak for backup
- Add test script to verify GAE computation correctness
- Fix lunarlander_env.py to handle both int and numpy array actions
- Add lunarlander_disc_unizero_ppo_config.py for PPO training
Co-authored-by: Cursor <cursoragent@cursor.com>
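The lunarlander_env fix for mixed action types can be sketched like this; the helper name is hypothetical (the actual change lives inside lunarlander_env.py), but it shows the normalization a discrete env needs when policies emit either plain ints or numpy arrays:

```python
import numpy as np

def to_discrete_action(action):
    """Normalize an action to a plain Python int.

    Accepts an int, a 0-d numpy array, or a single-element numpy
    array, so env.step() works regardless of how the policy emits
    its actions.
    """
    if isinstance(action, np.ndarray):
        # .item() extracts the scalar from both 0-d and size-1 arrays
        return int(action.item())
    return int(action)
```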
