feature(pu): add atari/dmc multitask and balance pipeline in ScaleZero paper by puyuan1996 · Pull Request #451 · opendilab/LightZero

puyuan1996 · 2025-12-03T03:37:12Z

This pull request implements the core components of the ScaleZero paper by introducing a multi-task, balanced training pipeline for Atari and DeepMind Control (DMC) environments.

To enhance stability and performance in this new multi-task setting, several key improvements and bug fixes were made. We replaced BatchNorm with the more robust LayerNorm, corrected a critical bug that caused the kv_cache to be improperly overwritten, and fixed the state reset logic in _reset_eval() and _reset_collect() to ensure accurate evaluation.
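As background for the BatchNorm-to-LayerNorm change, here is a minimal pure-Python sketch of one general property that matters when a batch mixes tasks: LayerNorm normalizes each sample over its own features, whereas training-mode BatchNorm normalizes each feature over the batch, so its output depends on which other samples share the batch. The function names and toy data are hypothetical and are not taken from the PR's code.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one sample across its own features (no learnable affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature across the batch (training-mode statistics)."""
    n, dims = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dims)]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dims)]
        for row in batch
    ]

sample = [1.0, 2.0, 3.0]
# Same sample, batched with companions of similar scale ...
batch_a = [sample, [1.5, 2.5, 3.5], [0.5, 1.5, 2.5]]
# ... and with companions of a very different scale (e.g. another task).
batch_b = [sample, [100.0, 200.0, 300.0], [-50.0, -60.0, -70.0]]

ln_a = layer_norm(batch_a[0])
ln_b = layer_norm(batch_b[0])
bn_a = batch_norm(batch_a)[0]
bn_b = batch_norm(batch_b)[0]

print("LayerNorm identical across batches:", ln_a == ln_b)
print("BatchNorm output for the same sample:", bn_a, bn_b)
```

In this toy setup the LayerNorm output for `sample` is the same in both batches, while the BatchNorm output shifts with the batch's composition.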

Additionally, the PR introduces target-entropy control for better policy optimization, makes the number of MCTS simulations configurable for evaluation, and integrates relevant updates from the longrun PR #400 to maintain code consistency.
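The description does not spell out how target-entropy control is implemented. One common formulation, assumed here purely for illustration, is the SAC-style temperature update: a coefficient `alpha` is raised when the policy's entropy falls below a target and lowered when it rises above it. The names `policy_entropy` and `update_log_alpha`, the learning rate, and the target value are all hypothetical.

```python
import math

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def update_log_alpha(log_alpha, entropy, target_entropy, lr=0.1):
    """One gradient-descent step on L(alpha) = alpha * (entropy - target_entropy).

    The step is taken with respect to log_alpha so that alpha stays positive.
    """
    alpha = math.exp(log_alpha)
    grad = alpha * (entropy - target_entropy)  # dL / d(log_alpha)
    return log_alpha - lr * grad

target_entropy = 1.0  # assumed target, in nats

peaked = [0.97, 0.01, 0.01, 0.01]    # entropy well below the target
uniform = [0.25, 0.25, 0.25, 0.25]   # entropy ln(4), above the target

log_alpha_after_peaked = update_log_alpha(0.0, policy_entropy(peaked), target_entropy)
log_alpha_after_uniform = update_log_alpha(0.0, policy_entropy(uniform), target_entropy)

print("alpha after peaked policy: ", math.exp(log_alpha_after_peaked))
print("alpha after uniform policy:", math.exp(log_alpha_after_uniform))
```

Under this assumed rule, a collapsing policy increases the entropy bonus weight and an overly diffuse policy decreases it.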

lzero/entry/train_unizero_multitask_balance_segment_ddp.py

lzero/entry/train_muzero_multitask_segment_ddp.py

lzero/entry/__init__.py

lzero/entry/README_zh.md

zoo/dmc2gym/envs/dmc2gym_lightzero_env.py

lzero/model/unizero_world_models/tokenizer.py

lzero/mcts/buffer/game_segment.py

lzero/mcts/tree_search/mcts_ctree.py

PaParaZz1 · 2026-01-08T09:05:07Z

lzero/mcts/tree_search/mcts_ctree_sampled.py

                MCTS stage 3: Backup
                    At the end of the simulation, the statistics along the trajectory are updated.
                """
+                # search_depth is used for rope in UniZero


Why doesn't ctree_sampled branch on whether rope (timestep) is used?

The sampled variant doesn't support rope yet; I've added it to the TODO list.

PaParaZz1 · 2026-01-08T09:43:57Z

lzero/policy/unizero.py


        # Clear caches if the current steps are a multiple of the clear interval
-        if current_steps % clear_interval == 0:
+        if current_steps is not None and current_steps % clear_interval == 0:


How is this interval set?

Currently, if sample_type='transition', it is set heuristically based on game_segment_length.
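To illustrate what the added `current_steps is not None` guard in the diff above protects against, here is a small standalone sketch. The helper names `should_clear_caches` and `unguarded_check` and the interval of 1000 are hypothetical; only the two conditions themselves mirror the diff.

```python
def should_clear_caches(current_steps, clear_interval):
    """Guarded condition: True only if a step count exists and hits the interval."""
    return current_steps is not None and current_steps % clear_interval == 0

def unguarded_check(current_steps, clear_interval):
    """The pre-fix condition, kept here only for comparison."""
    return current_steps % clear_interval == 0

# The guarded check short-circuits on None instead of evaluating the modulo.
for steps in (None, 0, 1500, 2000):
    print(steps, should_clear_caches(steps, clear_interval=1000))

# The unguarded check raises when no step count is passed.
try:
    unguarded_check(None, 1000)
    unguarded_raised = False
except TypeError as exc:
    unguarded_raised = True
    print("unguarded check fails:", exc)
```

Because `and` short-circuits, the modulo is never evaluated when `current_steps` is `None`, so callers that do not track a step count simply skip the cache clear.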

lzero/policy/unizero.py

PaParaZz1 · 2026-01-08T09:51:32Z

lzero/policy/head_clip_manager.py

+
+        # Log mapping
+        self.logits_key_mapping = {
+            'policy': 'logits_policy',


I think the clipping should be done in the encoder and the transformer backbone; the head clipping can be removed.

lzero/model/vit.py

puyuan1996 and others added 30 commits April 25, 2025 11:26

feature(pu): add unizero/muzero multitask pipeline and net plasticity related metrics

da4cdba

fix(pu): fix some adaptation bug

a6eed25

feature(pu): add unizero multitask balance pipeline for atari and dmc

67a0e9a

fix(pu): fix some adaptation bug

f083096

feature(pu): add vit encoder for unizero

37eb118

polish(pu): polish moe layer in transformer

f32d63e

feature(pu): add eval norm mean/medium for atari

c0aa747

fix(pu): fix atari norm mean/median, fix collect in balance pipeline

8b3cff6

polish(pu): polish config

f2c158b

fix(pu): fix dmc multitask to be compatiable with timestep (which is used in rope)

20b42f7

polish(pu): polish config

39ee55e

fix(pu): fix task_id bug in balance pipeline, and polish benchmark_name option

e85c449

fix(pu): fix benchmark_name option

c16d564

polish(pu): fix norm score computation, adapt config to aliyun

474b81c

polish(pu): polish unizero_mt balance pipeline use CurriculumController and fix solved gpu batch-size bug

50e367e

tmp

9171c3e

Merge branch 'dev-multitask-balance-clean' of https://github.com/opendilab/LightZero into dev-multitask-balance-clean

bc5003a

tmp

158e4a0

tmp

d66b986

test(pu): add vit moe test

0d5ede0

polish(pu): add adapter_scales to tb

ca6ddb6

feature(pu): add atari uz balance config

7dd6c04

polish(pu): add stable_adaptor_scale

c8e7cb8

tmp

0313335

sync code

ef170fd

polish(pu): use freeze_non_lora_parameters in transformer, not use LearnableScale in balance pipeline

bbec353

feature(pu): add vit-encoder lora in balance pipeline

20648d5

polish(pu): fix reanalyze index bug, fix global_solved bug, add apply_curriculum_to_encoder option

db6032a

polish(pu): add collect/eval_num_simulations option

f63b544

polish(pu): polish comments and style in entry of scalezero

bbbe505

puyuan1996 added 6 commits December 30, 2025 00:27

fix(pu): add policy_logits_clip_method option

7cf1e2d

fix(pu): add policy_logits_clip_method option

50db85f

polish(pu): polish comments, docstring, readme

fefd62b

polish(pu): polish atari unizero configs and default configs in unizero.py and unizero.py

377dc97

polish(pu): update to macos-15

0d6049a

fix(pu): fix gymnasium[atari] version

116d10a

PaParaZz1 reviewed Jan 6, 2026

lzero/entry/train_unizero_multitask_balance_segment_ddp.py

PaParaZz1 requested changes Jan 6, 2026

puyuan1996 added 10 commits January 7, 2026 10:57

fix(pu): fix import bug

d073780

polish(pu): polish comments, docstring, some little redundancy

2a2c794

polish(pu): optimize import orders

92fb126

refactor(pu): move some reusable common var. and safe_eval() method to lzero/entry/utils.py

a7d65a6

fix(pu): fix Optional import bug

62a5102

fix(pu): fix prediction network

11d0e85

fix(pu): add brew install swig in test.yml

10b185e

fix(pu): fix import bug in test

12cab6f

fix(pu): fix type lint bug

aa8b293

fix(pu): fix import bug in test

9e3cd2a

PaParaZz1 approved these changes Jan 8, 2026

puyuan1996 added 7 commits January 8, 2026 18:10

fix(pu): fix import bug in test

a2a7205

fix(pu): fix test

1bf1b0c

fix(pu): fix some args bug

9c195f1

polish(pu): add some comments and little polish

5a1765f

fix(pu): fix 2 tests

b0a69b6

fix(pu): fix not_enough_data ddp bug

7841fdf

fix(pu): fix final_norm_option and predict_latent_loss_type default config bug

ad2226a

puyuan1996 merged commit 81db0b2 into main Jan 8, 2026
1 of 6 checks passed

puyuan1996 mentioned this pull request Jan 8, 2026

fix(pu): fix longrun performance of muzero in mspacman and qbert #400

Closed

puyuan1996 added the refactor label Jan 8, 2026

puyuan1996 mentioned this pull request Jan 15, 2026

feature(pu): add atari/dmc multitask and balance pipeline in ScaleZero paper #417

Closed

4 participants

puyuan1996 commented Dec 3, 2025 •

edited

Loading