Merged
Changes from 85 commits
Commits
102 commits
da4cdba
feature(pu): add unizero/muzero multitask pipeline and net plasticity…
puyuan1996 Apr 25, 2025
a6eed25
fix(pu): fix some adaptation bug
puyuan1996 Apr 25, 2025
67a0e9a
feature(pu): add unizero multitask balance pipeline for atari and dmc
puyuan1996 Apr 29, 2025
f083096
fix(pu): fix some adaptation bug
puyuan1996 Apr 29, 2025
37eb118
feature(pu): add vit encoder for unizero
puyuan1996 Apr 29, 2025
f32d63e
polish(pu): polish moe layer in transformer
puyuan1996 May 1, 2025
c0aa747
feature(pu): add eval norm mean/median for atari
puyuan1996 May 5, 2025
8b3cff6
fix(pu): fix atari norm mean/median, fix collect in balance pipeline
puyuan1996 May 7, 2025
f2c158b
polish(pu): polish config
puyuan1996 May 7, 2025
20b42f7
fix(pu): fix dmc multitask to be compatible with timestep (which is …
puyuan1996 May 7, 2025
39ee55e
polish(pu): polish config
puyuan1996 May 13, 2025
e85c449
fix(pu): fix task_id bug in balance pipeline, and polish benchmark_na…
puyuan1996 May 14, 2025
c16d564
fix(pu): fix benchmark_name option
puyuan1996 May 14, 2025
474b81c
polish(pu): fix norm score computation, adapt config to aliyun
puyuan1996 May 21, 2025
50e367e
polish(pu): polish unizero_mt balance pipeline use CurriculumControll…
puyuan1996 May 23, 2025
9171c3e
tmp
puyuan1996 May 30, 2025
bc5003a
Merge branch 'dev-multitask-balance-clean' of https://github.com/open…
puyuan1996 May 30, 2025
158e4a0
tmp
puyuan1996 Jun 1, 2025
d66b986
tmp
puyuan1996 Jun 4, 2025
0d5ede0
test(pu): add vit moe test
puyuan1996 Jun 5, 2025
ca6ddb6
polish(pu): add adapter_scales to tb
puyuan1996 Jun 11, 2025
7dd6c04
feature(pu): add atari uz balance config
puyuan1996 Jun 12, 2025
c8e7cb8
polish(pu): add stable_adaptor_scale
puyuan1996 Jun 19, 2025
0313335
tmp
puyuan1996 Jun 23, 2025
ef170fd
sync code
puyuan1996 Jun 25, 2025
bbec353
polish(pu): use freeze_non_lora_parameters in transformer, not use Le…
zjowowen Jul 30, 2025
20648d5
feature(pu): add vit-encoder lora in balance pipeline
zjowowen Jul 30, 2025
db6032a
polish(pu): fix reanalyze index bug, fix global_solved bug, add apply…
puyuan1996 Aug 5, 2025
f63b544
polish(pu): add collect/eval_num_simulations option
puyuan1996 Aug 5, 2025
bbbe505
polish(pu): polish comments and style in entry of scalezero
puyuan1996 Sep 28, 2025
bf9f965
polish(pu): polish comments and style of ctree/tree_search/buffer/com…
puyuan1996 Sep 28, 2025
fb04c7a
polish(pu): polish comments and style of files in lzero.model
puyuan1996 Sep 28, 2025
06148e7
polish(pu): polish comments and style of files in lzero.model.unizero…
puyuan1996 Sep 28, 2025
471ae6a
polish(pu): polish comments and style of unizero_world_models
puyuan1996 Sep 28, 2025
07933a5
polish(pu): polish comments and style of files in policy/
puyuan1996 Sep 28, 2025
df3b644
polish(pu): polish comments and style of files in worker
puyuan1996 Sep 28, 2025
4f89dcc
polish(pu): polish comments and style of files in configs
puyuan1996 Sep 28, 2025
e7a8796
Merge remote-tracking branch 'origin/main' into dev-multitask-balance…
puyuan1996 Sep 28, 2025
ab746d1
fix(pu): fix some merge typo
tAnGjIa520 Sep 28, 2025
0476aca
fix(pu): fix ln norm_type, fix kv_cache rewrite bug, add value_priori…
tAnGjIa520 Sep 28, 2025
2c0a965
fix(pu): fix unizero_mt
tAnGjIa520 Sep 28, 2025
84e6094
polish(pu): add LN in head, polish init_weight, polish adamw
tAnGjIa520 Sep 29, 2025
05da638
fix(pu): fix configure_optimizer_unizero in unizero_mt
tAnGjIa520 Oct 2, 2025
06ad080
feature(pu): add encoder-clip, label smooth, analyze_latent_represent…
tAnGjIa520 Oct 9, 2025
9f69f5a
feature(pu): add encoder-clip, label smooth option in unizero_multit…
tAnGjIa520 Oct 9, 2025
af99278
fix(pu): fix tb log when gpu_num<task_num, fix total_loss += bug, polish
tAnGjIa520 Oct 9, 2025
bf91ca2
polish(pu): polish config
tAnGjIa520 Oct 9, 2025
b18f892
fix(pu): fix encoder-clip bug and num_channel/res bug
tAnGjIa520 Oct 11, 2025
bf3cd12
polish(pu): polish scale_factor in DPS
tAnGjIa520 Oct 12, 2025
b1efa60
tmp
tAnGjIa520 Oct 18, 2025
c2f9817
feature(pu): add some analysis metrics in tensorboard for unizero and…
tAnGjIa520 Oct 23, 2025
b081379
polish(pu): abstract a KVCacheManager for world model
tAnGjIa520 Oct 23, 2025
2eff68d
tmp
puyuan1996 Oct 23, 2025
27075c1
polish(pu): polish unizero obs_loss to cos_sim loss
puyuan1996 Oct 23, 2025
b4c3ba8
tmp
puyuan1996 Oct 24, 2025
3788eb7
polish(pu): polish monitor-log and adapt to ale/xxx-v5 style game
puyuan1996 Oct 25, 2025
6d7761a
feature(pu): add decode_loss for unizero atari
puyuan1996 Oct 25, 2025
a7ed590
test(pu): test unizero-mt
puyuan1996 Oct 25, 2025
be07791
fix(pu): fix deep copy before storage bug when using KVCacheManager
puyuan1996 Oct 28, 2025
74ff3d6
sync code
puyuan1996 Oct 31, 2025
aefa082
feature(pu): add iter_policy_evaluation demo in grid-world
puyuan1996 Nov 4, 2025
a8be15e
Merge branch 'dev-multitask-balance-clean-kvcachemanager' of https://…
puyuan1996 Nov 4, 2025
08f3a29
polish(pu): polish atari uz config
puyuan1996 Nov 5, 2025
16ca8d4
polish(pu): polish policy logits stability
puyuan1996 Nov 13, 2025
5cff9eb
sync code
puyuan1996 Nov 17, 2025
3c820ef
polish(pu): polish policy logits stability
Nov 17, 2025
b9b8d26
Merge branch 'dev-multitask-balance-clean-kvcachemanager' of https://…
Nov 17, 2025
bd67cdf
fix(pu): fix exp_name and task_id bug in dmc pipeline, fix some configs
Nov 20, 2025
39a9c8c
feature(pu): add head-clip manager
puyuan1996 Dec 2, 2025
32d7f36
fix(pu): fix head-clip log
puyuan1996 Dec 2, 2025
f8bd43e
tmp
puyuan1996 Dec 5, 2025
72e4b6d
Merge remote-tracking branch 'origin/main' into dev-multitask-balance…
puyuan1996 Dec 24, 2025
21317f6
polish(pu): polish comments and code styles
puyuan1996 Dec 24, 2025
af6f016
polish(pu): polish comments and code styles in entry/mcts/model
puyuan1996 Dec 25, 2025
6190c08
polish(pu): polish comments and code styles in policy/config
puyuan1996 Dec 26, 2025
f723e41
polish(pu): polish comments and code styles in config
puyuan1996 Dec 26, 2025
256edf0
polish(pu): polish comments and code styles in atari env
puyuan1996 Dec 26, 2025
cb5ae6b
fix(pu): fix comments of worker in ddp mode, fix device bug in evalua…
puyuan1996 Dec 26, 2025
5ed77bf
fix(pu): fix unizero_multitask ddp barrier bug
puyuan1996 Dec 26, 2025
7cf1e2d
fix(pu): add policy_logits_clip_method option
puyuan1996 Dec 29, 2025
50db85f
fix(pu): add policy_logits_clip_method option
puyuan1996 Dec 29, 2025
fefd62b
polish(pu): polish comments, docstring, readme
puyuan1996 Jan 5, 2026
377dc97
polish(pu): polish atari unizero configs and default configs in unize…
puyuan1996 Jan 5, 2026
0d6049a
polish(pu): update to macos-15
puyuan1996 Jan 5, 2026
116d10a
fix(pu): fix gymnasium[atari] version
puyuan1996 Jan 5, 2026
d073780
fix(pu): fix import bug
puyuan1996 Jan 7, 2026
2a2c794
polish(pu): polish comments, docstring, some little redundancy
puyuan1996 Jan 7, 2026
92fb126
polish(pu): optimize import orders
puyuan1996 Jan 7, 2026
a7d65a6
refactor(pu): move some reusable common var. and safe_eval() method t…
puyuan1996 Jan 7, 2026
62a5102
fix(pu): fix Optional import bug
puyuan1996 Jan 7, 2026
11d0e85
fix(pu): fix prediction network
puyuan1996 Jan 7, 2026
10b185e
fix(pu): add brew install swig in test.yml
puyuan1996 Jan 7, 2026
12cab6f
fix(pu): fix import bug in test
puyuan1996 Jan 7, 2026
aa8b293
fix(pu): fix type lint bug
puyuan1996 Jan 7, 2026
9e3cd2a
fix(pu): fix import bug in test
puyuan1996 Jan 8, 2026
a2a7205
fix(pu): fix import bug in test
puyuan1996 Jan 8, 2026
1bf1b0c
fix(pu): fix test
puyuan1996 Jan 8, 2026
9c195f1
fix(pu): fix some args bug
puyuan1996 Jan 8, 2026
5a1765f
polish(pu): add some comments and little polish
puyuan1996 Jan 8, 2026
b0a69b6
fix(pu): fix 2 tests
puyuan1996 Jan 8, 2026
7841fdf
fix(pu): fix not_enough_data ddp bug
puyuan1996 Jan 8, 2026
ad2226a
fix(pu): fix final_norm_option and predict_latent_loss_type default c…
puyuan1996 Jan 8, 2026
24 changes: 12 additions & 12 deletions .github/workflows/release.yml
@@ -53,7 +53,7 @@ jobs:
matrix:
os:
- 'ubuntu-20.04'
- 'macos-13'
- 'macos-15'
python:
- '3.7'
- '3.8'
@@ -73,11 +73,11 @@ jobs:
architecture: x86
- os: ubuntu-20.04
architecture: AMD64
- os: macos-13
- os: macos-15
architecture: aarch64
- os: macos-13
- os: macos-15
architecture: x86
- os: macos-13
- os: macos-15
architecture: AMD64

steps:
@@ -167,25 +167,25 @@ jobs:
name: build-artifacts-wheels-ubuntu-20.04-3.11-aarch64
path: aggregated_wheels_all

- name: Download wheel macos-13, 3.7, x86_64
- name: Download wheel macos-15, 3.7, x86_64
uses: actions/download-artifact@v4
with:
name: build-artifacts-wheels-macos-13-3.7-x86_64
name: build-artifacts-wheels-macos-15-3.7-x86_64
path: aggregated_wheels_all
- name: Download wheel macos-13, 3.8, x86_64
- name: Download wheel macos-15, 3.8, x86_64
uses: actions/download-artifact@v4
with:
name: build-artifacts-wheels-macos-13-3.8-x86_64
name: build-artifacts-wheels-macos-15-3.8-x86_64
path: aggregated_wheels_all
- name: Download wheel macos-13, 3.7, arm64
- name: Download wheel macos-15, 3.7, arm64
uses: actions/download-artifact@v4
with:
name: build-artifacts-wheels-macos-13-3.7-arm64
name: build-artifacts-wheels-macos-15-3.7-arm64
path: aggregated_wheels_all
- name: Download wheel macos-13, 3.8, arm64
- name: Download wheel macos-15, 3.8, arm64
uses: actions/download-artifact@v4
with:
name: build-artifacts-wheels-macos-13-3.8-arm64
name: build-artifacts-wheels-macos-15-3.8-arm64
path: aggregated_wheels_all

- name: Upload unified wheels artifact
8 changes: 4 additions & 4 deletions .github/workflows/release_test.yml
@@ -56,7 +56,7 @@ jobs:
matrix:
os:
- 'ubuntu-20.04'
- 'macos-13'
- 'macos-15'
python:
- '3.7.17'
- '3.8.17'
@@ -76,11 +76,11 @@ jobs:
architecture: x86
- os: ubuntu-20.04
architecture: AMD64
- os: macos-13
- os: macos-15
architecture: aarch64
- os: macos-13
- os: macos-15
architecture: x86
- os: macos-13
- os: macos-15
architecture: AMD64
- python: '3.7.17'
architecture: arm64
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -20,7 +20,7 @@ jobs:
matrix:
os:
- 'self-hosted'
- 'macos-13'
- 'macos-15'
python-version:
- '3.8'
- '3.9'
2 changes: 1 addition & 1 deletion .gitignore
@@ -1453,4 +1453,4 @@ events.*
!/assets/pooltool/**
lzero/mcts/ctree/ctree_alphazero/pybind11

zoo/jericho/envs/z-machine-games-master
zoo/jericho/envs/z-machine-games-master
193 changes: 193 additions & 0 deletions lzero/entry/README.md
@@ -0,0 +1,193 @@
# LightZero Entry Functions

English | [中文](./README_zh.md)

This directory contains the training and evaluation entry functions for various algorithms in the LightZero framework. These entry functions serve as the main interfaces for launching different types of reinforcement learning experiments.

## 📁 Directory Structure

### 🎯 Training Entries

#### AlphaZero Family
- **`train_alphazero.py`** - Training entry for AlphaZero algorithm
- Suitable for perfect information board games (e.g., Go, Chess)
- No environment model needed, learns through self-play
- Uses Monte Carlo Tree Search (MCTS) for policy improvement
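The PUCT rule behind that policy improvement can be sketched in a few lines (an illustrative sketch only; LightZero's actual MCTS lives in `lzero/mcts` and includes value normalization and other machinery, and `c_puct = 1.25` is just a common default):

```python
import math

def puct_score(parent_visits: int, child_visits: int,
               child_value: float, prior: float, c_puct: float = 1.25) -> float:
    # Exploitation term (estimated value) plus an exploration bonus that
    # favors children with a high prior probability and few visits.
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return child_value + exploration
```

At each tree node the child with the highest score is selected, so frequently visited children need increasingly high value estimates to keep being chosen.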

#### MuZero Family
- **`train_muzero.py`** - Standard training entry for MuZero algorithm
- Supports the MuZero, EfficientZero, Sampled EfficientZero, and Gumbel MuZero variants
- Learns an implicit model of the environment (dynamics model)
- Suitable for single-task reinforcement learning scenarios

- **`train_muzero_segment.py`** - MuZero training with segment collector and buffer reanalyze
- Uses `MuZeroSegmentCollector` for data collection
- Supports buffer reanalyze trick for improved sample efficiency
- Supported algorithms: MuZero, EfficientZero, Sampled MuZero, Sampled EfficientZero, Gumbel MuZero, StochasticMuZero

- **`train_muzero_with_gym_env.py`** - MuZero training adapted for Gym environments
- Specifically designed for OpenAI Gym-style environments
- Simplifies environment interface adaptation

- **`train_muzero_with_reward_model.py`** - MuZero training with reward model
- Integrates external Reward Model
- Suitable for scenarios requiring learning complex reward functions

- **`train_muzero_multitask_segment_ddp.py`** - MuZero multi-task distributed training
- Supports multi-task learning
- Uses DDP (Distributed Data Parallel) for distributed training
- Uses Segment Collector

#### UniZero Family
- **`train_unizero.py`** - Training entry for UniZero algorithm
- Based on paper "UniZero: Generalized and Efficient Planning with Scalable Latent World Models"
- Enhanced planning capabilities for better long-term dependency capture
- Uses scalable latent world models
- Paper: https://arxiv.org/abs/2406.10667

- **`train_unizero_segment.py`** - UniZero training with segment collector
- Uses `MuZeroSegmentCollector` for efficient data collection
- Supports buffer reanalyze trick

- **`train_unizero_multitask_segment_ddp.py`** - UniZero multi-task distributed training
- Supports multi-task learning and distributed training
- Includes benchmark score definitions (e.g., Atari human-normalized scores)
- Supports curriculum learning strategies
- Uses DDP for training acceleration
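Human-normalized scores are conventionally computed per task as `(agent - random) / (human - random)` and then aggregated by mean and median. A minimal sketch (the framework's own helper is `compute_unizero_mt_normalized_stats` in `utils.py`; the raw agent scores below are made up, and the baselines are the commonly cited Pong/Breakout values, used purely for illustration):

```python
import statistics

def normalized_scores(agent: dict, random_baseline: dict, human_baseline: dict) -> dict:
    # Human-normalized score per task: (agent - random) / (human - random).
    return {
        task: (agent[task] - random_baseline[task]) /
              (human_baseline[task] - random_baseline[task])
        for task in agent
    }

# Illustrative per-task raw scores:
agent = {'Pong': 18.0, 'Breakout': 300.0}
rand = {'Pong': -20.7, 'Breakout': 1.7}
human = {'Pong': 14.6, 'Breakout': 30.5}

norm = normalized_scores(agent, rand, human)
mean_norm = statistics.mean(norm.values())
median_norm = statistics.median(norm.values())
```

A score of 1.0 means human-level performance on that task; the median is usually reported alongside the mean because it is robust to a few tasks with very large normalized scores.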

- **`train_unizero_multitask_balance_segment_ddp.py`** - UniZero balanced multi-task distributed training
- Implements balanced sampling across tasks in multi-task training
- Dynamically adjusts batch sizes for different tasks
- Suitable for scenarios with large task difficulty variations
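One plausible shape for such dynamic allocation (a hypothetical sketch, not necessarily the exact logic of `allocate_batch_size` in `utils.py`) is to split the global batch proportionally to per-task difficulty weights:

```python
def allocate_batch_size(total_batch: int, difficulty: dict) -> dict:
    # Allocate the global batch proportionally to per-task difficulty weights,
    # guaranteeing at least one sample per task.
    total_w = sum(difficulty.values())
    alloc = {t: max(1, int(total_batch * w / total_w)) for t, w in difficulty.items()}
    # Hand out any remainder left by flooring, hardest tasks first.
    leftover = total_batch - sum(alloc.values())
    for t in sorted(difficulty, key=difficulty.get, reverse=True):
        if leftover <= 0:
            break
        alloc[t] += 1
        leftover -= 1
    return alloc
```

With this shape, a task judged three times as difficult receives roughly three times the samples, which is the kind of rebalancing that helps when task difficulties vary widely.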

- **`train_unizero_multitask_segment_eval.py`** - UniZero multi-task evaluation entry
- Specialized for training with periodic evaluation in multi-task scenarios
- Includes detailed evaluation metric statistics

- **`train_unizero_with_loss_landscape.py`** - UniZero training with loss landscape visualization
- Visualizes the loss landscape to help understand the model's optimization process and generalization
- Integrates the `loss_landscapes` library

#### ReZero Family
- **`train_rezero.py`** - Training entry for ReZero algorithm
- Supports ReZero-MuZero and ReZero-EfficientZero
- Speeds up training and improves sample efficiency via the backward-view, entire-buffer reanalyze technique
- Paper: https://arxiv.org/pdf/2404.16364

### 🎓 Evaluation Entries

- **`eval_alphazero.py`** - Evaluation entry for AlphaZero
- Loads trained AlphaZero models for evaluation
- Can play against other agents for performance testing

- **`eval_muzero.py`** - Evaluation entry for MuZero family
- Supports evaluation of all MuZero variants
- Provides detailed performance statistics

- **`eval_muzero_with_gym_env.py`** - MuZero evaluation for Gym environments
- Specialized for evaluating models trained in Gym environments

### 🛠️ Utility Modules

- **`utils.py`** - Common utility functions library
- **Math & Tensor Utilities**:
- `symlog`, `inv_symlog` - Symmetric logarithm transformations
- `initialize_zeros_batch`, `initialize_pad_batch` - Batch initialization

- **LoRA Utilities**:
- `freeze_non_lora_parameters` - Freeze non-LoRA parameters

- **Task & Curriculum Learning Utilities**:
- `compute_task_weights` - Compute task weights
- `TemperatureScheduler` - Temperature scheduler
- `tasks_per_stage` - Calculate tasks per stage
- `compute_unizero_mt_normalized_stats` - Compute normalized statistics
- `allocate_batch_size` - Dynamically allocate batch sizes

- **Distributed Training Utilities (DDP)**:
- `is_ddp_enabled` - Check if DDP is enabled
- `ddp_synchronize` - DDP synchronization
- `ddp_all_reduce_sum` - DDP all-reduce sum

- **RL Workflow Utilities**:
- `calculate_update_per_collect` - Calculate updates per collection
- `random_collect` - Random policy data collection
- `convert_to_batch_for_unizero` - UniZero batch data conversion
- `create_unizero_loss_metrics` - Create loss metrics function
- `UniZeroDataLoader` - UniZero data loader

- **Logging Utilities**:
- `log_module_trainable_status` - Log module trainable status
- `log_param_statistics` - Log parameter statistics
- `log_buffer_memory_usage` - Log buffer memory usage
- `log_buffer_run_time` - Log buffer runtime
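For illustration, `symlog` and `inv_symlog` are usually defined as the sign-preserving log transform below (a sketch assuming the standard definition; the versions in `utils.py` operate on tensors):

```python
import math

def symlog(x: float) -> float:
    # symlog(x) = sign(x) * ln(|x| + 1): compresses large magnitudes,
    # preserves the sign, and is approximately the identity near zero.
    return math.copysign(math.log(abs(x) + 1.0), x)

def inv_symlog(y: float) -> float:
    # Exact inverse ("symexp"): sign(y) * (exp(|y|) - 1).
    return math.copysign(math.exp(abs(y)) - 1.0, y)
```

This transform lets value and reward targets with very different scales be regressed in a common range without clipping.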

- **`__init__.py`** - Package initialization file
- Exports all training and evaluation entry functions
- Exports commonly used functions from utility modules

## 📖 Usage Guide

### Basic Usage Pattern

All training entry functions follow a similar calling pattern:

```python
from lzero.entry import train_muzero

# Prepare configuration
cfg = dict(...)         # User configuration
create_cfg = dict(...)  # Creation configuration

# Start training
policy = train_muzero(
    input_cfg=(cfg, create_cfg),
    seed=0,
    model=None,                # Optional: pre-initialized model
    model_path=None,           # Optional: pretrained model path
    max_train_iter=int(1e10),  # Maximum training iterations
    max_env_step=int(1e10),    # Maximum environment steps
)
```

### Choosing the Right Entry Function

1. **Single-Task Learning**:
- Board games → `train_alphazero`
- General RL tasks → `train_muzero` or `train_unizero`
- Gym environments → `train_muzero_with_gym_env`

2. **Multi-Task Learning**:
- Standard multi-task → `train_unizero_multitask_segment_ddp`
- Balanced task sampling → `train_unizero_multitask_balance_segment_ddp`

3. **Distributed Training**:
- All entry functions with `_ddp` suffix support distributed training

4. **Special Requirements**:
- Loss landscape visualization → `train_unizero_with_loss_landscape`
- External reward model → `train_muzero_with_reward_model`
- Improved training stability → `train_rezero`

## 🔗 Related Resources

- **AlphaZero**: [Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm](https://arxiv.org/abs/1712.01815)
- **MuZero**: [Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://arxiv.org/abs/1911.08265)
- **EfficientZero**: [Mastering Atari Games with Limited Data](https://arxiv.org/abs/2111.00210)
- **UniZero**: [Generalized and Efficient Planning with Scalable Latent World Models](https://arxiv.org/abs/2406.10667)
- **ReZero**: [Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze](https://arxiv.org/abs/2404.16364)

## 💡 Tips

- Start with the standard `train_muzero` or `train_unizero` entries
- For large-scale experiments, consider the DDP versions for faster training
- The `_segment` versions can achieve better sample efficiency (via the buffer reanalyze trick)
- Check the configuration examples in the `zoo/` directory to learn how to set up each algorithm

## 📝 Notes

1. All path parameters should use **absolute paths**
2. Pretrained model paths typically follow the format: `exp_name/ckpt/ckpt_best.pth.tar`
3. When using distributed training, ensure the `CUDA_VISIBLE_DEVICES` environment variable is set correctly
4. Some entry functions only support specific algorithm types - check the function documentation