Commit 85618b9 (v0.2.0)
1 parent 3e8c2bb commit 85618b9

File tree: 10 files changed, +53 −14 lines changed

CHANGELOG.md

Lines changed: 38 additions & 0 deletions
@@ -1,3 +1,41 @@
+2025.04.01 (v0.2.0)
+- env: Add Metadrive environment and configurations (#192)
+- env: Add Sampled MuZero/UniZero and DMC environment with related configurations (#260)
+- env: Polish Chess environment and its render method; add unittests and configurations (#272)
+- env: Add Jericho environment and its configurations (#307)
+- algo: Add Harmony Dream loss balance in MuZero (#242)
+- algo: Adopt AlphaZero for non-zero-sum games (#245)
+- algo: Add AlphaZero CTree unittest (#306)
+- algo: Add recent MCTS-related papers (#324)
+- algo: Introduce RoPE to use the true timestep index as pos_index (#266)
+- algo: Add Jericho DDP configuration (#337)
+- feat: Add LightZero Sphinx documentation (#237)
+- feat: Add Wandb support (#294)
+- feat: Add Atari100k metric utilities (#295)
+- feat: Add eval_benchmark tests (#296)
+- feat: Add save_replay and collect_episode_data options in Jericho (#333)
+- feat: Add an MCTS TicTacToe demo in one single file (#315)
+- fix: Fix DownSample for different observation shapes (#254)
+- fix: Fix wrong chance values in Stochastic MuZero (#275)
+- fix: Use display_frames_as_gif in CartPole (#288)
+- fix: Fix chance encoder in stochastic_muzero_model_mlp.py (#284)
+- fix: Correct typo in model/utils.py (#290)
+- fix: Fix SMZ compile_args and num_simulations bug in world_model (#297)
+- fix: Fix reward type bug in 2048 and OS import issue in CartPole (#304)
+- fix: Switch to macos-13 in action (#319)
+- fix: Fix SMZ & SEZ config for pixel-based DMC (#322)
+- fix: Fix update_per_collect in DDP setting (#321)
+- fix: Fix obs_shape tuple bug in initialize_zeros_batch (#327)
+- fix: Fix prepare_obs_stack_for_unizero (#328)
+- fix: Fix random_policy when len(ready_env_id) < collector_env_num (#335)
+- fix: Fix timestep compatibility (#339)
+- polish: Polish efficiency and performance on Atari and DMC (#292)
+- polish: Update requirements (#298)
+- polish: Optimize reward/value/policy_head_hidden_channels (#314)
+- polish: Update tutorial configuration and log instructions (#330)
+- ci: Add self-hosted Linux (Ubuntu) CI runner (#259)
+- test: Add self-hosted Linux runner for CI tests (#323)
+
 2024.07.12 (v0.1.0)
 - env: SumToThree env from pooltool (#227)
 - algo: UniZero (#232)

README.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
 [![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)

-Updated on 2025.02.08 LightZero-v0.1.0
+Updated on 2025.04.01 LightZero-v0.2.0

 English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

README.zh.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 [![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
 [![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)

-Last updated on 2025.02.08 LightZero-v0.1.0
+Last updated on 2025.04.01 LightZero-v0.2.0

 [English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [Documentation](https://opendilab.github.io/LightZero) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

lzero/mcts/buffer/game_segment.py

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ def append(
         obs: np.ndarray,
         reward: np.ndarray,
         action_mask: np.ndarray = None,
-        to_play: List = [-1],
+        to_play: Union[int, List] = -1,
         timestep: int = 0,
         chance: int = 0,
 ) -> None:
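The change above replaces a mutable default argument (`to_play: List = [-1]`) with an immutable scalar default. Beyond the type widening, this sidesteps a classic Python pitfall: a default list is created once at function-definition time, so in-place mutations leak across calls. A minimal sketch (function names here are illustrative, not from LightZero):

```python
def append_bad(item, to_play=[-1]):
    # The default list is created once when the function is defined,
    # so mutations persist across calls that rely on the default.
    to_play.append(item)
    return to_play

def append_good(item, to_play=None):
    # An immutable sentinel default avoids shared state between calls.
    if to_play is None:
        to_play = [-1]
    to_play.append(item)
    return to_play

print(append_bad(1))   # [-1, 1]
print(append_bad(2))   # [-1, 1, 2]  <- surprising carry-over from the first call
print(append_good(1))  # [-1, 1]
print(append_good(2))  # [-1, 2]
```

An immutable default like `-1` (as in the diff) is safe for the same reason `None` is: it can never be mutated in place.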

lzero/mcts/ptree/ptree_ez.py

Lines changed: 2 additions & 2 deletions
@@ -239,7 +239,7 @@ def prepare(
         noises: List[float],
         value_prefixs: List[float],
         policies: List[List[float]],
-        to_play: List = [-1]
+        to_play: Union[int, List] = -1
     ) -> None:
         """
         Overview:
@@ -261,7 +261,7 @@ def prepare(
             self.roots[i].add_exploration_noise(root_noise_weight, noises[i])
             self.roots[i].visit_count += 1

-    def prepare_no_noise(self, value_prefixs: List[float], policies: List[List[float]], to_play: List = [-1]) -> None:
+    def prepare_no_noise(self, value_prefixs: List[float], policies: List[List[float]], to_play: Union[int, List] = -1) -> None:
         """
         Overview:
             Expand the roots without noise.

lzero/mcts/ptree/ptree_mz.py

Lines changed: 2 additions & 2 deletions
@@ -220,7 +220,7 @@ def prepare(
         noises: List[float],
         rewards: List[float],
         policies: List[List[float]],
-        to_play: List = [-1]
+        to_play: Union[int, List] = -1
     ) -> None:
         """
         Overview:
@@ -241,7 +241,7 @@ def prepare(
             self.roots[i].add_exploration_noise(root_noise_weight, noises[i])
             self.roots[i].visit_count += 1

-    def prepare_no_noise(self, rewards: List[float], policies: List[List[float]], to_play: List = [-1]) -> None:
+    def prepare_no_noise(self, rewards: List[float], policies: List[List[float]], to_play: Union[int, List] = -1) -> None:
         """
         Overview:
             Expand the roots without noise.

lzero/mcts/ptree/ptree_sez.py

Lines changed: 2 additions & 2 deletions
@@ -374,7 +374,7 @@ def prepare(
         noises: List[float],
         value_prefixs: List[float],
         policies: List[List[float]],
-        to_play: List = [-1]
+        to_play: Union[int, List] = -1
     ) -> None:
         """
         Overview:
@@ -396,7 +396,7 @@ def prepare(

             self.roots[i].visit_count += 1

-    def prepare_no_noise(self, value_prefixs: List[float], policies: List[List[float]], to_play: List = [-1]) -> None:
+    def prepare_no_noise(self, value_prefixs: List[float], policies: List[List[float]], to_play: Union[int, List] = -1) -> None:
         """
         Overview:
             Expand the roots without noise.

lzero/mcts/ptree/ptree_stochastic_mz.py

Lines changed: 2 additions & 2 deletions
@@ -246,7 +246,7 @@ def prepare(
         noises: List[float],
         rewards: List[float],
         policies: List[List[float]],
-        to_play: List = [-1]
+        to_play: Union[int, List] = -1
     ) -> None:
         """
         Overview:
@@ -269,7 +269,7 @@ def prepare(
             self.roots[i].add_exploration_noise(root_noise_weight, noises[i])
             self.roots[i].visit_count += 1

-    def prepare_no_noise(self, rewards: List[float], policies: List[List[float]], to_play: List = [-1]) -> None:
+    def prepare_no_noise(self, rewards: List[float], policies: List[List[float]], to_play: Union[int, List] = -1) -> None:
         """
         Overview:
             Expand the roots without noise.
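Across the four ptree files, the signature change means `to_play` may now arrive as either a single player index or a per-root list. A callee that accepts `Union[int, List]` typically normalizes the scalar case up front; a minimal sketch of that pattern (the helper name and broadcast rule are assumptions for illustration, not LightZero's actual code):

```python
from typing import List, Union

def normalize_to_play(to_play: Union[int, List], num_roots: int) -> List[int]:
    # Hypothetical normalization: broadcast a scalar player index to one
    # entry per root, so downstream loops can index to_play[i] uniformly.
    if isinstance(to_play, int):
        return [to_play] * num_roots
    return list(to_play)

print(normalize_to_play(-1, 3))      # [-1, -1, -1]
print(normalize_to_play([0, 1], 2))  # [0, 1]
```

With a default of `-1` (the "no player / single-player" convention used in the diffs), callers in single-agent environments can omit the argument entirely.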

lzero/policy/efficientzero.py

Lines changed: 1 addition & 1 deletion
@@ -666,7 +666,7 @@ def _init_eval(self) -> None:
         else:
             self._mcts_eval = MCTSPtree(self._cfg)

-    def _forward_eval(self, data: torch.Tensor, action_mask: list, to_play: List = [-1], ready_env_id: np.array = None, **kwargs):
+    def _forward_eval(self, data: torch.Tensor, action_mask: list, to_play: Union[int, List] = [-1], ready_env_id: np.array = None, **kwargs):
         """
         Overview:
             The forward function for evaluating the current policy in eval mode. Use model to execute MCTS search.

zoo/jericho/envs/jericho_env.py

Lines changed: 3 additions & 2 deletions
@@ -466,11 +466,12 @@ def collect_episode_data(self):
 if __name__ == '__main__':
     from easydict import EasyDict

+    env_type = 'detective'  # zork1, acorncourt, detective, omniquest
     # Configuration dictionary for the environment.
     env_cfg = EasyDict(
         dict(
             max_steps=400,
-            game_path="./zoo/jericho/envs/z-machine-games-master/jericho-game-suite/" + "zork1.z5",
+            game_path="./zoo/jericho/envs/z-machine-games-master/jericho-game-suite/" + f"{env_type}.z5",
             max_action_num=10,
             tokenizer_path="google-bert/bert-base-uncased",
             max_seq_len=512,
@@ -481,7 +482,7 @@ def collect_episode_data(self):
             evaluator_env_num=1,
             save_replay=True,
             save_replay_path=None,
-            env_type='zork1',  # zork1, acorncourt, detective, omniquest
+            env_type=env_type,
             collect_policy_mode='expert'  # random, human, expert
         )
     )
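The jericho_env.py change above derives `game_path` and `env_type` from a single variable so the two fields cannot drift apart. The same idea can be factored into a small builder; `make_env_cfg` is a hypothetical helper sketching the pattern (plain dict used here to stay dependency-free, where the original uses EasyDict):

```python
def make_env_cfg(env_type: str) -> dict:
    # Derive every env_type-dependent field from one argument, mirroring
    # the single-source-of-truth pattern introduced in the diff.
    suite_dir = "./zoo/jericho/envs/z-machine-games-master/jericho-game-suite/"
    return {
        "max_steps": 400,
        "game_path": suite_dir + f"{env_type}.z5",
        "env_type": env_type,
    }

cfg = make_env_cfg("detective")
print(cfg["game_path"])  # ./zoo/.../jericho-game-suite/detective.z5 (detective.z5 suffix)
```

Switching games then means changing one string, with no risk of a `game_path` that points at a different game than `env_type` claims.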
