Open
Labels
efficiency optimization: Efficiency optimization (time, memory and so on)
Description
Collecting samples from '/data/LightZero/.venv/bin/python3 -u zoo/board_games/chinesechess/config/cchess_efficientzero_config.py' (python v3.11.14)
Total Samples 27500
GIL: 62.00%, Active: 100.00%, Threads: 9
%Own %Total OwnTime TotalTime Function (filename)
54.00% 100.00% 115.5s 162.4s search (lzero/mcts/tree_search/mcts_ctree.py)
0.00% 100.00% 43.09s 227.8s collect (lzero/worker/muzero_collector.py)
0.00% 0.00% 17.86s 17.87s <listcomp> (lzero/policy/efficientzero.py) <-- here!
13.00% 13.00% 17.18s 17.18s _conv_forward (torch/nn/modules/conv.py)
11.00% 11.00% 11.99s 12.24s batch_norm (torch/nn/functional.py)
0.00% 0.00% 6.04s 11.17s _compute_target_reward_value (lzero/mcts/buffer/game_buffer_efficientzero.py)
0.00% 0.00% 4.84s 4.84s <listcomp> (lzero/mcts/buffer/game_buffer_muzero.py)
8.00% 8.00% 4.57s 4.57s relu (torch/nn/functional.py)
0.00% 0.00% 4.52s 9.89s _compute_target_policy_non_reanalyzed (lzero/mcts/buffer/game_buffer_muzero.py)
0.00% 0.00% 3.41s 3.41s __call__ (lzero/policy/scaling_transform.py)
0.00% 0.00% 3.23s 3.23s <listcomp> (lzero/mcts/buffer/game_buffer_efficientzero.py)
0.00% 28.00% 3.10s 38.58s forward (ding/torch_utils/network/res_block.py)
3.00% 3.00% 2.73s 2.80s forward (torch/nn/modules/linear.py)
3.00% 14.00% 2.59s 15.86s forward (torch/nn/modules/batchnorm.py)
1.00% 45.00% 2.56s 52.98s _call_impl (torch/nn/modules/module.py)
Optimize with numpy: more than 10x faster than a pure Python list comprehension when building legal_actions:
legal_actions = [np.nonzero(action_mask[j])[0].tolist() for j in range(active_collect_env_num)]
Simple benchmark:
🚀 Huge improvement!
Original method: 55.011 ms
Optimized method: 0.106 ms
Speedup: 516.7x ✓
A 516x speedup! This <listcomp> drops from 2.07s (12%) to almost negligible.
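A comparison of this kind can be sketched as follows. The baseline list comprehension is not shown in this issue, so the nested `enumerate` version below is an assumption about what the original code looked like. The function names, the number of environments, and the action-space size are likewise hypothetical, chosen only for illustration. Timings will vary by machine and mask shape, so no expected numbers are given.

```python
import timeit

import numpy as np


def legal_actions_listcomp(action_mask, env_num):
    # Assumed baseline (not shown in the issue): scan every mask entry in Python.
    return [[i for i, x in enumerate(action_mask[j]) if x == 1] for j in range(env_num)]


def legal_actions_nonzero(action_mask, env_num):
    # Version from the issue: np.nonzero finds the set indices in C,
    # and tolist() converts them back to plain Python ints.
    return [np.nonzero(action_mask[j])[0].tolist() for j in range(env_num)]


def main():
    # Hypothetical sizes: 8 environments, a large and mostly illegal action space.
    env_num, action_space_size = 8, 8100
    rng = np.random.default_rng(0)
    action_mask = [
        (rng.random(action_space_size) < 0.01).astype(np.int8) for _ in range(env_num)
    ]

    # Both versions must agree before the timings mean anything.
    assert legal_actions_listcomp(action_mask, env_num) == legal_actions_nonzero(action_mask, env_num)

    repeats = 5
    t_old = timeit.timeit(lambda: legal_actions_listcomp(action_mask, env_num), number=repeats)
    t_new = timeit.timeit(lambda: legal_actions_nonzero(action_mask, env_num), number=repeats)
    print(f"list comprehension: {t_old / repeats * 1e3:.3f} ms")
    print(f"np.nonzero:         {t_new / repeats * 1e3:.3f} ms")
    print(f"speedup:            {t_old / t_new:.1f}x")


if __name__ == "__main__":
    main()
```

Note that the `np.nonzero` version treats any nonzero entry as legal, while the assumed baseline checks `x == 1`. The two agree only when the mask contains just 0 and 1.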
Files that can be optimized
| File | Status |
|---|---|
| lzero/policy/efficientzero.py | ✅ Can be optimized |
| lzero/policy/muzero.py | ✅ Can be optimized |
| lzero/policy/gumbel_muzero.py | ✅ Can be optimized |
Profiling results after the optimization
Collecting samples from '/data/LightZero/.venv/bin/python3 -u zoo/board_games/chinesechess/config/cchess_efficientzero_config.py' (python v3.11.14)
Total Samples 42900
GIL: 91.00%, Active: 100.00%, Threads: 11
%Own %Total OwnTime TotalTime Function (filename)
87.00% 100.00% 213.5s 297.0s search (lzero/mcts/tree_search/mcts_ctree.py)
0.00% 100.00% 72.25s 381.7s collect (lzero/worker/muzero_collector.py)
4.00% 4.00% 27.44s 27.44s _conv_forward (torch/nn/modules/conv.py)
2.00% 2.00% 19.51s 19.78s batch_norm (torch/nn/functional.py)
1.00% 1.00% 6.47s 6.47s relu (torch/nn/functional.py)
1.00% 1.00% 5.45s 5.45s __call__ (lzero/policy/scaling_transform.py)
0.00% 0.00% 5.30s 8.38s _compute_target_reward_value (lzero/mcts/buffer/game_buffer_efficientzero.py)
0.00% 7.00% 4.86s 60.32s forward (ding/torch_utils/network/res_block.py)
2.00% 10.00% 4.28s 84.48s _call_impl (torch/nn/modules/module.py)
2.00% 2.00% 4.08s 4.08s to_detach_cpu_numpy (lzero/policy/utils.py)
0.00% 0.00% 4.07s 8.49s _compute_target_policy_non_reanalyzed (lzero/mcts/buffer/game_buffer_muzero.py)
0.00% 0.00% 4.03s 4.03s <listcomp> (lzero/mcts/buffer/game_buffer_muzero.py)
0.00% 0.00% 3.97s 4.03s forward (torch/nn/modules/linear.py)
0.00% 0.00% 3.64s 3.81s <listcomp> (lzero/policy/efficientzero.py) clear speedup!
0.00% 2.00% 3.38s 25.19s forward (torch/nn/modules/batchnorm.py)
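The <listcomp> in lzero/policy/efficientzero.py dropped from about 17s to about 3s of OwnTime (17.86s to 3.64s), even though the second profile covers more samples (42900 vs. 27500).
The game_buffer of each algorithm may have the same problem.
After applying all of these optimizations, there seems to be little left to optimize apart from search and collect. Single-core performance is too poor, and neither multiple CPU cores nor the GPU can be fully utilized.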