Possible performance optimization #460

@mycve

Collecting samples from '/data/LightZero/.venv/bin/python3 -u zoo/board_games/chinesechess/config/cchess_efficientzero_config.py' (python v3.11.14)
Total Samples 27500
GIL: 62.00%, Active: 100.00%, Threads: 9

  %Own   %Total  OwnTime  TotalTime  Function (filename)                                                                                                                                                                                                                      
 54.00% 100.00%   115.5s    162.4s   search (lzero/mcts/tree_search/mcts_ctree.py)
  0.00% 100.00%   43.09s    227.8s   collect (lzero/worker/muzero_collector.py)
  0.00%   0.00%   17.86s    17.87s   <listcomp> (lzero/policy/efficientzero.py)  <-- here!
 13.00%  13.00%   17.18s    17.18s   _conv_forward (torch/nn/modules/conv.py)
 11.00%  11.00%   11.99s    12.24s   batch_norm (torch/nn/functional.py)
  0.00%   0.00%    6.04s    11.17s   _compute_target_reward_value (lzero/mcts/buffer/game_buffer_efficientzero.py)
  0.00%   0.00%    4.84s     4.84s   <listcomp> (lzero/mcts/buffer/game_buffer_muzero.py)
  8.00%   8.00%    4.57s     4.57s   relu (torch/nn/functional.py)
  0.00%   0.00%    4.52s     9.89s   _compute_target_policy_non_reanalyzed (lzero/mcts/buffer/game_buffer_muzero.py)
  0.00%   0.00%    3.41s     3.41s   __call__ (lzero/policy/scaling_transform.py)
  0.00%   0.00%    3.23s     3.23s   <listcomp> (lzero/mcts/buffer/game_buffer_efficientzero.py)
  0.00%  28.00%    3.10s    38.58s   forward (ding/torch_utils/network/res_block.py)
  3.00%   3.00%    2.73s     2.80s   forward (torch/nn/modules/linear.py)
  3.00%  14.00%    2.59s    15.86s   forward (torch/nn/modules/batchnorm.py)
  1.00%  45.00%    2.56s    52.98s   _call_impl (torch/nn/modules/module.py)  

Optimize the legal_actions computation with numpy: 10+ times faster than a pure-Python list comprehension:

legal_actions = [np.nonzero(action_mask[j])[0].tolist() for j in range(active_collect_env_num)]
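
To make the change concrete, here is a minimal sketch on a tiny hand-written mask. The baseline shown is an assumption about what the original per-element comprehension looked like (it is not quoted in this issue), and the mask values are made up for illustration. For 0/1 masks both versions should give the same result; note that `np.nonzero` treats any non-zero entry as legal, while an `x == 1` test does not.

```python
import numpy as np

# Hypothetical 0/1 masks: 3 environments, 5 actions each.
action_mask = np.array([
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1],
], dtype=np.int8)
active_collect_env_num = len(action_mask)

# Assumed baseline: scan every mask entry in the Python interpreter.
legal_actions_loop = [
    [i for i, x in enumerate(action_mask[j]) if x == 1]
    for j in range(active_collect_env_num)
]

# Version proposed in this issue: let numpy find the non-zero indices.
legal_actions_np = [
    np.nonzero(action_mask[j])[0].tolist()
    for j in range(active_collect_env_num)
]

print(legal_actions_np)  # [[0, 2], [3, 4], [0, 1, 2, 3, 4]]
```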

Simple benchmark:

🚀 A huge improvement!

Original method: 55.011 ms

Optimized method: 0.106 ms

Speedup: 516.7x ✓

A 516x speedup! This <listcomp> drops from 2.07s (12%) to almost negligible.
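
The benchmark script itself is not included in this issue, so the following is only a sketch of how such a comparison could be reproduced with `timeit`. The function names, the mask shape (8 environments, 8100 actions), and the random mask density are all hypothetical; the measured ratio will depend on mask size and sparsity, so it should not be expected to match the numbers above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical shape: 8 environments, 8100 actions, random 0/1 entries.
action_mask = rng.integers(0, 2, size=(8, 8100)).astype(np.int8)


def legal_actions_loop(mask):
    """Assumed baseline: per-element scan in pure Python."""
    return [[i for i, x in enumerate(row) if x == 1] for row in mask]


def legal_actions_numpy(mask):
    """Version from this issue: np.nonzero per environment."""
    return [np.nonzero(row)[0].tolist() for row in mask]


# Both versions must agree before their timings are compared.
assert legal_actions_loop(action_mask) == legal_actions_numpy(action_mask)

repeats = 20
t_loop = timeit.timeit(lambda: legal_actions_loop(action_mask), number=repeats) / repeats
t_numpy = timeit.timeit(lambda: legal_actions_numpy(action_mask), number=repeats) / repeats

print(f"loop:    {t_loop * 1e3:.3f} ms")
print(f"numpy:   {t_numpy * 1e3:.3f} ms")
print(f"speedup: {t_loop / t_numpy:.1f}x")
```

Checking equality first guards against benchmarking a faster but subtly different result.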

Files that can be optimized

File                            Status
lzero/policy/efficientzero.py   ✅ can be optimized
lzero/policy/muzero.py          ✅ can be optimized
lzero/policy/gumbel_muzero.py   ✅ can be optimized

Profiling results after optimization

Collecting samples from '/data/LightZero/.venv/bin/python3 -u zoo/board_games/chinesechess/config/cchess_efficientzero_config.py' (python v3.11.14)
Total Samples 42900
GIL: 91.00%, Active: 100.00%, Threads: 11

  %Own   %Total  OwnTime  TotalTime  Function (filename)                                                                                                                                                                                                                      
 87.00% 100.00%   213.5s    297.0s   search (lzero/mcts/tree_search/mcts_ctree.py)
  0.00% 100.00%   72.25s    381.7s   collect (lzero/worker/muzero_collector.py)
  4.00%   4.00%   27.44s    27.44s   _conv_forward (torch/nn/modules/conv.py)
  2.00%   2.00%   19.51s    19.78s   batch_norm (torch/nn/functional.py)
  1.00%   1.00%    6.47s     6.47s   relu (torch/nn/functional.py)
  1.00%   1.00%    5.45s     5.45s   __call__ (lzero/policy/scaling_transform.py)
  0.00%   0.00%    5.30s     8.38s   _compute_target_reward_value (lzero/mcts/buffer/game_buffer_efficientzero.py)
  0.00%   7.00%    4.86s    60.32s   forward (ding/torch_utils/network/res_block.py)
  2.00%  10.00%    4.28s    84.48s   _call_impl (torch/nn/modules/module.py)
  2.00%   2.00%    4.08s     4.08s   to_detach_cpu_numpy (lzero/policy/utils.py)
  0.00%   0.00%    4.07s     8.49s   _compute_target_policy_non_reanalyzed (lzero/mcts/buffer/game_buffer_muzero.py)
  0.00%   0.00%    4.03s     4.03s   <listcomp> (lzero/mcts/buffer/game_buffer_muzero.py)
  0.00%   0.00%    3.97s     4.03s   forward (torch/nn/modules/linear.py)
  0.00%   0.00%    3.64s     3.81s   <listcomp> (lzero/policy/efficientzero.py)  <-- clear speedup!
  0.00%   2.00%    3.38s    25.19s   forward (torch/nn/modules/batchnorm.py)

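The OwnTime of this <listcomp> dropped from about 17 seconds (17.86s) to about 3 seconds (3.64s).

The game_buffer of each algorithm may have the same problem.
After applying all of these optimizations, there seems to be little left to optimize apart from search and collect. Single-core performance is too poor, and neither the multi-core CPU nor the GPU can be fully utilized.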

Labels: efficiency optimization (time, memory and so on)
