Description
Hi, thanks for the questions! It's expected that the end step for all episodes is 25 (I believe the max number of steps is set to 25 by default, and it can stay at 25 even if you enable early stopping when the goal is achieved).

As for the difference between `test_reward` and `test_bench/step_reward`, it comes down to two things. First, the reward and benchmark loggers log slightly differently: (as far as I remember from my notes) the reward logger resets at the end of each episode, whereas the benchmark logger resets only once at the collector's init(), so the trends can differ. Second, `test_bench/step_reward` additionally divides the episode reward by the number of steps in that episode (i.e., the average reward per step). Please check the code for the reward and benchmark loggers as well as `offpolicy_trainer` for your own understanding, and feel free to write your own logger for your purposes! Let me know if you have any other questions, thanks!
Originally posted by @zixianma in #3 (comment)
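For reference, here is a minimal sketch of how I read the two computations described above. The class, method, and variable names below are placeholders for illustration, not the actual Tianshou logger attributes:

```python
# Rough sketch (assumed names, not the actual logger code) of how I understand
# the difference between test_reward and test_bench/step_reward.

class RewardLoggerSketch:
    """Resets at the end of every episode; tracks per-episode returns."""
    def __init__(self):
        self.episode_return = 0.0
        self.episode_returns = []          # one entry per finished episode

    def on_step(self, reward, done):
        self.episode_return += reward
        if done:
            self.episode_returns.append(self.episode_return)  # ~ test_reward
            self.episode_return = 0.0      # reset at episode end


class BenchmarkLoggerSketch:
    """Resets only once (at the collector's init()); tracks avg reward per step."""
    def __init__(self):
        self.step_rewards = []             # never cleared between episodes

    def on_episode_end(self, episode_return, episode_steps):
        # Episode return divided by its length -> ~ test_bench/step_reward
        self.step_rewards.append(episode_return / episode_steps)
```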
Thanks for your reply to the previous question. Following your suggestion, I checked the code for `SimpleSpreadBenchmarkLogger` and found a line that might explain the difference between the two metrics (i.e., `test_reward` and `test_bench/step_reward`). Here is the code:
`alignment/map/tianshou/env/utils.py`, line 232 in 58754e4:

`bench_data = elem['n'][0]`
Here, only the first agent's info (i.e., `elem['n'][0]`) is added. However, with the default setting there are 5 agents, so `elem['n']` has length 5, and each element of `elem['n']` carries a different agent's info, so the rewards can differ across agents. This does not happen in the computation of `test_reward`, which is why the two trends differ. Could you check whether my understanding is correct? Thanks!
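To illustrate with a toy example (the `'reward'` key and the numbers below are made up, only to show that taking `elem['n'][0]` drops the other agents' info):

```python
# Hypothetical per-step info for the 5 agents in simple_spread; the values
# and the 'reward' key are illustrative, not taken from the actual env.
elem = {'n': [
    {'reward': -1.2},
    {'reward': -0.8},
    {'reward': -1.5},
    {'reward': -0.9},
    {'reward': -1.1},
]}

# Current behavior (utils.py line 232): only the first agent's info is kept.
bench_data = elem['n'][0]
print(bench_data['reward'])                               # -1.2

# A team-level alternative: average (or sum) over all 5 agents' rewards.
team_avg = sum(a['reward'] for a in elem['n']) / len(elem['n'])
print(team_avg)                                           # -1.1
```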