[Question] PBT config objective definition #3638

@elvisbreit

Question

I want to use the newly added HPO method Population Based Training (PBT) for my task and set the objective to the total reward over time for one episode, which rl_games logs under the metric rewards/time, as can be seen in the example:

parser.add_argument("--metric", type=str, default="rewards/time", help="What metric to tune for.")

I have changed the objective parameter, which is located in the following part of the config:

pbt:
  enabled: False
  policy_idx: 0  # policy index in a population
  num_policies: 8  # total number of policies in the population
  directory: .
  workspace: "pbt_workspace"  # suffix of the workspace dir name inside train_dir
  objective: episode.Curriculum/adr

However, when I start training with objective: rewards/time, I get the following error:

Error executing job with overrides: ['agent.pbt.enabled=True', 'agent.pbt.num_policies=4', 'agent.pbt.policy_idx=0']
Traceback (most recent call last):
  File "/workspace/isaaclab/source/isaaclab_tasks/isaaclab_tasks/utils/hydra.py", line 101, in hydra_main
    func(env_cfg, agent_cfg, *args, **kwargs)
  File "/workspace/isaaclab/scripts/reinforcement_learning/rl_games/train.py", line 239, in main
    runner.run({"train": True, "play": False, "sigma": train_sigma})
  File "/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.11/site-packages/rl_games/torch_runner.py", line 178, in run
    self.run_train(args)
  File "/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.11/site-packages/rl_games/torch_runner.py", line 149, in run_train
    agent.train()
  File "/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.11/site-packages/rl_games/common/a2c_common.py", line 1351, in train
    step_time, play_time, update_time, sum_time, a_losses, c_losses, b_losses, entropies, kls, last_lr, lr_mul = self.train_epoch()
                                                                                                                 ^^^^^^^^^^^^^^^^^^
  File "/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.11/site-packages/rl_games/common/a2c_common.py", line 1207, in train_epoch
    batch_dict = self.play_steps()
                 ^^^^^^^^^^^^^^^^^
  File "/workspace/isaaclab/_isaac_sim/kit/python/lib/python3.11/site-packages/rl_games/common/a2c_common.py", line 792, in play_steps
    self.algo_observer.process_infos(infos, env_done_indices)
  File "/workspace/isaaclab/source/isaaclab_rl/isaaclab_rl/rl_games/pbt/pbt.py", line 259, in process_infos
    self._call_multi("process_infos", infos, done_indices)
  File "/workspace/isaaclab/source/isaaclab_rl/isaaclab_rl/rl_games/pbt/pbt.py", line 250, in _call_multi
    getattr(o, method)(*args_, **kwargs_)
  File "/workspace/isaaclab/source/isaaclab_rl/isaaclab_rl/rl_games/pbt/pbt.py", line 75, in process_infos
    score = score[part]
            ~~~~~^^^^^^
KeyError: 'rewards/time'

How can I set the objective to the rewards/time metric, and more generally, how can I access terms other than episode.Curriculum/adr from the example config?
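From the traceback, the failing line `score = score[part]` in pbt.py suggests the objective string is split on dots and each part is used to index into the nested infos dict, so `episode.Curriculum/adr` would resolve to `infos["episode"]["Curriculum/adr"]`. The following is a minimal sketch of that apparent lookup, assuming this dot-path interpretation and an illustrative infos layout; the actual structure of infos in pbt.py may differ:

```python
# Sketch of how a dotted PBT objective string appears to be resolved,
# based on the `score = score[part]` loop in the traceback.
# The infos layout below is an assumption for illustration only.

def resolve_objective(infos: dict, objective: str):
    """Walk the infos dict using each dot-separated part of the objective."""
    score = infos
    for part in objective.split("."):
        score = score[part]  # raises KeyError if the key is missing at this level
    return score

infos = {"episode": {"Curriculum/adr": 0.75}}

# Resolves to infos["episode"]["Curriculum/adr"]:
print(resolve_objective(infos, "episode.Curriculum/adr"))  # 0.75

# "rewards/time" has no dot, so it is looked up as a single top-level key,
# which would reproduce the KeyError: 'rewards/time' seen above:
try:
    resolve_objective(infos, "rewards/time")
except KeyError as exc:
    print("KeyError:", exc)
```

If this reading is right, rewards/time fails because it is not a top-level key of infos, while episode.Curriculum/adr works because both path components exist in the nested dict.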

@ooctipus, maybe you can help out, since I have seen that you made the commit for this functionality.

Labels: question (Further information is requested)