Describe the bug
In PointMaze with continuing_task=True, the observation and info returned at the last step (the step in which the goal is reached and a new goal is generated) aren't updated, i.e. they still contain the old goal. This is not the intended general semantics: in a common RL loop, the agent feeds the returned observation to its policy, so it will predict an action that heads for the old goal instead of the new one.
See related issue: Farama-Foundation/Minari#265
See: Gymnasium-Robotics/gymnasium_robotics/envs/maze/point_maze.py, lines 392 to 406 in 3719d9d
Code example
Reproducing this on the real environment requires an expert policy; see https://github.com/Farama-Foundation/minari-dataset-generation-scripts/blob/main/scripts/pointmaze/create_pointmaze_dataset.py
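For a quick illustration that needs neither MuJoCo nor an expert policy, here is a minimal sketch of the same pattern. `ToyContinuingEnv`, its `refresh_obs` flag, `expert_policy`, and `run` are hypothetical stand-ins written for this issue, not the actual PointMaze code. The toy environment is 1-D and picks the next goal from a fixed list. The only assumption carried over from PointMaze is that the observation is built before the goal is replaced.

```python
from itertools import cycle


class ToyContinuingEnv:
    """Hypothetical 1-D continuing task: reaching the goal draws a new goal."""

    def __init__(self, goals, refresh_obs):
        self._goals = cycle(goals)
        self.refresh_obs = refresh_obs  # True: rebuild obs after the goal changes
        self.pos = 0
        self.goal = next(self._goals)

    def _obs(self):
        return {"observation": self.pos, "desired_goal": self.goal}

    def step(self, action):
        self.pos += action
        obs = self._obs()  # built BEFORE the goal is replaced
        reached = self.pos == self.goal
        if reached:
            self.goal = next(self._goals)  # continuing task: no termination
            if self.refresh_obs:
                obs = self._obs()  # proposed behaviour: expose the new goal
        return obs, float(reached), False, False, {"success": reached}


def expert_policy(obs):
    """Move one cell toward the goal stored in the observation."""
    delta = obs["desired_goal"] - obs["observation"]
    return (delta > 0) - (delta < 0)


def run(refresh_obs, n_steps=6):
    """Standard RL loop: act on the observation returned by the previous step."""
    env = ToyContinuingEnv(goals=[2, -1], refresh_obs=refresh_obs)
    obs = env._obs()
    actions = []
    for _ in range(n_steps):
        action = expert_policy(obs)
        actions.append(action)
        obs, reward, terminated, truncated, info = env.step(action)
    return actions


print("stale obs:    ", run(refresh_obs=False))
print("refreshed obs:", run(refresh_obs=True))
```

With the stale observation, the policy sees a goal it has already reached and outputs a no-op on the step right after success. With the refreshed observation, it starts moving toward the new goal immediately.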