
[Bug Report] Obs and info semantics in PointMaze with continuing_task #258

Open
@younik


Describe the bug
In PointMaze with `continuing_task=True`, the observation and info returned on the step where the current goal is reached (the last step for that goal) are not updated: they still contain the old goal. This breaks the usual step semantics. In a standard RL loop the agent chooses its next action from the observation returned by `step()`, so it will act toward the old goal instead of the new one.
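To make the effect concrete, here is a minimal sketch of a hypothetical 1-D toy environment that keeps the same ordering as the PointMaze `step()` shown below: the observation is built first and the goal is resampled afterwards. `StaleGoalEnv`, `greedy_policy`, and the goal sequence are illustrative assumptions, not part of Gymnasium-Robotics; only the 0.45 success radius is taken from the PointMaze code.

```python
import numpy as np


class StaleGoalEnv:
    """Hypothetical 1-D stand-in for PointMaze with continuing_task=True.

    step() keeps the ordering from the issue: the observation is built
    first and the goal is resampled only afterwards.
    """

    def __init__(self, goal_sequence, radius=0.45):
        self.goal_sequence = list(goal_sequence)  # goals visited in order (assumption)
        self.radius = radius  # same threshold as PointMaze's success check

    def _get_obs(self):
        return {
            "achieved_goal": np.array([self.pos]),
            "desired_goal": np.array([self.goal]),
        }

    def reset(self):
        self.pos = 0.0
        self.goals = list(self.goal_sequence)
        self.goal = self.goals.pop(0)
        return self._get_obs(), {}

    def step(self, action):
        self.pos += float(action)
        obs = self._get_obs()  # built with the goal *before* any update
        success = abs(self.pos - self.goal) <= self.radius
        reward = float(success)
        if success and self.goals:  # goal resampled after obs was built
            self.goal = self.goals.pop(0)
        return obs, reward, False, False, {"success": success}


def greedy_policy(obs):
    # Step toward whatever goal the observation reports, at most 1 unit.
    delta = obs["desired_goal"][0] - obs["achieved_goal"][0]
    return float(np.clip(delta, -1.0, 1.0))


env = StaleGoalEnv(goal_sequence=[1.0, 3.0])
obs, _ = env.reset()
actions, successes = [], []
for _ in range(4):
    action = greedy_policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    actions.append(action)
    successes.append(info["success"])

print(actions)    # [1.0, 0.0, 1.0, 1.0]
print(successes)  # [True, False, False, True]
```

The second action is `0.0`: after reaching the first goal, the agent sees the stale `desired_goal` and spends a step standing on a goal it has already reached, even though the environment has moved on to the next one.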

See related issue: Farama-Foundation/Minari#265
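The relevant code is `step()`, which builds `obs_dict` and `info` before `update_goal` resamples the goal: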

```python
def step(self, action):
    obs, _, _, _, info = self.point_env.step(action)
    obs_dict = self._get_obs(obs)
    reward = self.compute_reward(obs_dict["achieved_goal"], self.goal, info)
    terminated = self.compute_terminated(obs_dict["achieved_goal"], self.goal, info)
    truncated = self.compute_truncated(obs_dict["achieved_goal"], self.goal, info)
    info["success"] = bool(
        np.linalg.norm(obs_dict["achieved_goal"] - self.goal) <= 0.45
    )
    # Update the goal position if necessary
    self.update_goal(obs_dict["achieved_goal"])
    # obs_dict and info were built before update_goal, so they still
    # reflect the old goal when the goal has just been resampled.
    return obs_dict, reward, terminated, truncated, info
```
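One possible reordering, sketched on the same kind of hypothetical 1-D toy (`RefreshedGoalEnv` and `greedy_policy` are illustrative names, not Gymnasium-Robotics code): reward and success are still judged against the goal that was just reached, but the observation is built only after the goal update. In PointMaze the analogous change would presumably be to rebuild `obs_dict` (for example by calling `self._get_obs(obs)` again) after `self.update_goal(...)`, assuming `_get_obs` fills `desired_goal` from `self.goal`. This is a sketch of the idea, not necessarily the fix the maintainers would adopt.

```python
import numpy as np


class RefreshedGoalEnv:
    """Hypothetical 1-D stand-in showing one possible reordering of step()."""

    def __init__(self, goal_sequence, radius=0.45):
        self.goal_sequence = list(goal_sequence)  # goals visited in order (assumption)
        self.radius = radius

    def _get_obs(self):
        return {
            "achieved_goal": np.array([self.pos]),
            "desired_goal": np.array([self.goal]),
        }

    def reset(self):
        self.pos = 0.0
        self.goals = list(self.goal_sequence)
        self.goal = self.goals.pop(0)
        return self._get_obs(), {}

    def step(self, action):
        self.pos += float(action)
        # Reward and success are judged against the goal that was just pursued.
        success = abs(self.pos - self.goal) <= self.radius
        reward = float(success)
        if success and self.goals:
            self.goal = self.goals.pop(0)
        # Build the observation only after the goal update, so desired_goal
        # already points at the new goal.
        obs = self._get_obs()
        return obs, reward, False, False, {"success": success}


def greedy_policy(obs):
    # Step toward whatever goal the observation reports, at most 1 unit.
    delta = obs["desired_goal"][0] - obs["achieved_goal"][0]
    return float(np.clip(delta, -1.0, 1.0))


env = RefreshedGoalEnv(goal_sequence=[1.0, 3.0])
obs, _ = env.reset()
actions, successes = [], []
for _ in range(3):
    action = greedy_policy(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    actions.append(action)
    successes.append(info["success"])

print(actions)    # [1.0, 1.0, 1.0]
print(successes)  # [True, False, True]
```

With this ordering the agent heads for the new goal immediately after reaching the first one and reaches it in three steps instead of four. Keeping `reward` and `info["success"]` tied to the old goal preserves the signal that the goal was actually reached on that step.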

Code example
Reproducing this requires an expert policy that actually reaches goals; see the PointMaze dataset generation script: https://github.com/Farama-Foundation/minari-dataset-generation-scripts/blob/main/scripts/pointmaze/create_pointmaze_dataset.py
