[Question] Maze Dense Reward

### Question

Looking at the dense reward function for Maze Env:

`return np.exp(-np.linalg.norm(desired_goal - achieved_goal))`

After optimisation, the agent seems to prefer holding the ball as close as possible to the goal without actually touching it.

This makes sense, given that there is no bonus for reaching the goal and the reward is positive at every time step.
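To make the incentive concrete, here is a minimal sketch comparing the discounted return of two behaviours under the reward quoted above. It assumes that the episode ends as soon as the goal is reached, which depends on how the environment is configured; the helper names, the 100-step horizon, the 0.05 distance and the discount factor are all hypothetical values chosen for illustration.

```python
import numpy as np


def dense_reward(achieved_goal, desired_goal):
    # Same expression as the dense reward quoted above.
    return np.exp(-np.linalg.norm(desired_goal - achieved_goal))


def discounted_return(rewards, gamma=0.99):
    # Sum of gamma**t * r_t over the episode (gamma is an assumed value).
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(rewards * gamma ** np.arange(len(rewards))))


goal = np.array([0.0, 0.0])
horizon = 100  # hypothetical episode length

# Behaviour A: touch the goal on the first step; the episode then ends
# (assumption: reaching the goal terminates the episode).
reach_rewards = [dense_reward(goal, goal)]

# Behaviour B: hover 0.05 units from the goal for the whole horizon.
near = np.array([0.05, 0.0])
hover_rewards = [dense_reward(near, goal)] * horizon

print("reach and terminate:", discounted_return(reach_rewards))
print("hover near the goal:", discounted_return(hover_rewards))
```

Under these assumptions, hovering collects a reward just below 1 on every step, so its return is far larger than the single reward of 1 earned by finishing immediately.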

Why is the dense reward formulated this way?



