Description
With some environments and reward functions, training with the OnPolicyRunner can get stuck in a feedback loop in which the action values explode, eventually crashing the run. This could be fixed easily by adding a torch.clamp in the update_distribution function of ActorCritic, together with a config option to enable or disable it. I would be happy to submit a pull request if you think this is a good idea.
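For illustration, a minimal sketch of what the change might look like. The `clip_actions` / `clip_value` option names, the placeholder actor network, and the constructor signature are assumptions for the sketch, not the library's actual API:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class ActorCritic(nn.Module):
    """Minimal stand-in for ActorCritic; only the pieces relevant
    to the proposed clamp are shown."""

    def __init__(self, num_obs, num_actions, clip_actions=False, clip_value=10.0):
        super().__init__()
        self.actor = nn.Linear(num_obs, num_actions)  # placeholder for the real actor net
        self.std = nn.Parameter(torch.ones(num_actions))
        # Hypothetical config options for the proposed behaviour:
        self.clip_actions = clip_actions
        self.clip_value = clip_value
        self.distribution = None

    def update_distribution(self, observations):
        mean = self.actor(observations)
        if self.clip_actions:
            # Bound the action mean so a runaway policy cannot push it
            # to values that destabilise training.
            mean = torch.clamp(mean, -self.clip_value, self.clip_value)
        self.distribution = Normal(mean, self.std.expand_as(mean))
```

Clamping the mean inside update_distribution (rather than clipping sampled actions afterwards) keeps the log-probabilities used in the policy-gradient update consistent with the actions the environment actually receives.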