Add a config option to clamp actor network output #59

Open

Description

@reeceomahoney

When using the OnPolicyRunner with some envs and reward functions, the agent can often get stuck in a feedback loop where the value of the action explodes, causing training to crash. This could be easily fixed by adding a torch.clamp in the update_distribution function of ActorCritic, along with a config option to toggle it. I would be happy to submit a pull request if you think this is a good idea.
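
A minimal sketch of what this could look like, assuming the usual ActorCritic layout where update_distribution builds a Normal from the actor's mean output. The clip_actions flag and clip_actions_range names below are illustrative, not existing config options:

```python
# Sketch only: the clip_actions / clip_actions_range options are hypothetical names
# for the proposed config switch, not current rsl_rl parameters.
import torch
from torch.distributions import Normal


class ActorCritic(torch.nn.Module):
    def __init__(self, actor, std, clip_actions=False, clip_actions_range=(-1.0, 1.0)):
        super().__init__()
        self.actor = actor                  # policy network producing the action mean
        self.std = torch.nn.Parameter(std)  # learnable per-action standard deviation
        self.clip_actions = clip_actions    # config option: enable/disable the clamp
        self.clip_min, self.clip_max = clip_actions_range
        self.distribution = None

    def update_distribution(self, observations):
        mean = self.actor(observations)
        if self.clip_actions:
            # Clamp the actor output so a runaway mean cannot blow up the action scale.
            mean = torch.clamp(mean, self.clip_min, self.clip_max)
        self.distribution = Normal(mean, self.std.expand_as(mean))
```

With clip_actions left at False the behaviour is unchanged, so the option would be opt-in for envs that hit the feedback loop.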
