Description
With some environments and reward functions, training with the OnPolicyRunner can get stuck in a feedback loop in which the action values explode, eventually crashing the run. This could be fixed easily by adding a torch.clamp in the update_distribution function of ActorCritic, together with a config option to enable or disable it. I would be happy to submit a pull request if you think this is a good idea.
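For illustration, a minimal sketch of what the change might look like. The `clip_actions` / `clip_value` option names, the placeholder actor network, and the constructor signature are assumptions for the sketch, not the library's actual API:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class ActorCritic(nn.Module):
    """Minimal stand-in for ActorCritic; only the pieces relevant
    to the proposed clamp are shown."""

    def __init__(self, num_obs, num_actions, clip_actions=False, clip_value=10.0):
        super().__init__()
        self.actor = nn.Linear(num_obs, num_actions)  # placeholder for the real actor net
        self.std = nn.Parameter(torch.ones(num_actions))
        # Hypothetical config options for the proposed behaviour:
        self.clip_actions = clip_actions
        self.clip_value = clip_value
        self.distribution = None

    def update_distribution(self, observations):
        mean = self.actor(observations)
        if self.clip_actions:
            # Bound the action mean so a runaway policy cannot push it
            # to values that destabilise training.
            mean = torch.clamp(mean, -self.clip_value, self.clip_value)
        self.distribution = Normal(mean, self.std.expand_as(mean))
```

Clamping the mean inside update_distribution (rather than clipping sampled actions afterwards) keeps the log-probabilities used in the policy-gradient update consistent with the actions the environment actually receives.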