Description
Recent papers on policy collapse and loss of plasticity in Reinforcement Learning suggest that the default Adam beta values in PyTorch (b1=0.9, b2=0.999) are not ideal and are essentially arbitrary, and I noticed that these defaults are also used here.
This paper suggests using b1=b2 for better results.
This is more of a discussion than an issue, though. My testing seems to agree with the paper (for reference, I used b1=b2=0.9), both on my own environment and on Gym environments such as CartPole. I do not know, however, how relevant this is outside of RL.
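For reference, a minimal sketch of the change I tested (the network, learning rate, and CartPole dimensions below are placeholders, not the repo's actual setup):

```python
import torch

# Hypothetical policy network, sized for CartPole (4 observations, 2 actions).
policy_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64),
    torch.nn.Tanh(),
    torch.nn.Linear(64, 2),
)

optimizer = torch.optim.Adam(
    policy_net.parameters(),
    lr=3e-4,            # placeholder learning rate
    betas=(0.9, 0.9),   # b1 = b2 = 0.9 instead of the default (0.9, 0.999)
)
```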