
Conversation

@foersterrobert

In the paper, the proposed Adam betas were 0.0 and 0.9, so I guess it makes sense to use those as the defaults here as well. I also got more stable results during training with them. But maybe you tweaked the betas on purpose, and the old values make more sense to you. TY for your implementations btw :).
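
For reference, a minimal sketch of what I mean, assuming a PyTorch implementation (the model, learning rate, and variable names below are just placeholders, not taken from this repo):

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model standing in for the actual network in this repo.
model = nn.Linear(10, 1)

# Paper-suggested betas (0.0, 0.9) instead of PyTorch's Adam defaults (0.9, 0.999).
# The learning rate here is an arbitrary placeholder value.
optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.0, 0.9))
```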