Description
I'm observing sensitivity to the learning-rate restarts in a typical SGDR schedule with cosine annealing, as in Loshchilov & Hutter. RAdam still seems to be doing better than AdamW so far, but the loss jumps at the restart points suggest possible numerical instability at the LR discontinuities.
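For context, the schedule is wired up roughly like this (minimal sketch; the model, `T_0`/`T_mult`, and the RAdam import path are placeholders, not my exact setup):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
from radam import RAdam  # assumed import path for this repo's RAdam implementation

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = RAdam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# SGDR-style cosine annealing with warm restarts (Loshchilov & Hutter).
# T_0 / T_mult here are illustrative, not the values from my run.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(100):
    # ... one pass over the training data, calling optimizer.step() per batch ...
    scheduler.step()  # LR jumps back to its peak at every restart boundary
```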
Here's the training loss compared to AdamW (the implementation shipped with PyTorch 1.2.0):
What's the recommendation here? Should I use warmup in every cycle rather than just at the beginning (roughly as in the sketch below)? I thought RAdam was supposed to obviate the need for warmup. Is this a bug?
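To be concrete about what I mean by per-cycle warmup, something like this (sketch only; `warmup_steps`, `base_lr`, and the cycle bookkeeping are made up for illustration):

```python
import math

def lr_with_per_cycle_warmup(step_in_cycle, cycle_len, base_lr=1e-3,
                             eta_min=0.0, warmup_steps=500):
    """LR for the current step: linear warmup after each restart, then cosine decay."""
    if step_in_cycle < warmup_steps:
        # linear ramp from ~0 up to base_lr right after the restart
        return base_lr * (step_in_cycle + 1) / warmup_steps
    # cosine annealing over the remainder of the cycle
    t = (step_in_cycle - warmup_steps) / max(1, cycle_len - warmup_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t))

# applied manually each step instead of the built-in scheduler:
# for g in optimizer.param_groups:
#     g["lr"] = lr_with_per_cycle_warmup(step_in_cycle, cycle_len)
```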