Which LR scheduler to use? Linear warmup + cosine schedule? Which LR to use?
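
To make the question concrete, here is a rough sketch of what I mean by linear warmup + cosine schedule. The function name `lr_at_step`, its parameters, and the numbers in the usage lines are all hypothetical placeholders for illustration, not recommended settings.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at a 0-indexed training step (illustrative sketch only)."""
    if step < warmup_steps:
        # Linear warmup: ramp up so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from peak_lr down to min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings, chosen only to show the shape of the curve.
schedule = [lr_at_step(s, total_steps=1000, warmup_steps=100, peak_lr=1e-3)
            for s in range(1000)]
```

With these placeholder values the LR climbs linearly over the first 100 steps, peaks at `peak_lr`, then follows half a cosine wave down toward `min_lr`. The open questions are whether this shape is a sensible default and how to pick `peak_lr`.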