Optimizers

| Name | Venue | Year | Paper | URL |
| --- | --- | --- | --- | --- |
| Adam | | | Adam: A Method for Stochastic Optimization | Arxiv |
| AdamW | | | Decoupled Weight Decay Regularization | Arxiv |
| AdamWnanoGPT | | | | |
| PlainRAdam | | | | |
| RAdam | | | On the Variance of the Adaptive Learning Rate and Beyond | Arxiv - REF |
| SGD | | | | |
| TS_Adam | arXiv | 2025 | Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Arxiv - GitHub |
| TS_AdamW | arXiv | 2025 | Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Arxiv - GitHub |
| TS_Yogi | arXiv | 2025 | Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Arxiv - GitHub |