| Name | Venue | Year | Paper | URL |
|---|---|---|---|---|
| Adam | | | Adam: A Method for Stochastic Optimization | Arxiv |
| AdamW | | | Decoupled Weight Decay Regularization | Arxiv |
| AdamWnanoGPT | ||||
| PlainRAdam | ||||
| RAdam | | | On the Variance of the Adaptive Learning Rate and Beyond | Arxiv - REF |
| SGD | ||||
| TS_Adam | arXiv | 2025 | Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Arxiv - GitHub |
| TS_AdamW | arXiv | 2025 | Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Arxiv - GitHub |
| TS_Yogi | arXiv | 2025 | Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts | Arxiv - GitHub |