
Commit 94f989d

Added references
1 parent 56ad53c commit 94f989d

1 file changed: +10 -10 lines changed

site/resource_pages/optimizer_summary.md

Lines changed: 10 additions & 10 deletions
@@ -20,25 +20,25 @@ This document summarizes the four optimizers compared in the Lesson 30 demo.
## Optimizers overview

### 1. SGD (Stochastic Gradient Descent)
-Vanilla gradient descent that updates parameters based on the gradient of the loss function. When `batch_size=1`, it's true stochastic gradient descent; with larger batches, it becomes mini-batch gradient descent. Simple but can be slow to converge and sensitive to learning rate choice.
+Vanilla gradient descent that updates parameters based on the gradient of the loss function. When `batch_size=1`, it's true stochastic gradient descent; with larger batches, it becomes mini-batch gradient descent. Simple but can be slow to converge and sensitive to learning rate choice. Citation: [Robbins and Monro, 1951](https://projecteuclid.org/euclid.aoms/1177729586).
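
To make the update rule concrete, here is a minimal Python sketch of one vanilla SGD step (the `sgd_step` name, the `params`/`grads` lists, and the learning rate value are illustrative, not taken from the Lesson 30 demo code):

```python
# Illustrative sketch of one vanilla SGD update; names and hyperparameters are hypothetical.
def sgd_step(params, grads, lr=0.01):
    """Move each parameter a small step against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy usage: minimize f(w) = w**2, whose gradient is 2*w.
w = [5.0]
for _ in range(200):
    w = sgd_step(w, [2 * w[0]], lr=0.1)
print(w)  # w approaches 0
```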

### 2. SGD + Momentum
-Extends vanilla SGD by accumulating a velocity vector in directions of persistent gradient descent. This helps accelerate convergence in relevant directions and dampens oscillations. A momentum value of 0.9 is commonly used.
+Extends vanilla SGD by accumulating a velocity vector in directions of persistent gradient descent. This helps accelerate convergence in relevant directions and dampens oscillations. A momentum value of 0.9 is commonly used. Citation: [Polyak, 1964](https://doi.org/10.1016/0041-5553(64)90137-5).
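
A similarly minimal sketch of the momentum variant, using the same illustrative names as the SGD sketch above (velocities start at zero; 0.9 matches the commonly used value mentioned in the text):

```python
# Illustrative sketch of SGD with momentum; names and hyperparameters are hypothetical.
def momentum_step(params, grads, velocities, lr=0.01, momentum=0.9):
    """Accumulate a velocity vector from past gradients, then step along it."""
    velocities = [momentum * v - lr * g for v, g in zip(velocities, grads)]
    params = [p + v for p, v in zip(params, velocities)]
    return params, velocities

# Toy usage on f(w) = w**2 with zero initial velocity.
w, v = [5.0], [0.0]
for _ in range(200):
    w, v = momentum_step(w, [2 * w[0]], v, lr=0.1)
```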

### 3. RMSprop (Root Mean Square Propagation)
-An adaptive learning rate optimizer that divides the learning rate by the square root of an exponentially decaying average of squared gradients. This allows the optimizer to use larger steps for infrequent features and smaller steps for frequent ones, making it well-suited for non-stationary objectives.
+An adaptive learning rate optimizer that divides the learning rate by the square root of an exponentially decaying average of squared gradients. This allows the optimizer to use larger steps for infrequent features and smaller steps for frequent ones, making it well-suited for non-stationary objectives. Citation: [Hinton, 2012](https://www.cs.toronto.edu/~hinton/coursera/lecture6/lec6.pdf).
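
A minimal sketch of the RMSprop update under the same illustrative conventions (the decay rate, epsilon, and state names are assumptions, not values from the lesson):

```python
import numpy as np

# Illustrative sketch of RMSprop; names and hyperparameters are hypothetical.
def rmsprop_step(params, grads, sq_avgs, lr=0.001, decay=0.9, eps=1e-8):
    """Scale each step by the square root of a decaying average of squared gradients."""
    sq_avgs = [decay * s + (1 - decay) * g ** 2 for s, g in zip(sq_avgs, grads)]
    params = [p - lr * g / (np.sqrt(s) + eps)
              for p, g, s in zip(params, grads, sq_avgs)]
    return params, sq_avgs
```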

### 4. Adam (Adaptive Moment Estimation)
-Combines the best of momentum and RMSprop. It computes adaptive learning rates for each parameter using estimates of both first-order moments (mean) and second-order moments (variance) of the gradients. Adam is often the default choice due to its robustness across different problems.
+Combines the best of momentum and RMSprop. It computes adaptive learning rates for each parameter using estimates of both first-order moments (mean) and second-order moments (variance) of the gradients. Adam is often the default choice due to its robustness across different problems. Citation: [Kingma and Ba, 2014](https://arxiv.org/abs/1412.6980).
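
A minimal sketch of the Adam update with the standard hyperparameter symbols from the cited paper (the function and state names here are illustrative):

```python
import numpy as np

# Illustrative sketch of Adam; names are hypothetical, defaults follow the paper's suggestions.
def adam_step(params, grads, ms, vs, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Per-parameter step from bias-corrected first (mean) and second (variance) moment estimates."""
    ms = [beta1 * m + (1 - beta1) * g for m, g in zip(ms, grads)]        # first-moment estimate
    vs = [beta2 * v + (1 - beta2) * g ** 2 for v, g in zip(vs, grads)]   # second-moment estimate
    new_params = []
    for p, m, v in zip(params, ms, vs):
        m_hat = m / (1 - beta1 ** t)  # bias correction; t is the 1-based step count
        v_hat = v / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (np.sqrt(v_hat) + eps))
    return new_params, ms, vs
```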

## Optimization techniques comparison

-| Optimizer      | Momentum | Adaptive learning rate | Notes                                  |
-|----------------|:--------:|:----------------------:|----------------------------------------|
-| SGD            |    ✗     |           ✗            | Vanilla gradient descent               |
-| SGD + Momentum |    ✓     |           ✗            | Uses velocity accumulation (e.g., 0.9) |
-| RMSprop        |    ✗     |           ✓            | Per-parameter learning rate scaling    |
-| Adam           |    ✓     |           ✓            | Combines momentum + adaptive rates     |
+| Optimizer      | Year introduced | Momentum | Adaptive learning rate | Notes                                  |
+|----------------|:---------------:|:--------:|:----------------------:|----------------------------------------|
+| SGD            |      1951       |    ✗     |           ✗            | Vanilla gradient descent               |
+| SGD + Momentum |      1964       |    ✓     |           ✗            | Uses velocity accumulation (e.g., 0.9) |
+| RMSprop        |      2012       |    ✗     |           ✓            | Per-parameter learning rate scaling    |
+| Adam           |      2014       |    ✓     |           ✓            | Combines momentum + adaptive rates     |

## Key takeaways from the demo

0 commit comments
