Commit 398dacd

Merge pull request #213 from gperdrizet/dev
Added LFS .zip file
2 parents 0405707 + 6e5bcb4 · commit 398dacd

3 files changed (+20, -11 lines changed)

data/datasets.yml

Lines changed: 7 additions & 1 deletion
```diff
@@ -151,4 +151,10 @@ units:
     title: "Incremental capstone 9"
     datasets:
       - name: "Banking customer data"
-        file: "Churn_Modeling.csv"
+        file: "Churn_Modeling.csv"
+
+  - number: "INC10"
+    title: "Incremental capstone 10"
+    datasets:
+      - name: "Face mask detection dataset"
+        file: "Face_mask_detection.zip"
```

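The new INC10 entry reuses the `number`/`title`/`datasets` structure of the existing units. As a rough, hypothetical sketch of how a build or download script might consume this structure (PyYAML, the function name, and the calling code are illustrative assumptions, not part of this commit):

```python
# Hypothetical sketch: index the dataset files declared in data/datasets.yml.
# Assumes PyYAML is installed and the top-level key is `units`, as in the hunk above.
import yaml


def load_dataset_index(path: str = "data/datasets.yml") -> dict:
    """Map each unit number (e.g. 'INC10') to the list of its dataset file names."""
    with open(path, "r", encoding="utf-8") as handle:
        config = yaml.safe_load(handle)

    index = {}
    for unit in config.get("units", []):
        index[unit["number"]] = [entry["file"] for entry in unit.get("datasets", [])]
    return index


if __name__ == "__main__":
    # After this commit, the index would be expected to include
    # 'INC10': ['Face_mask_detection.zip'].
    print(load_dataset_index())
```
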
data/unit4/Face_mask_detection.zip

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:be89cb0e645099635acb79fe459aa5f2d359922f4492960b67b43ca08d65fa89
+size 174252187
```

site/resource_pages/optimizer_summary.md

Lines changed: 10 additions & 10 deletions
```diff
@@ -20,25 +20,25 @@ This document summarizes the four optimizers compared in the Lesson 30 demo.
 ## Optimizers overview
 
 ### 1. SGD (Stochastic Gradient Descent)
-Vanilla gradient descent that updates parameters based on the gradient of the loss function. When `batch_size=1`, it's true stochastic gradient descent; with larger batches, it becomes mini-batch gradient descent. Simple, but can be slow to converge and sensitive to the choice of learning rate.
+Vanilla gradient descent that updates parameters based on the gradient of the loss function. When `batch_size=1`, it's true stochastic gradient descent; with larger batches, it becomes mini-batch gradient descent. Simple, but can be slow to converge and sensitive to the choice of learning rate. Citation: [Robbins and Monro, 1951](https://projecteuclid.org/euclid.aoms/1177729586).
 
 ### 2. SGD + Momentum
-Extends vanilla SGD by accumulating a velocity vector in directions of persistent reduction in the loss. This helps accelerate convergence in relevant directions and dampens oscillations. A momentum value of 0.9 is commonly used.
+Extends vanilla SGD by accumulating a velocity vector in directions of persistent reduction in the loss. This helps accelerate convergence in relevant directions and dampens oscillations. A momentum value of 0.9 is commonly used. Citation: [Polyak, 1964](https://doi.org/10.1016/0041-5553(64)90137-5).
 
 ### 3. RMSprop (Root Mean Square Propagation)
-An adaptive learning rate optimizer that divides the learning rate by an exponentially decaying average of squared gradients. This allows the optimizer to use larger steps for infrequent features and smaller steps for frequent ones, making it well-suited for non-stationary objectives.
+An adaptive learning rate optimizer that divides the learning rate by an exponentially decaying average of squared gradients. This allows the optimizer to use larger steps for infrequent features and smaller steps for frequent ones, making it well-suited for non-stationary objectives. Citation: [Hinton, 2012](https://www.cs.toronto.edu/~hinton/coursera/lecture6/lec6.pdf).
 
 ### 4. Adam (Adaptive Moment Estimation)
-Combines the best of momentum and RMSprop. It computes adaptive learning rates for each parameter using estimates of both first-order moments (mean) and second-order moments (variance) of the gradients. Adam is often the default choice due to its robustness across different problems.
+Combines the best of momentum and RMSprop. It computes adaptive learning rates for each parameter using estimates of both first-order moments (mean) and second-order moments (variance) of the gradients. Adam is often the default choice due to its robustness across different problems. Citation: [Kingma and Ba, 2014](https://arxiv.org/abs/1412.6980).
 
 ## Optimization techniques comparison
 
-| Optimizer      | Momentum | Adaptive learning rate | Notes                                  |
-|----------------|:--------:|:----------------------:|----------------------------------------|
-| SGD            | ✗        | ✗                      | Vanilla gradient descent               |
-| SGD + Momentum | ✓        | ✗                      | Uses velocity accumulation (e.g., 0.9) |
-| RMSprop        | ✗        | ✓                      | Per-parameter learning rate scaling    |
-| Adam           | ✓        | ✓                      | Combines momentum + adaptive rates     |
+| Optimizer      | Year introduced | Momentum | Adaptive learning rate | Notes                                  |
+|----------------|:---------------:|:--------:|:----------------------:|----------------------------------------|
+| SGD            | 1951            | ✗        | ✗                      | Vanilla gradient descent               |
+| SGD + Momentum | 1964            | ✓        | ✗                      | Uses velocity accumulation (e.g., 0.9) |
+| RMSprop        | 2012            | ✗        | ✓                      | Per-parameter learning rate scaling    |
+| Adam           | 2014            | ✓        | ✓                      | Combines momentum + adaptive rates     |
 
 ## Key takeaways from the demo
 
```

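For readers who want to reproduce the comparison, below is a minimal, hypothetical sketch of the four optimizer configurations described in the summary above, expressed with the Keras API. The framework choice, learning rates, and variable names are assumptions for illustration; the Lesson 30 demo code itself is not part of this commit.

```python
# Hypothetical sketch: the four optimizer configurations summarized above,
# expressed with the Keras API. Learning rates are illustrative defaults,
# not values taken from the Lesson 30 demo.
from tensorflow import keras

optimizers = {
    # Vanilla (mini-batch) stochastic gradient descent.
    "SGD": keras.optimizers.SGD(learning_rate=0.01),

    # SGD with a velocity term; 0.9 is the commonly used momentum value.
    "SGD + Momentum": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),

    # Scales the step size by a decaying average of squared gradients.
    "RMSprop": keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9),

    # Combines momentum (first moment) with adaptive rates (second moment).
    "Adam": keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999),
}

for name, optimizer in optimizers.items():
    # Each optimizer would be passed to model.compile(optimizer=...) in turn
    # to compare training curves on the same model and data.
    print(name, type(optimizer).__name__)
```

In a comparison like this, each entry would be compiled into an otherwise identical model so that differences in training curves can be attributed to the optimizer alone.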