Commit 90bc0c9
Update hyperparameters to reflect InstructGPT (#1966)
I noticed some non-standard hyperparameter values (namely the Adam betas
and weight decay), so I suggest experimenting with the values proposed by
InstructGPT [1], which are, as far as I know, quite standard for LLM
training:
> All models are trained with the Adam optimizer, with β1 = 0.9 and β2 = 0.95.

> We train our SFT models for 16 epochs with residual dropout of 0.2. We
> use a cosine LR schedule down to 10% of the original learning rate, with
> no learning rate warmup. For our 1.3B and 6B models, we use an LR of
> 9.65e-6 and a batch size of 32. For 175B, we use a LR of 5.03e-6 and a
> batch size of 8.
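
For reference, the schedule described in the quote can be sketched in a few lines of plain Python. This is only an illustration of "cosine decay to 10% of the original learning rate, with no warmup", not the code touched by this commit. The constant and function names (`ADAM_BETAS`, `BASE_LR`, `FINAL_LR_RATIO`, `cosine_lr`) are hypothetical, and the step count in the usage note is an arbitrary assumption.

```python
import math

# Values taken from the InstructGPT excerpt quoted above.
# The names are illustrative and do not come from this repository.
ADAM_BETAS = (0.9, 0.95)   # (beta1, beta2) for Adam
BASE_LR = 9.65e-6          # LR quoted for the 1.3B and 6B models
FINAL_LR_RATIO = 0.1       # schedule ends at 10% of the original LR


def cosine_lr(step, total_steps, base_lr=BASE_LR, final_ratio=FINAL_LR_RATIO):
    """Cosine decay from base_lr to final_ratio * base_lr, with no warmup."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay on the schedule ends.
    progress = min(max(step / total_steps, 0.0), 1.0)
    min_lr = final_ratio * base_lr
    # Half-cosine: the factor is 1 at progress=0 and 0 at progress=1.
    cosine_factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine_factor
```

With an assumed 1000 training steps, `cosine_lr(0, 1000)` returns the full base LR, `cosine_lr(500, 1000)` returns 55% of it, and `cosine_lr(1000, 1000)` returns 10% of it. Whether these values help for the models trained here still needs to be validated experimentally.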
Unfortunately, I don't have easy access to compute power, so I will have
to let someone else validate whether or not these changes have any
(desired) effect.
[1] https://arxiv.org/abs/2203.02155
2 files changed
Lines changed: 13 additions & 7 deletions