```
epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|cri_loss: 0.2144775390625|unsuper_loss: 0.0
average reward score: 0.20556640625
epoch: 0|step: 260|ppo_ep: 1|act_loss: 0.1915283203125|cri_loss: 0.326171875|unsuper_loss: 0.0
average reward score: 0.205810546875
epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.1837158203125|cri_loss: 0.2259521484375|unsuper_loss: 0.0
average reward score: 0.2064208984375
epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.099609375|cri_loss: 0.1646728515625|unsuper_loss: 0.0
average reward score: 0.2059326171875
epoch: 0|step: 263|ppo_ep: 1|act_loss: -0.07781982421875|cri_loss: 0.28271484375|unsuper_loss: 0.0
average reward score: 0.20654296875
epoch: 0|step: 264|ppo_ep: 1|act_loss: 0.10009765625|cri_loss: 0.303955078125|unsuper_loss: 0.0
average reward score: 0.2060546875
epoch: 0|step: 265|ppo_ep: 1|act_loss: 0.10357666015625|cri_loss: 0.332275390625|unsuper_loss: 0.0
average reward score: 0.2078857421875
epoch: 0|step: 266|ppo_ep: 1|act_loss: -0.062744140625|cri_loss: 0.23828125|unsuper_loss: 0.0
average reward score: 0.2061767578125
epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.1456298828125|cri_loss: 0.33837890625|unsuper_loss: 0.0
average reward score: 0.2064208984375
epoch: 0|step: 268|ppo_ep: 1|act_loss: 0.0635986328125|cri_loss: 0.20068359375|unsuper_loss: 0.0
average reward score: 0.207275390625
[2023-06-09 00:06:07,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=10, lr=[1.1237076437413556e-05, 1.1237076437413556e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-09 00:06:07,820] [INFO] [timer.py:208:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=2.183121824503433, CurrSamplesPerSec=11.893856302438598, MemAllocated=49.03GB, MaxMemAllocated=57.62GB
[2023-06-09 00:06:08,154] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=3, lr=[4.6543648237896e-06, 4.6543648237896e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 269|ppo_ep: 1|act_loss: -0.03240966796875|cri_loss: 0.1427001953125|unsuper_loss: 0.0
average reward score: 0.205078125
epoch: 0|step: 270|ppo_ep: 1|act_loss: 0.342041015625|cri_loss: 0.377685546875|unsuper_loss: 0.0
average reward score: 0.2064208984375
epoch: 0|step: 271|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.2430419921875|unsuper_loss: 0.0
average reward score: 0.205322265625
epoch: 0|step: 272|ppo_ep: 1|act_loss: 0.1181640625|cri_loss: 0.21337890625|unsuper_loss: 0.0
average reward score: 0.20703125
epoch: 0|step: 273|ppo_ep: 1|act_loss: 0.06524658203125|cri_loss: 0.1839599609375|unsuper_loss: 0.0
average reward score: 0.206298828125
epoch: 0|step: 274|ppo_ep: 1|act_loss: 0.07135009765625|cri_loss: 0.1356201171875|unsuper_loss: 0.0
average reward score: 0.2081298828125
epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.066650390625|cri_loss: 0.2161865234375|unsuper_loss: 0.0
average reward score: 0.2071533203125
epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.05303955078125|cri_loss: 0.2177734375|unsuper_loss: 0.0
average reward score: 0.2059326171875
epoch: 0|step: 277|ppo_ep: 1|act_loss: 0.015899658203125|cri_loss: 0.1387939453125|unsuper_loss: 0.0
average reward score: 0.2060546875
epoch: 0|step: 278|ppo_ep: 1|act_loss: -0.0144195556640625|cri_loss: 0.26025390625|unsuper_loss: 0.0
average reward score: 0.20556640625
[2023-06-09 00:20:13,519] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=10, lr=[1.1141143057005536e-05, 1.1141143057005536e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-09 00:20:13,652] [INFO] [timer.py:208:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=2.2499286746168656, CurrSamplesPerSec=13.297137840544718, MemAllocated=49.03GB, MaxMemAllocated=57.62GB
[2023-06-09 00:20:13,986] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=3, lr=[4.612866045608177e-06, 4.612866045608177e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 279|ppo_ep: 1|act_loss: -0.038238525390625|cri_loss: 0.2322998046875|unsuper_loss: 0.0
average reward score: 0.2037353515625
epoch: 0|step: 280|ppo_ep: 1|act_loss: -0.03887939453125|cri_loss: 0.264404296875|unsuper_loss: 0.0
average reward score: 0.2056884765625
epoch: 0|step: 281|ppo_ep: 1|act_loss: -0.0809326171875|cri_loss: 0.325927734375|unsuper_loss: 0.0
average reward score: 0.205078125
epoch: 0|step: 282|ppo_ep: 1|act_loss: -0.0087890625|cri_loss: 0.281982421875|unsuper_loss: 0.0
average reward score: 0.205322265625
epoch: 0|step: 283|ppo_ep: 1|act_loss: -0.1871337890625|cri_loss: 0.302734375|unsuper_loss: 0.0
average reward score: 0.205078125
epoch: 0|step: 284|ppo_ep: 1|act_loss: -0.126220703125|cri_loss: 0.2880859375|unsuper_loss: 0.0
average reward score: 0.2052001953125
epoch: 0|step: 285|ppo_ep: 1|act_loss: -0.07843017578125|cri_loss: 0.2890625|unsuper_loss: 0.0
average reward score: 0.207275390625
epoch: 0|step: 286|ppo_ep: 1|act_loss: -0.0885009765625|cri_loss: 0.240478515625|unsuper_loss: 0.0
average reward score: 0.2061767578125
epoch: 0|step: 287|ppo_ep: 1|act_loss: -0.035888671875|cri_loss: 0.24755859375|unsuper_loss: 0.0
average reward score: 0.2069091796875
epoch: 0|step: 288|ppo_ep: 1|act_loss: 0.01471710205078125|cri_loss: 0.2418212890625|unsuper_loss: 0.0
average reward score: 0.20458984375
[2023-06-09 00:34:19,262] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=10, l
```
The loss oscillated at the beginning of training but collapsed after about 200 steps, while the average reward score stayed pinned around 0.205 (the curves can be extracted from the log above with the script sketched below). I tested checkpoints both during the oscillation phase and after the collapse; both performed far worse than the original model and could not produce normal output.
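For reference, here is a minimal sketch that parses the `act_loss`/`cri_loss`/`average reward score` lines in the log format shown above so the collapse can be plotted; the `training.log` path is a placeholder, and the script is my own, not part of DeepSpeed-Chat:

```python
import re
import matplotlib.pyplot as plt

# Match the "epoch: ...|step: ...|act_loss: ...|cri_loss: ..." lines
# and the "average reward score: ..." lines from the step-3 PPO log.
step_re = re.compile(
    r"epoch: \d+\|step: (\d+)\|ppo_ep: \d+\|act_loss: ([-\d.]+)\|cri_loss: ([-\d.]+)"
)
reward_re = re.compile(r"average reward score: ([-\d.]+)")

steps, act_losses, cri_losses, rewards = [], [], [], []
with open("training.log") as f:  # placeholder path
    for line in f:
        m = step_re.search(line)
        if m:
            steps.append(int(m.group(1)))
            act_losses.append(float(m.group(2)))
            cri_losses.append(float(m.group(3)))
            continue
        m = reward_re.search(line)
        if m:
            rewards.append(float(m.group(1)))

# Losses on top, reward score below, against the same step axis.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(steps, act_losses, label="act_loss")
ax1.plot(steps, cri_losses, label="cri_loss")
ax1.legend()
n = min(len(steps), len(rewards))
ax2.plot(steps[:n], rewards[:n], label="average reward score")
ax2.legend()
ax2.set_xlabel("step")
plt.show()
```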
In short: during Step 3 training, the reward score of my language model collapsed to a fixed value and the model's output became completely chaotic. Has anyone encountered this phenomenon?
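For context, this kind of collapse is often discussed in terms of the policy drifting too far from the frozen reference model. Below is a minimal, illustrative sketch of the KL-penalized per-token reward that RLHF PPO pipelines typically optimize; the function name, tensor shapes, and `kl_coef` value here are assumptions for illustration, not DeepSpeed-Chat's exact code:

```python
import torch

def compute_kl_penalized_rewards(
    rm_score: torch.Tensor,       # (batch,) sequence-level reward-model score
    log_probs: torch.Tensor,      # (batch, seq) actor log-probs of generated tokens
    ref_log_probs: torch.Tensor,  # (batch, seq) frozen reference-model log-probs
    kl_coef: float = 0.1,         # illustrative coefficient; tune per run
) -> torch.Tensor:
    """Per-token reward = -kl_coef * (log_probs - ref_log_probs),
    with the reward-model score added on the final token.

    If the KL penalty is too weak (or the actor learning rate too high),
    the policy can drift far from the reference model; the reward model
    is then queried far off-distribution and its score can freeze at a
    near-constant value, which resembles the collapse in the log above.
    """
    rewards = -kl_coef * (log_probs - ref_log_probs)
    rewards[:, -1] += rm_score  # broadcast (batch,) onto the last token
    return rewards
```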