
During the training of Step 3, the reward score of my language model collapsed to a stable point #586

Open
@scarydemon2


```
epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.0253753662109375|cri_loss: 0.2144775390625|unsuper_loss: 0.0
average reward score: 0.20556640625

epoch: 0|step: 260|ppo_ep: 1|act_loss: 0.1915283203125|cri_loss: 0.326171875|unsuper_loss: 0.0
average reward score: 0.205810546875

epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.1837158203125|cri_loss: 0.2259521484375|unsuper_loss: 0.0
average reward score: 0.2064208984375

epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.099609375|cri_loss: 0.1646728515625|unsuper_loss: 0.0
average reward score: 0.2059326171875

epoch: 0|step: 263|ppo_ep: 1|act_loss: -0.07781982421875|cri_loss: 0.28271484375|unsuper_loss: 0.0
average reward score: 0.20654296875

epoch: 0|step: 264|ppo_ep: 1|act_loss: 0.10009765625|cri_loss: 0.303955078125|unsuper_loss: 0.0
average reward score: 0.2060546875

epoch: 0|step: 265|ppo_ep: 1|act_loss: 0.10357666015625|cri_loss: 0.332275390625|unsuper_loss: 0.0
average reward score: 0.2078857421875

epoch: 0|step: 266|ppo_ep: 1|act_loss: -0.062744140625|cri_loss: 0.23828125|unsuper_loss: 0.0
average reward score: 0.2061767578125

epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.1456298828125|cri_loss: 0.33837890625|unsuper_loss: 0.0
average reward score: 0.2064208984375

epoch: 0|step: 268|ppo_ep: 1|act_loss: 0.0635986328125|cri_loss: 0.20068359375|unsuper_loss: 0.0
average reward score: 0.207275390625

[2023-06-09 00:06:07,687] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=10, lr=[1.1237076437413556e-05, 1.1237076437413556e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-09 00:06:07,820] [INFO] [timer.py:208:stop] epoch=0/micro_step=270/global_step=270, RunningAvgSamplesPerSec=2.183121824503433, CurrSamplesPerSec=11.893856302438598, MemAllocated=49.03GB, MaxMemAllocated=57.62GB
[2023-06-09 00:06:08,154] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=3, lr=[4.6543648237896e-06, 4.6543648237896e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 269|ppo_ep: 1|act_loss: -0.03240966796875|cri_loss: 0.1427001953125|unsuper_loss: 0.0
average reward score: 0.205078125

epoch: 0|step: 270|ppo_ep: 1|act_loss: 0.342041015625|cri_loss: 0.377685546875|unsuper_loss: 0.0
average reward score: 0.2064208984375

epoch: 0|step: 271|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.2430419921875|unsuper_loss: 0.0
average reward score: 0.205322265625

epoch: 0|step: 272|ppo_ep: 1|act_loss: 0.1181640625|cri_loss: 0.21337890625|unsuper_loss: 0.0
average reward score: 0.20703125

epoch: 0|step: 273|ppo_ep: 1|act_loss: 0.06524658203125|cri_loss: 0.1839599609375|unsuper_loss: 0.0
average reward score: 0.206298828125

epoch: 0|step: 274|ppo_ep: 1|act_loss: 0.07135009765625|cri_loss: 0.1356201171875|unsuper_loss: 0.0
average reward score: 0.2081298828125

epoch: 0|step: 275|ppo_ep: 1|act_loss: 0.066650390625|cri_loss: 0.2161865234375|unsuper_loss: 0.0
average reward score: 0.2071533203125

epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.05303955078125|cri_loss: 0.2177734375|unsuper_loss: 0.0
average reward score: 0.2059326171875

epoch: 0|step: 277|ppo_ep: 1|act_loss: 0.015899658203125|cri_loss: 0.1387939453125|unsuper_loss: 0.0
average reward score: 0.2060546875

epoch: 0|step: 278|ppo_ep: 1|act_loss: -0.0144195556640625|cri_loss: 0.26025390625|unsuper_loss: 0.0
average reward score: 0.20556640625

[2023-06-09 00:20:13,519] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=10, lr=[1.1141143057005536e-05, 1.1141143057005536e-05], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-06-09 00:20:13,652] [INFO] [timer.py:208:stop] epoch=0/micro_step=280/global_step=280, RunningAvgSamplesPerSec=2.2499286746168656, CurrSamplesPerSec=13.297137840544718, MemAllocated=49.03GB, MaxMemAllocated=57.62GB
[2023-06-09 00:20:13,986] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=3, lr=[4.612866045608177e-06, 4.612866045608177e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 279|ppo_ep: 1|act_loss: -0.038238525390625|cri_loss: 0.2322998046875|unsuper_loss: 0.0
average reward score: 0.2037353515625

epoch: 0|step: 280|ppo_ep: 1|act_loss: -0.03887939453125|cri_loss: 0.264404296875|unsuper_loss: 0.0
average reward score: 0.2056884765625

epoch: 0|step: 281|ppo_ep: 1|act_loss: -0.0809326171875|cri_loss: 0.325927734375|unsuper_loss: 0.0
average reward score: 0.205078125

epoch: 0|step: 282|ppo_ep: 1|act_loss: -0.0087890625|cri_loss: 0.281982421875|unsuper_loss: 0.0
average reward score: 0.205322265625

epoch: 0|step: 283|ppo_ep: 1|act_loss: -0.1871337890625|cri_loss: 0.302734375|unsuper_loss: 0.0
average reward score: 0.205078125

epoch: 0|step: 284|ppo_ep: 1|act_loss: -0.126220703125|cri_loss: 0.2880859375|unsuper_loss: 0.0
average reward score: 0.2052001953125

epoch: 0|step: 285|ppo_ep: 1|act_loss: -0.07843017578125|cri_loss: 0.2890625|unsuper_loss: 0.0
average reward score: 0.207275390625

epoch: 0|step: 286|ppo_ep: 1|act_loss: -0.0885009765625|cri_loss: 0.240478515625|unsuper_loss: 0.0
average reward score: 0.2061767578125

epoch: 0|step: 287|ppo_ep: 1|act_loss: -0.035888671875|cri_loss: 0.24755859375|unsuper_loss: 0.0
average reward score: 0.2069091796875

epoch: 0|step: 288|ppo_ep: 1|act_loss: 0.01471710205078125|cri_loss: 0.2418212890625|unsuper_loss: 0.0
average reward score: 0.20458984375

[2023-06-09 00:34:19,262] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=10, l
```

The loss oscillated at the beginning but collapsed after about 200 steps; as the log above shows, the average reward score also settles at around 0.206 and barely changes from step to step.
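For reference, a minimal sketch for plotting the curves from a log in the format above (the file name `step3_training.log` and the output image name are placeholders, not part of DeepSpeed-Chat):

```python
# Sketch: parse the Step 3 stdout log (format as shown above) and plot
# the average reward score and actor loss per PPO step.
import re
import matplotlib.pyplot as plt

step_re = re.compile(r"step: (\d+)\|ppo_ep: \d+\|act_loss: (-?[\d.]+)")
reward_re = re.compile(r"average reward score: (-?[\d.]+)")

steps, act_losses, rewards = [], [], []
with open("step3_training.log") as f:  # placeholder log file name
    for line in f:
        m = step_re.search(line)
        if m:
            steps.append(int(m.group(1)))
            act_losses.append(float(m.group(2)))
            continue
        m = reward_re.search(line)
        if m and len(rewards) < len(steps):
            rewards.append(float(m.group(1)))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(steps[: len(rewards)], rewards)
ax1.set_ylabel("avg reward score")
ax2.plot(steps, act_losses)
ax2.set_ylabel("act_loss")
ax2.set_xlabel("PPO step")
fig.tight_layout()
fig.savefig("step3_curves.png")
```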

I tested checkpoints from both the oscillation period and from after the collapse; in both cases the model performed far worse than the original model and could not produce normal outputs.
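For anyone who wants to reproduce this check, a minimal sketch (assuming the actor checkpoint was exported in Hugging Face format; the checkpoint path and prompt below are placeholders, not the exact ones I used):

```python
# Sketch: load a saved actor checkpoint and inspect a few generations
# to see whether the outputs have become degenerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./output/step3/actor"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, torch_dtype=torch.float16).cuda()
model.eval()

# Prompt in a Human/Assistant format; adjust to whatever format your Step 3 data uses.
prompt = "Human: Please explain what reinforcement learning is.\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```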

During Step 3 training, the reward score of my language model collapsed to a stable point and the model's outputs became completely chaotic. Has anyone else encountered this phenomenon?
