Description
When I try to reproduce this with BLOOM, I run into the same problem:
"The min_length setting forces the model to generate up to the max length, which produces repeated or nonsensical results."
fix ppo_trainer generate and scores calculation in stage 2
So I tried removing the min_length setting, but then I found that the program cannot continue running at https://github.com/microsoft/DeepSpeedExamples/blob/8f8099a813f3b223d5df39e0c15c748de4eb1669/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py#L105
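
One workaround I am considering, sketched here under the assumption (not verified) that the code after generation expects every sequence to have the same fixed length, i.e. prompt length plus max answer length: generate without min_length, then right-pad the shorter sequences with the pad token. The helper name `pad_to_length` and the pad id 0 are hypothetical, and plain Python lists stand in for the token tensors.

```python
def pad_to_length(sequences, total_len, pad_id):
    """Right-pad each list of token ids to total_len using pad_id.

    Hypothetical helper: assumes downstream code needs equal-length
    sequences once min_length no longer forces generation to max length.
    """
    padded = []
    for seq in sequences:
        if len(seq) > total_len:
            raise ValueError("sequence is longer than total_len")
        # Append as many pad tokens as needed to reach the fixed length.
        padded.append(seq + [pad_id] * (total_len - len(seq)))
    return padded


# Two generated sequences of different lengths, padded to length 4.
batch = pad_to_length([[11, 12, 13], [21]], total_len=4, pad_id=0)
```

Whether this is enough depends on what the trainer does with the padded positions (for example, whether they are masked out in the score calculation), which I have not checked.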