Dear all,
We are trying to reproduce the results. However, after following the training steps, our chatbot keeps repeating nonsense. We suspected that our RLHF stage was at fault, so we simply loaded the pretrained model instead, and the results are also very bad. Has anyone else run into the same issue? If you have successfully trained a decent chatbot, do you have any bitter lessons you could share with the community?
Thanks!
Kind regards,
Jade