Settings are as follows:
- HE disabled (enabling HE together with ZeRO stage 2 raises an error)
- `pp_epochs=1`
- `num_train_epochs=1`
- `disable_actor_dropout`
- `per_device_train_batch_size` and `per_device_mini_train_batch_size` are both 2
Actor loss:
Inference demo:
actor_ema seems normal, but it behaves the same as the SFT model.
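One possible factor, offered as an assumption rather than a diagnosis: if actor_ema is an exponential moving average (EMA) of the actor weights initialised from the SFT checkpoint, then a decay close to 1 combined with a single short epoch keeps the EMA weights close to the SFT weights. The sketch below tracks a single scalar weight. The decay value 0.992, the helper name `ema_track`, and the step counts are hypothetical, illustrative choices and are not taken from this run.

```python
def ema_track(actor_values, init, beta):
    """Apply ema = beta * ema + (1 - beta) * actor for each actor value."""
    ema = init
    for value in actor_values:
        ema = beta * ema + (1.0 - beta) * value
    return ema


# Hypothetical scalar weight: 0.0 is the SFT value, and the actor is assumed
# to move to 1.0 and stay there for every training step.
sft_weight = 0.0
actor_weight = 1.0
beta = 0.992  # assumed decay, for illustration only

for steps in (10, 100, 1000):
    ema = ema_track([actor_weight] * steps, sft_weight, beta)
    # After n steps the remaining share of the SFT value is beta ** n.
    print(f"steps={steps}: ema={ema:.3f}, SFT share={beta ** steps:.3f}")
```

Under these assumptions, the EMA has moved only a small fraction of the way toward the actor after a few dozen steps, so an EMA model evaluated after a short run could look almost identical to the SFT model.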