This discussion was converted from issue #6994 on February 19, 2025 10:13.
Reminder
System Info
As I understand it, the DPO data format is as follows, with only `chosen` and `rejected` responses:

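I have the following questions:
1. Where does the score mentioned in PPO come into play?
2. Does the training data for the reward model need only `chosen` and `rejected`?
3. Does `rating` in this dataset mean a score? If I want to convert it into a format supported by LLaMA-Factory, what processing is required?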
https://huggingface.co/datasets/MMInstruction/VLFeedback?row=0
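To make question 3 concrete, here is a minimal sketch of the kind of conversion being asked about: turning per-response ratings into `chosen`/`rejected` pairs. The input schema (`prompt`, `completions`, `response`, `rating`) is a hypothetical simplification and not the actual VLFeedback layout. The output keys (`instruction`, `input`, `chosen`, `rejected`) follow what I understand to be LLaMA-Factory's alpaca-style preference format, which should be checked against its data README. The `min_gap` threshold is my own assumption, and image handling is left out entirely.

```python
import json


def to_preference_pairs(record, min_gap=1.0):
    """Turn one rated record into chosen/rejected pairs.

    Assumed (hypothetical) input schema:
      {"prompt": str, "completions": [{"response": str, "rating": float}, ...]}
    The top-rated response is paired against every response whose rating
    is lower by at least `min_gap`; ties and near-ties are skipped.
    """
    ranked = sorted(record["completions"], key=lambda c: c["rating"], reverse=True)
    best = ranked[0]
    pairs = []
    for other in ranked[1:]:
        if best["rating"] - other["rating"] >= min_gap:
            pairs.append(
                {
                    "instruction": record["prompt"],
                    "input": "",
                    "chosen": best["response"],
                    "rejected": other["response"],
                }
            )
    return pairs


# Made-up record for illustration only (not taken from the dataset).
example = {
    "prompt": "Describe the image.",
    "completions": [
        {"response": "A vague answer.", "rating": 3},
        {"response": "A detailed, accurate answer.", "rating": 5},
        {"response": "An answer that hallucinates objects.", "rating": 1},
    ],
}

pairs = to_preference_pairs(example)
print(json.dumps(pairs, ensure_ascii=False, indent=2))
```

Under this approach the numeric rating is used only to decide which response is preferred; the resulting pairs themselves carry no score.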
Reproduction
I'm a beginner and completely lost.
Others
No response