fix: correct margin calculation in DPO training #172

leejianwoo-collab · 2025-12-14T02:54:22Z

fix: correct margin calculation in DPO training

Remove the redundant dpo_beta multiplication from the margin computation.
chosen_rewards and rejected_rewards already include the dpo_beta factor,
so the margin must not be multiplied by dpo_beta a second time.

This fix only affects the logged metrics; it does not change the training loss.
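For illustration, here is a minimal sketch of the corrected metric. It assumes the standard DPO formulation, in which each implicit reward is dpo_beta * (policy log-prob - reference log-prob). Everything except chosen_rewards, rejected_rewards and dpo_beta is hypothetical: the function name dpo_metrics, its parameters and the metric keys are not taken from this repository. To mirror "does not impact training", the sketch computes the loss directly from the log-probs, independently of the logged margin.

```python
import math
from typing import Dict, List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(
    policy_chosen_logps: List[float],
    policy_rejected_logps: List[float],
    ref_chosen_logps: List[float],
    ref_rejected_logps: List[float],
    dpo_beta: float,
) -> Dict[str, float]:
    """Hypothetical helper: DPO loss plus fixed and buggy margin metrics."""
    n = len(policy_chosen_logps)

    # Implicit rewards: dpo_beta is applied here, exactly once.
    chosen_rewards = [dpo_beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected_rewards = [dpo_beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]

    # Fixed margin: a plain difference, since both rewards already carry dpo_beta.
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Old (buggy) margin: scales by dpo_beta again, i.e. dpo_beta**2 overall.
    buggy_margins = [dpo_beta * m for m in margins]

    # Loss computed straight from the log-probs, not from the logged margin,
    # so correcting the margin metric cannot change it.
    logits = [
        dpo_beta * ((pc - rc) - (pr - rr))
        for pc, pr, rc, rr in zip(
            policy_chosen_logps, policy_rejected_logps, ref_chosen_logps, ref_rejected_logps
        )
    ]
    # -log(sigmoid(z)) == softplus(-z), averaged over the batch.
    loss = sum(softplus(-z) for z in logits) / n

    return {
        "loss": loss,
        "rewards/margin": sum(margins) / n,
        "rewards/margin_buggy": sum(buggy_margins) / n,
    }
```

As an example, dpo_metrics([-1.0], [-3.0], [-2.0], [-2.0], dpo_beta=0.1) logs a margin of about 0.2, whereas the old formula would report about 0.02, an extra factor of dpo_beta. Note that the corrected margin equals the logit inside the DPO loss, so the logged value now matches the quantity being optimized.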

Fixes #21

Commit: cbfb9d0