[DPO] Add hinge, bco_pair, robust, exo_pair, discopop loss types by kashif · Pull Request #1204 · linkedin/Liger-Kernel

kashif · 2026-04-27T13:20:32Z

Brings Liger's fused linear DPO loss closer to TRL's DPOTrainer by adding five additional loss variants on top of the existing sigmoid / apo_zero / apo_down / sppo_hard / nca_pair set:

hinge: relu(1 - beta * (chosen - rejected))
bco_pair: -logsigmoid(beta * chosen) - logsigmoid(-beta * rejected)
robust (cDPO): label-smoothed sigmoid loss with flipped-pair correction
exo_pair: KL(p_fθ || [1-eps, eps]) for K=2 (EXO-pref, paper 2402.00856)
discopop: blended logistic / exponential loss (DiscoPOP, paper 2406.08414)
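For reference, the five formulas above can be sketched as scalar functions in plain Python. This is not the fused Liger kernel. It assumes `chosen` and `rejected` are the policy-minus-reference log-ratios for each response, and that `robust`, `exo_pair`, and `discopop` follow the forms used in TRL's DPOTrainer as I understand them. The function names and default values are illustrative only.

```python
import math


def logsigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    return math.exp(logsigmoid(x))


def hinge(chosen: float, rejected: float, beta: float = 0.1) -> float:
    # relu(1 - beta * (chosen - rejected))
    return max(0.0, 1.0 - beta * (chosen - rejected))


def bco_pair(chosen: float, rejected: float, beta: float = 0.1) -> float:
    # -logsigmoid(beta * chosen) - logsigmoid(-beta * rejected)
    return -logsigmoid(beta * chosen) - logsigmoid(-beta * rejected)


def robust(chosen: float, rejected: float, beta: float = 0.1,
           label_smoothing: float = 0.1) -> float:
    # Assumed cDPO form: label-smoothed sigmoid loss, rescaled by 1 / (1 - 2 * eps)
    z = beta * (chosen - rejected)
    eps = label_smoothing
    return (-(1 - eps) * logsigmoid(z) + eps * logsigmoid(-z)) / (1 - 2 * eps)


def exo_pair(chosen: float, rejected: float, beta: float = 0.1,
             label_smoothing: float = 1e-3) -> float:
    # KL(p || [1 - eps, eps]) with p = [sigmoid(z), sigmoid(-z)]
    z = beta * (chosen - rejected)
    eps = label_smoothing
    return (sigmoid(z) * (logsigmoid(z) - math.log(1 - eps))
            + sigmoid(-z) * (logsigmoid(-z) - math.log(eps)))


def discopop(chosen: float, rejected: float, beta: float = 0.1,
             discopop_tau: float = 0.05) -> float:
    # Assumed DiscoPOP form: a sigmoid gate blends logistic and exponential terms
    z = beta * (chosen - rejected)
    gate = sigmoid(z / discopop_tau)
    return (1 - gate) * -logsigmoid(z) + gate * math.exp(-z)
```

With `label_smoothing=0`, `robust` reduces to the plain sigmoid loss `-logsigmoid(z)`, and `exo_pair` is zero exactly when `sigmoid(z) == 1 - eps`.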

Also threads two new kwargs through LigerFusedLinearDPOLoss / LigerFusedLinearDPOFunction:

label_smoothing (used by robust and exo_pair; validated at construction)
discopop_tau (temperature for the DiscoPOP modulation term)
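The description does not state which bounds the construction-time check enforces. As a rough sketch, the bounds assumed below are the ones the formulas themselves suggest: a cDPO-style `robust` loss divides by `1 - 2 * label_smoothing`, and `exo_pair` takes `log(label_smoothing)` and `log(1 - label_smoothing)`. The helper name `validate_label_smoothing` is hypothetical and not taken from the PR.

```python
def validate_label_smoothing(loss_type: str, label_smoothing: float) -> None:
    """Hypothetical range check; the bounds are assumptions, not the PR's code."""
    if loss_type == "robust" and not (0.0 <= label_smoothing < 0.5):
        # 1 - 2 * eps must stay positive for the cDPO rescaling
        raise ValueError("robust requires 0 <= label_smoothing < 0.5")
    if loss_type == "exo_pair" and not (0.0 < label_smoothing < 1.0):
        # log(eps) and log(1 - eps) must both be finite
        raise ValueError("exo_pair requires 0 < label_smoothing < 1")
```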

Tests: adds HF reference implementations for each new loss type and a parametrized test_correctness_extra_loss_types covering all five variants across bf16/fp32, with/without bias, with/without ref bias, and compute_nll_loss on/off (320 cases). Adds test_label_smoothing_validation for the label_smoothing range checks. The existing test_invalid_loss_type is updated to cover the new supported set.
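As a quick sanity check on the test grid, the axes named above can be enumerated with `itertools.product`. The variable names are illustrative and not taken from the test file.

```python
from itertools import product

# Axes listed in the description; names here are hypothetical.
loss_types = ["hinge", "bco_pair", "robust", "exo_pair", "discopop"]
dtypes = ["bf16", "fp32"]
bias = [True, False]
ref_bias = [True, False]
compute_nll_loss = [True, False]

grid = list(product(loss_types, dtypes, bias, ref_bias, compute_nll_loss))
print(len(grid))  # 80
```

These axes alone give 80 combinations, so the stated 320 cases presumably include further parametrized axes (a factor of 4) that the description does not enumerate.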

All 800 DPO test cases pass on H100.

Summary

Testing Done

Hardware Type:
run make test to ensure correctness
run make checkstyle to ensure code style
run make test-convergence to ensure convergence


kashif force-pushed the dpo-additional-loss-types branch from 0420345 to 9428a54 on April 27, 2026, 13:24


kashif wants to merge 1 commit into linkedin:main from kashif:dpo-additional-loss-types
