
[DPO] Add hinge, bco_pair, robust, exo_pair, discopop loss types#1204

Open
kashif wants to merge 1 commit into linkedin:main from kashif:dpo-additional-loss-types

kashif commented Apr 27, 2026

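Brings Liger's fused linear DPO loss closer to parity with TRL's DPOTrainer by adding five loss variants on top of the existing sigmoid / apo_zero / apo_down / sppo_hard / nca_pair set. In the formulas below, chosen and rejected are the policy-minus-reference log-ratios of the chosen and rejected completions: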

  • hinge: relu(1 - beta * (chosen - rejected))
  • bco_pair: -logsigmoid(beta * chosen) - logsigmoid(-beta * rejected)
  • robust (Robust DPO / rDPO): label-smoothed sigmoid loss with flipped-pair correction
  • exo_pair: KL(p_fθ || [1-eps, eps]) for K=2, with eps = label_smoothing (EXO-pref, paper 2402.00856)
  • discopop: logistic and exponential losses blended by a modulation term with temperature discopop_tau (DiscoPOP, paper 2406.08414)
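For review, here is a minimal scalar sketch of the five per-pair losses listed above. It is not Liger's kernel or API: `dpo_pair_loss` and its argument names are hypothetical, each input is assumed to be a single policy-minus-reference log-ratio, and the robust, exo_pair and discopop bodies assume TRL-style formulations (including TRL's fallback to eps = 1e-3 when exo_pair receives label_smoothing = 0, and a discopop_tau default of 0.05 assumed to match TRL).

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for either sign of x.
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def sigmoid(x: float) -> float:
    return math.exp(log_sigmoid(x))


def dpo_pair_loss(loss_type: str, chosen: float, rejected: float, beta: float = 0.1,
                  label_smoothing: float = 0.0, discopop_tau: float = 0.05) -> float:
    """Hypothetical per-pair reference loss; chosen/rejected are log-ratios."""
    margin = chosen - rejected  # chosen log-ratio minus rejected log-ratio
    z = beta * margin
    if loss_type == "hinge":
        # relu(1 - beta * (chosen - rejected))
        return max(0.0, 1.0 - z)
    if loss_type == "bco_pair":
        # -logsigmoid(beta * chosen) - logsigmoid(-beta * rejected)
        return -log_sigmoid(beta * chosen) - log_sigmoid(-beta * rejected)
    if loss_type == "robust":
        # Assumed TRL-style rDPO: smoothed sigmoid loss, flipped-pair term
        # added back, normalized by (1 - 2 * eps).
        eps = label_smoothing
        return (-log_sigmoid(z) * (1 - eps) + log_sigmoid(-z) * eps) / (1 - 2 * eps)
    if loss_type == "exo_pair":
        # KL([sigmoid(z), sigmoid(-z)] || [1 - eps, eps]); assumed 1e-3 fallback.
        eps = label_smoothing if label_smoothing > 0 else 1e-3
        return (sigmoid(z) * (log_sigmoid(z) - math.log(1 - eps))
                + sigmoid(-z) * (log_sigmoid(-z) - math.log(eps)))
    if loss_type == "discopop":
        # Modulation weight blends the logistic and exponential components.
        modulation = sigmoid(z / discopop_tau)
        return -log_sigmoid(z) * (1 - modulation) + math.exp(-z) * modulation
    raise ValueError(f"unsupported loss_type: {loss_type}")
```

With label_smoothing = 0 the robust branch reduces exactly to the standard sigmoid DPO loss, and at a zero margin it evaluates to log 2 for any label_smoothing below 0.5. TRL's bco_pair additionally centers the rewards with a running-mean baseline; that term is left out here to follow the bullet's formula as written.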

Also threads two new kwargs through LigerFusedLinearDPOLoss / LigerFusedLinearDPOFunction:

  • label_smoothing (used by robust and exo_pair; validated at construction)
  • discopop_tau (temperature for the DiscoPOP modulation term)
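The exact ranges checked at construction are not stated here. As a hedged illustration only, the hypothetical helper `validate_dpo_kwargs` below infers bounds from where TRL-style formulas break down: the robust loss divides by 1 - 2 * label_smoothing, smoothing above 0.5 would make the target favor the rejected completion, and the DiscoPOP modulation divides by discopop_tau.

```python
def validate_dpo_kwargs(loss_type: str, label_smoothing: float = 0.0,
                        discopop_tau: float = 0.05) -> None:
    """Hypothetical constructor-time checks, not the PR's actual validation."""
    # Above 0.5 the smoothed target would prefer the rejected completion.
    if not 0.0 <= label_smoothing <= 0.5:
        raise ValueError(f"label_smoothing must be in [0, 0.5], got {label_smoothing}")
    # Assumed TRL-style robust loss divides by (1 - 2 * label_smoothing).
    if loss_type == "robust" and label_smoothing == 0.5:
        raise ValueError("robust loss requires label_smoothing < 0.5")
    # The modulation term divides the scaled margin by discopop_tau.
    if loss_type == "discopop" and discopop_tau <= 0.0:
        raise ValueError(f"discopop_tau must be positive, got {discopop_tau}")


validate_dpo_kwargs("robust", label_smoothing=0.1)  # valid: returns None
```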

Tests: adds HF reference implementations for each new loss type and a parametrized test_correctness_extra_loss_types covering all five variants across bf16/fp32, with/without bias, with/without ref bias, and compute_nll_loss on/off (320 cases). Adds test_label_smoothing_validation for the label_smoothing range checks. The existing test_invalid_loss_type is updated to cover the new supported set.

All 800 DPO test cases pass on H100.

Testing Done

  • Hardware Type: H100
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

kashif force-pushed the dpo-additional-loss-types branch from 0420345 to 9428a54 on April 27, 2026 13:24
