Log chosen/rejected entropy #1159

jacklanchantin · 2025-05-01T21:24:47Z

What does this PR do? Please describe:

Adds separate logging of entropy for the chosen and rejected sequences in online DPO training.
Also includes a few other small changes.

Check list:

Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
Did you read the contributor guideline?
Did you make sure that your PR does only one thing instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests?
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)

jacklanchantin · 2025-05-01T22:57:47Z

src/fairseq2/recipes/lm/_online_finetune/_remote_vllm.py

@@ -221,7 +221,7 @@ def rollout_from_model(self, prompt_list, sampling_params=None):

        return outputs

-    def reward_from_model(self, prompt_list, batch_size=64):
+    def reward_from_model(self, prompt_list, batch_size=16):


I was getting vLLM CUDA OOM errors with batch_size=64 because we use a custom vLLM model, so this lowers the default to 16.
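For context, a minimal sketch of how a smaller `batch_size` bounds how many prompts are sent to the reward model at once. This is not the body of `reward_from_model`; `iter_batches`, `score_in_batches`, and `score_fn` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 16) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


def score_in_batches(
    prompts: Sequence[str],
    score_fn: Callable[[List[str]], List[float]],  # hypothetical reward call
    batch_size: int = 16,
) -> List[float]:
    """Score prompts chunk by chunk and return rewards in input order."""
    rewards: List[float] = []
    for batch in iter_batches(prompts, batch_size):
        # Only one chunk is in flight at a time, so peak memory on the
        # reward model scales with batch_size, not with len(prompts).
        rewards.extend(score_fn(batch))
    return rewards
```

The trade-off is more round trips to the reward model in exchange for lower peak memory per call.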

jacklanchantin · 2025-05-01T22:59:41Z

src/fairseq2/setup/_metrics.py

+    register("chosen_logit_entropy",  "Chosen Logit Entropy",                51, format_as_float)
+    register("rejected_logit_entropy","Rejected Logit Entropy",              51, format_as_float)


These are the only two metrics added; the rest of the diff in this file is formatting.

jacklanchantin · 2025-05-01T22:59:59Z

src/fairseq2/recipes/lm/_online_finetune/_grpo.py

-            per_seq_loss = (
-                (per_token_loss * target_mask).sum(dim=-1)
-            ).mean(dim=1)
+            per_seq_loss = ((per_token_loss * target_mask).sum(dim=-1)).mean(dim=1)


jacklanchantin · 2025-05-01T23:01:42Z

src/fairseq2/recipes/lm/_online_finetune/_online_dpo.py

+        )  # [Batch x Rollouts, 1]
+
+        # entropy for all N rollouts
+        logit_entropy = self.get_all_rollouts_entropy(rollouts)


Not sure if we want this. Previously, logit_entropy was computed only for the chosen sequences; here I'm computing it for all rollouts.
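To make the difference concrete, here is a framework-free sketch of a masked per-sequence logit entropy, averaged over a group of sequences (chosen, rejected, or all rollouts). It is not the recipe's implementation: the function names are hypothetical, and averaging over unmasked tokens (rather than summing) is an assumption.

```python
import math
from typing import List, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sequence_entropy(
    seq_logits: Sequence[Sequence[float]], target_mask: Sequence[int]
) -> float:
    """Mean token entropy over positions where target_mask is nonzero."""
    total, count = 0.0, 0
    for logits, keep in zip(seq_logits, target_mask):
        if keep:
            total += token_entropy(logits)
            count += 1
    return total / count if count else 0.0


def mean_entropy(
    batch_logits: Sequence[Sequence[Sequence[float]]],
    batch_masks: Sequence[Sequence[int]],
) -> float:
    """Average the per-sequence entropy over a group of sequences."""
    values: List[float] = [
        sequence_entropy(logits, mask)
        for logits, mask in zip(batch_logits, batch_masks)
    ]
    return sum(values) / len(values) if values else 0.0
```

Calling `mean_entropy` once on the chosen sequences and once on the rejected ones gives two separate values to log, whereas calling it on every rollout folds both groups into a single number.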

jacklanchantin added 3 commits May 1, 2025 21:12

cleanup

cd1d45b

formatting

9132606

add chosen/rejected entropy

faddb3a

facebook-github-bot added the CLA Signed label May 1, 2025

jacklanchantin changed the base branch from main to online_training May 1, 2025 21:25

jacklanchantin added 2 commits May 1, 2025 22:41

breakpoint

d67d7d0

add fn

2ac2b07

jacklanchantin commented May 1, 2025


jacklanchantin added 2 commits May 1, 2025 23:05

batch entropy

cb9c4dc

add vars to class

62119f2

jacklanchantin requested a review from uralik May 2, 2025 00:08

jacklanchantin marked this pull request as ready for review May 2, 2025 00:08

jacklanchantin requested a review from cbalioglu as a code owner May 2, 2025 00:08

jacklanchantin changed the title ~~Jacklanchantin/log metrics~~ Log chosne/rejected entropy May 2, 2025

jacklanchantin changed the title ~~Log chosne/rejected entropy~~ Log chosen/rejected entropy May 2, 2025

revert entropy calc

382ca93
