Conversation

@pluesclues (Collaborator)

Now that AutoModelForSequenceClassification / reward-modeling support has been added and I have verified that the run works correctly, this PR is much simpler than the previous one for getting Online DPO integrated.

else:
    loss = None
    with self.compute_loss_context_manager():
        tokenized_output = self.processing_class(inputs["prompt"], padding=True, truncation=True, return_tensors="pt").to(model.device)
Collaborator

tokenized_output? Should it be tokenized_input?

Collaborator Author

That is correct, I just corrected it to inputs.
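
For reference, a minimal sketch of what the renamed tokenization step could look like inside compute_loss (the variable name and surrounding structure are illustrative, not the exact PR code):

# Hypothetical rename: tokenize the sampled prompts and move the batch to the model's device.
tokenized_inputs = self.processing_class(
    inputs["prompt"],
    padding=True,
    truncation=True,
    return_tensors="pt",
).to(model.device)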

@danielhanchen (Contributor)

@pluesclues Whenever it's done, ping me!

@Datta0 (Collaborator) left a comment

LGTM

        padding_mask = None
    elif self.training:
    # elif attention_mask is None:
    elif self.training and os.environ.get("UNSLOTH_KEEP_PADDING", "0") != '1':
Contributor

Re "UNSLOTH_KEEP_PADDING" where is this set exactly?

Collaborator Author

This would be set at the beginning of the script, before the model is loaded, so that the attention calculations account for right-padded tokens. It is something the user would set when using LLM reward models for Online DPO, roughly as sketched below.
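
A minimal sketch of that setup, assuming the usual unsloth loading flow (the checkpoint name is illustrative):

import os

# Must be set before loading so padded tokens are kept in the attention calculations.
os.environ["UNSLOTH_KEEP_PADDING"] = "1"

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length = 2048,
)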

Contributor

Is there a way to make it automatic?

Collaborator Author

There is a way to make this automatic: check whether any sample in the batch has right-padded tokens and, if so, enable the flag immediately. A rough sketch of that check is below.
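
A minimal sketch of the detection (the helper name and call site are hypothetical; it assumes a standard attention mask where 1 marks real tokens and 0 marks padding):

import os
import torch

def maybe_enable_keep_padding(attention_mask: torch.Tensor) -> None:
    # attention_mask has shape (batch, seq_len); a sequence is right padded
    # if its final position is a pad token (mask value 0).
    if (attention_mask[:, -1] == 0).any():
        os.environ["UNSLOTH_KEEP_PADDING"] = "1"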
