Hello, thank you for your great work! I have a question about Step-DPO. The dataset on HF ("xinlai/Math-Step-DPO-10K") seems to take "prompt" as the input and use "chosen" and "rejected" during training ("full_chosen" and "full_rejected" are presumably not used). Under these circumstances, won't the model tend to generate only partial responses at inference time? I'm not sure whether my understanding is correct here, so feel free to correct me. Thank you!
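To make sure I'm describing my understanding clearly, here is a minimal sketch of what I believe happens with those fields. The example record below is made up by me (only the field names come from "xinlai/Math-Step-DPO-10K"), and `dpo_pair` is just my hypothetical stand-in for how a DPO trainer would assemble the two sequences it scores:

```python
# Illustrative record: field names match the HF dataset, but the contents
# here are invented for the example. My understanding is that "prompt"
# already contains the question plus a verified reasoning prefix, while
# "chosen"/"rejected" hold only the next step.
example = {
    "prompt": "Question: 2 + 3 * 4 = ?\nStep 1: Apply order of operations.\n",
    "chosen": "Step 2: 3 * 4 = 12.\n",
    "rejected": "Step 2: 2 + 3 = 5.\n",
    "full_chosen": "Step 2: 3 * 4 = 12.\nStep 3: 2 + 12 = 14.\nAnswer: 14",
}

def dpo_pair(record):
    """Hypothetical sketch: build the (chosen, rejected) sequences that a
    DPO-style trainer would compare log-probabilities on."""
    return (record["prompt"] + record["chosen"],
            record["prompt"] + record["rejected"])

chosen_seq, rejected_seq = dpo_pair(example)
# If this is right, the preference loss is applied to a single partial step,
# not the full solution -- which is exactly what my question is about.
assert chosen_seq.endswith(example["chosen"])
assert not chosen_seq.endswith(example["full_chosen"])
```

If the trainer really only ever sees `prompt + chosen` like this, that is the source of my worry about the model learning to stop after one step.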