ReasonFlux-PRM: A question about coherence score and N negative samples #21

Hi, thank you very much for your excellent work.
I have read your paper, and it has greatly inspired me. I am particularly interested in some of the implementation details and would appreciate your clarification:

You proposed the step-by-step coherence score, and I found it very interesting that you used N negative samples as the reference. I have a few questions:

1. Could you let me know approximately what order of magnitude N is?

2. Are the negative samples also CoT steps?

3. I tried to reproduce this result on code tasks and generated about 1,500 negative CoT-step samples for solving code problems. However, after averaging the coherence score across steps, the differences are not very large. Is the impact of the coherence score on individual steps expected to be relatively small?
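
To make question 3 concrete, here is a minimal sketch of one way a per-step coherence score against N negatives could be computed. It is an assumption for illustration, not the paper's definition: it uses an InfoNCE-style contrastive form with cosine similarity over step embeddings, and the names `cosine`, `coherence_score`, `step_scores`, and the temperature `tau` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def coherence_score(prev_step, step, negatives, tau=0.1):
    """Assumed InfoNCE-style score: log-probability that `step` follows
    `prev_step`, contrasted against N negative step embeddings."""
    pos = cosine(prev_step, step) / tau
    logits = [pos] + [cosine(prev_step, n) / tau for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return pos - log_denominator

def step_scores(steps, negatives, tau=0.1):
    """One score per adjacent pair of steps in a trajectory."""
    return [
        coherence_score(steps[i - 1], steps[i], negatives, tau)
        for i in range(1, len(steps))
    ]
```

Under this assumed form, the mean over `step_scores(...)` can hide a single low-scoring step, so comparing per-step values (for example the minimum) alongside the average may show whether individual steps differ even when the averages look similar.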
