ReasonFlux-PRM: A question about the coherence score and N negative samples #21

@Semonlkf

Hi, thank you very much for your excellent work.
I have read your paper and found it very inspiring. I am particularly interested in some of the implementation details and would appreciate your clarification.

You proposed the step-by-step coherence score, and I found it very interesting that you used N negative samples as the reference. I have a few questions:

  1. Could you let me know approximately what order of magnitude N is?

  2. Are the negative samples also CoT steps?

  3. I tried to reproduce this result on code tasks and generated about 1,500 negative CoT-step samples for solving code problems. However, after averaging the step-level coherence scores across steps, the resulting differences are not very large. Is the impact of the coherence score on individual steps expected to be relatively small?
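
To make question 3 concrete, here is a minimal sketch of how a single low-coherence step can be diluted once step scores are averaged. It assumes an InfoNCE-style contrastive form (log-softmax of the true step's similarity against N negatives), which may not match the paper's exact definition. The function names, the temperature `tau`, and all similarity values are hypothetical and chosen only for illustration.

```python
import math


def contrastive_coherence(pos_sim, neg_sims, tau=1.0):
    """Assumed InfoNCE-style score: log-softmax of the positive similarity
    against the positive plus all negatives. Not the paper's confirmed formula."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return logits[0] - log_denom


def mean_over_steps(step_scores):
    """Average the step-level scores of one trajectory."""
    return sum(step_scores) / len(step_scores)


# Hypothetical similarities: 1,500 negatives, as in the experiment described above.
neg_sims = [0.2] * 1500
num_steps = 10

coherent = contrastive_coherence(0.9, neg_sims)    # step that follows from its context
incoherent = contrastive_coherence(0.1, neg_sims)  # step that does not

clean = [coherent] * num_steps
flawed = [coherent] * (num_steps - 1) + [incoherent]  # exactly one bad step

step_gap = coherent - incoherent
mean_gap = mean_over_steps(clean) - mean_over_steps(flawed)

print(f"gap at the single bad step: {step_gap:.4f}")
print(f"gap after averaging steps:  {mean_gap:.4f}")
```

Under these assumptions, the gap after averaging is the per-step gap divided by the number of steps, so a trajectory with one incoherent step out of many looks almost identical to a clean one. Whether the same effect explains the small differences seen on code tasks depends on the actual scoring function and aggregation used.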
