Hi, thank you very much for your excellent work.
I have read your paper, which has greatly inspired me. I am very interested in some of the implementation details and would appreciate your clarification.
You proposed the step-by-step coherence score, and I found it very interesting that you used N negative samples as the reference. I have a few questions:
- Could you let me know approximately what order of magnitude N is?
- Are the negative samples also CoT steps?
- I tried to reproduce this result on code tasks and generated about 1,500 CoT-step negative samples for solving code problems. However, I noticed that after averaging the coherence score across steps, the differences are not very large. Is the impact of the coherence score on individual steps expected to be relatively small?