1 parent 435403b commit cc80f47
README.md
@@ -140,6 +140,18 @@ bash reward_generation/mt_score_generate.sh \
 --loop 1
 ```

+Generate reasoning data
+
+```bash
+# example of math
+python rationale_generation/process.py \
+ --model_path "Qwen/QwQ-32B" \
+ --data_path _output/monte_carlo_processed/math_train_Qwen2.5-Math-7B-Instruct_monte_carlo \
+ --save_path _output/reasoning_output/math_train_QwQ_reasoning \
+ --num_gpu_per 1 \
+ --majority_of_N 1
+```
+

 ### Critique-refinement

 Execute policy refinement based on GenPRM's split output
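
The added "Generate reasoning data" command hard-codes its paths. A minimal sketch of a reusable wrapper follows; it is not part of this commit. The function name `generate_reasoning` and the environment-variable overrides are hypothetical. The script path, flags, and default values are copied from the hunk above, and nothing is assumed about what the flags do beyond that.

```bash
# Hypothetical wrapper around the command added in this commit.
# Defaults mirror the README example; override them via the environment.
MODEL_PATH="${MODEL_PATH:-Qwen/QwQ-32B}"
DATA_PATH="${DATA_PATH:-_output/monte_carlo_processed/math_train_Qwen2.5-Math-7B-Instruct_monte_carlo}"
SAVE_PATH="${SAVE_PATH:-_output/reasoning_output/math_train_QwQ_reasoning}"

generate_reasoning() {
  # "$@" is an optional command prefix: pass `echo` for a dry run,
  # or nothing to actually launch the Python script.
  "$@" python rationale_generation/process.py \
    --model_path "$MODEL_PATH" \
    --data_path "$DATA_PATH" \
    --save_path "$SAVE_PATH" \
    --num_gpu_per 1 \
    --majority_of_N 1
}

# Dry run: print the command line instead of executing it.
generate_reasoning echo
```

Calling `generate_reasoning` with no arguments runs the command itself. Calling it with `echo` only prints the resolved command line, which is a cheap way to check the paths before starting a long job.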