Commit 3cebe25

feat: add instructions to reproduce synthesis of DART-Math-Prop2Diff
1 parent: 0f3e45d

2 files changed (+36, -2 lines)


README.md (26 additions, 1 deletion)

```diff
@@ -402,7 +402,7 @@ baseline** in the paper, just set

 <summary>

-The off-the-shelf command to reproduce the data synthesis of the Vanilla
+The off-the-shelf command to reproduce the synthesis of the Vanilla
 Rejection Tuning (VRT) baseline in the paper
 </summary>

@@ -419,6 +419,31 @@ CUDA_VISIBLE_DEVICES="0" python pipeline/gen.py \

 </details>

+<details>
+
+<summary>
+
+Sorry that it still needs some manual effort to reproduce the data
+synthesis of `DART-Math-Prop2Diff`. For now, please follow the
+instructions in the paper
+</summary>
+
+1. Calculate the "fail rate" (`1 - pass_rate`) for each query in the MATH
+   and GSM8K training sets (see the `pass_rate` field of the query
+   information in
+   [MATH](https://huggingface.co/datasets/hkust-nlp/dart-math-pool-math-query-info)
+   and
+   [GSM8K](https://huggingface.co/datasets/hkust-nlp/dart-math-pool-gsm8k-query-info)).
+2. Calculate the target number of correct responses for each query in
+   the final training set. Note that we try to ensure at least one
+   correct response for each query in the `DART-Math` datasets, which
+   you can implement by rounding **up** when calculating the response
+   number for each query.
+3. Sample responses for each query until the target number of correct
+   ones is met (a target proportional to the query's "fail rate").
+
+</details>
+
 After the synthesis, you can use the [curation
 script](pipeline/curate.py) to curate the final dataset.
```

nbs/index.ipynb (10 additions, 1 deletion)

````diff
@@ -404,7 +404,7 @@
 "To reproduce the data synthesis of the **Vanilla Rejection Tuning (VRT) baseline** in the paper, just set `--max_n_trials 52 --min_n_corrects 0`.\n",
 "\n",
 "<details>\n",
-"<summary>The off-the-shelf command to reproduce the data synthesis of the Vanilla Rejection Tuning (VRT) baseline in the paper</summary>\n",
+"<summary>The off-the-shelf command to reproduce the synthesis of the Vanilla Rejection Tuning (VRT) baseline in the paper</summary>\n",
 "```shell\n",
 "CUDA_VISIBLE_DEVICES=\"0\" python pipeline/gen.py \\\n",
 " --gen_save_path \"data/res/dart-math-uniform.jsonl\" \\\n",
@@ -418,6 +418,15 @@
 "\n",
 "</details>\n",
 "\n",
+"<details>\n",
+"<summary>Sorry that it still needs some manual effort to reproduce the data synthesis of `DART-Math-Prop2Diff`. For now, please follow the instructions in the paper</summary>\n",
+"\n",
+"1. Calculate the \"fail rate\" (`1 - pass_rate`) for each query in the MATH and GSM8K training sets (see the `pass_rate` field of the query information in [MATH](https://huggingface.co/datasets/hkust-nlp/dart-math-pool-math-query-info) and [GSM8K](https://huggingface.co/datasets/hkust-nlp/dart-math-pool-gsm8k-query-info)).\n",
+"2. Calculate the target number of correct responses for each query in the final training set. Note that we try to ensure at least one correct response for each query in the `DART-Math` datasets, which you can implement by rounding **up** when calculating the response number for each query.\n",
+"3. Sample responses for each query until the target number of correct ones is met (a target proportional to the query's \"fail rate\").\n",
+"\n",
+"</details>\n",
+"\n",
 "After the synthesis, you can use the [curation script](pipeline/curate.py) to curate the final dataset.\n"
 ]
 },
````
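Step 3 of the instructions above (sampling responses for each query until its target number of correct ones is collected) can be sketched as a small loop. `mock_generate` and `is_correct` are hypothetical stand-ins for the actual model sampling and answer checking (cf. `pipeline/gen.py`), used here only to make the sketch self-contained:

```python
import random

def is_correct(response: str, reference: str) -> bool:
    # Hypothetical checker: a real pipeline would extract and compare
    # the final answer rather than match the raw string suffix.
    return response.endswith(reference)

def mock_generate(reference: str) -> str:
    # Hypothetical generator standing in for actual model sampling;
    # it returns a correct or an incorrect response at random.
    return random.choice(
        [f"... so the answer is {reference}", "... so the answer is 0"]
    )

def sample_until_target(reference: str, n_corrects: int,
                        max_trials: int = 10_000) -> list[str]:
    """Sample responses until `n_corrects` correct ones are collected,
    giving up after `max_trials` attempts."""
    corrects: list[str] = []
    for _ in range(max_trials):
        if len(corrects) >= n_corrects:
            break
        response = mock_generate(reference)
        if is_correct(response, reference):
            corrects.append(response)
    return corrects

random.seed(0)  # deterministic for the example
responses = sample_until_target("42", n_corrects=3)
```

The `max_trials` cap plays the same role as the `--max_n_trials` flag shown for the VRT baseline above: it bounds the cost on queries whose correct answer the model rarely (or never) produces.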
