Commit 6ce7e4e

committed
deploy: f83559f
1 parent 99d7064 commit 6ce7e4e

File tree

8 files changed

+8
-8
lines changed

319 Bytes
Binary file not shown.

.doctrees/environment.pickle

0 Bytes
Binary file not shown.

_examples_synced/geo3k_vlm_multi_turn/README.html

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -456,8 +456,8 @@ <h2> Contents </h2>
456456

457457
<section class="tex2jax_ignore mathjax_ignore" id="vlm-multi-turn-geo3k-dataset">
458458
<h1>VLM Multi-Turn (geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-geo3k-dataset" title="Link to this heading">#</a></h1>
459-
<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
460-
<p>Thanks to Slime’s clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. FSDP/Megatron) can use it.</p>
459+
<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For the dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
460+
<p><strong>Thanks to slime’s clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. Megatron/FSDP) can use it.</strong></p>
461461
<p>The multi-turn rollout is implemented through a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom generate function</span></a>, overriding the original generate function.</p>
462462
<p>In terms of the environment interaction, this example initializes a <a class="reference internal" href="#env_geo3k.py"><span class="xref myst">custom interactive environment</span></a> with the APIs below.</p>
463463
<details>

_sources/_examples_synced/geo3k_vlm_multi_turn/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# VLM Multi-Turn (geo3k dataset)
2-
Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
2+
Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For the dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
33

4-
Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. FSDP/Megatron) can use it.
4+
**Thanks to slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. Megatron/FSDP) can use it.**
55

66
The multi-turn rollout is implemented through a [custom generate function](rollout.py#L309), overriding the original generate function.
77

321 Bytes
Binary file not shown.

zh/.doctrees/environment.pickle

0 Bytes
Binary file not shown.

zh/_examples_synced/geo3k_vlm_multi_turn/README.html

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -452,8 +452,8 @@ <h2> 目录 </h2>
452452

453453
<section class="tex2jax_ignore mathjax_ignore" id="vlm-multi-turn-geo3k-dataset">
454454
<h1>VLM Multi-Turn (geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-geo3k-dataset" title="Link to this heading">#</a></h1>
455-
<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
456-
<p>Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. FSDP/Megatron) can use it.</p>
455+
<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For the dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
456+
<p><strong>Thanks to slime's clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. Megatron/FSDP) can use it.</strong></p>
457457
<p>The multi-turn rollout is implemented through a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom generate function</span></a>, overriding the original generate function.</p>
458458
<p>In terms of the environment interaction, this example initializes a <a class="reference internal" href="#env_geo3k.py"><span class="xref myst">custom interactive environment</span></a> with the APIs below.</p>
459459
<details>

zh/_sources/_examples_synced/geo3k_vlm_multi_turn/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# VLM Multi-Turn (geo3k dataset)
2-
Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
2+
Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For the dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
33

4-
Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. FSDP/Megatron) can use it.
4+
**Thanks to slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. Megatron/FSDP) can use it.**
55

66
The multi-turn rollout is implemented through a [custom generate function](rollout.py#L309), overriding the original generate function.
77
