-<h1>VLM Multi-Turn (FSDP backend, geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-fsdp-backend-geo3k-dataset" title="Link to this heading">#</a></h1>
-<p>Training VLM with FSDP on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
-<p>The multi-turn rollout is implemented through a custom generate function <code class="docutils literal notranslate"><span class="pre">examples.geo3k_vlm_multi_turn.rollout.generate</span></code>, overriding the original generate function.</p>
-<p>In terms of the environment interaction, this example initializes a custom interactive environment in <code class="docutils literal notranslate"><span class="pre">examples/geo3k_vlm_multi_turn/env_geo3k.py</span></code> with the APIs below.</p>
+<h1>VLM Multi-Turn (geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-geo3k-dataset" title="Link to this heading">#</a></h1>
+<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
+<p>Thanks to Slime’s clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. FSDP/Megatron) can use it.</p>
+<p>The multi-turn rollout is implemented through a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom generate function</span></a>, overriding the original generate function.</p>
+<p>In terms of the environment interaction, this example initializes a <a class="reference internal" href="#env_geo3k.py"><span class="xref myst">custom interactive environment</span></a> with the APIs below.</p>
_sources/_examples_synced/geo3k_vlm_multi_turn/README.md
Lines changed: 7 additions & 4 deletions
@@ -1,9 +1,11 @@
-# VLM Multi-Turn (FSDP backend, geo3k dataset)
-Training VLM with FSDP on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
+# VLM Multi-Turn (geo3k dataset)
+Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
 
-The multi-turn rollout is implemented through a custom generate function `examples.geo3k_vlm_multi_turn.rollout.generate`, overriding the original generate function.
+Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. FSDP/Megatron) can use it.
 
-In terms of the environment interaction, this example initializes a custom interactive environment in `examples/geo3k_vlm_multi_turn/env_geo3k.py` with the APIs below.
+The multi-turn rollout is implemented through a [custom generate function](rollout.py#L309), overriding the original generate function.
+
+In terms of the environment interaction, this example initializes a [custom interactive environment](env_geo3k.py) with the APIs below.
 
 <details>
 <summary>Environment API (geo3k)</summary>
@@ -17,6 +19,7 @@ In terms of the environment interaction, this example initializes a custom inter
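The README in this commit describes the multi-turn rollout as a custom generate function that alternates model turns with feedback from an interactive environment. The sketch below illustrates only that control flow, under stated assumptions: `ToyGeoEnv`, its `reset`/`step` methods, the `policy` callable, `multi_turn_generate`, and `max_turns` are hypothetical names, not the actual signatures in `rollout.py` or `env_geo3k.py`. It also tracks a per-turn loss mask so that environment feedback is not trained on; a real implementation would more likely build that mask over tokens.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Turn:
    role: str  # "assistant" (model output, trained on) or "env" (feedback, masked out)
    text: str


@dataclass
class Trajectory:
    turns: List[Turn] = field(default_factory=list)
    reward: float = 0.0

    def loss_mask(self) -> List[int]:
        # 1 for model-generated turns, 0 for environment prompts/feedback
        return [1 if t.role == "assistant" else 0 for t in self.turns]


class ToyGeoEnv:
    """Hypothetical environment: checks an 'Answer: <value>' reply against a key."""

    def __init__(self, answer: str):
        self.answer = answer

    def reset(self) -> str:
        return "Solve the problem and reply with 'Answer: <value>'."

    def step(self, action: str) -> Tuple[str, float, bool]:
        # Returns (feedback, reward, done)
        if action.strip() == f"Answer: {self.answer}":
            return "Correct.", 1.0, True
        return "Incorrect, please try again.", 0.0, False


def multi_turn_generate(
    policy: Callable[[List[Turn]], str], env: ToyGeoEnv, max_turns: int = 3
) -> Trajectory:
    """Alternate policy turns and environment feedback until done or max_turns."""
    traj = Trajectory()
    traj.turns.append(Turn("env", env.reset()))
    for _ in range(max_turns):
        action = policy(traj.turns)  # the policy sees the full history so far
        traj.turns.append(Turn("assistant", action))
        feedback, reward, done = env.step(action)
        traj.reward = reward
        if done:
            break
        traj.turns.append(Turn("env", feedback))
    return traj


# Usage: a scripted "policy" that answers wrong first, then corrects itself.
replies = iter(["Answer: 30", "Answer: 45"])
traj = multi_turn_generate(lambda history: next(replies), ToyGeoEnv("45"))
print(traj.reward, traj.loss_mask())  # 1.0 [0, 1, 0, 1]
```

Because the function returns plain data (turns, a mask, and a reward) rather than touching model parameters, the same rollout output could in principle be consumed by either an FSDP or a Megatron trainer, which is the decoupling the README attributes to Slime's design.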
-<h1>VLM Multi-Turn (FSDP backend, geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-fsdp-backend-geo3k-dataset" title="Link to this heading">#</a></h1>
-<p>Training VLM with FSDP on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
-<p>The multi-turn rollout is implemented through a custom generate function <code class="docutils literal notranslate"><span class="pre">examples.geo3k_vlm_multi_turn.rollout.generate</span></code>, overriding the original generate function.</p>
-<p>In terms of the environment interaction, this example initializes a custom interactive environment in <code class="docutils literal notranslate"><span class="pre">examples/geo3k_vlm_multi_turn/env_geo3k.py</span></code> with the APIs below.</p>
+<h1>VLM Multi-Turn (geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-geo3k-dataset" title="Link to this heading">#</a></h1>
+<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
+<p>Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. FSDP/Megatron) can use it.</p>
+<p>The multi-turn rollout is implemented through a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom generate function</span></a>, overriding the original generate function.</p>
+<p>In terms of the environment interaction, this example initializes a <a class="reference internal" href="#env_geo3k.py"><span class="xref myst">custom interactive environment</span></a> with the APIs below.</p>
zh/_sources/_examples_synced/geo3k_vlm_multi_turn/README.md
Lines changed: 7 additions & 4 deletions
@@ -1,9 +1,11 @@
-# VLM Multi-Turn (FSDP backend, geo3k dataset)
-Training VLM with FSDP on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
+# VLM Multi-Turn (geo3k dataset)
+Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
 
-The multi-turn rollout is implemented through a custom generate function `examples.geo3k_vlm_multi_turn.rollout.generate`, overriding the original generate function.
+Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. FSDP/Megatron) can use it.
 
-In terms of the environment interaction, this example initializes a custom interactive environment in `examples/geo3k_vlm_multi_turn/env_geo3k.py` with the APIs below.
+The multi-turn rollout is implemented through a [custom generate function](rollout.py#L309), overriding the original generate function.
+
+In terms of the environment interaction, this example initializes a [custom interactive environment](env_geo3k.py) with the APIs below.
 
 <details>
 <summary>Environment API (geo3k)</summary>
@@ -17,6 +19,7 @@ In terms of the environment interaction, this example initializes a custom inter
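Both READMEs state that training uses GRPO. As an illustration of the group-relative step only, the sketch below normalizes each rollout's reward against the other rollouts sampled for the same prompt, A_i = (r_i - mean(r)) / (std(r) + eps). This is the commonly cited formulation, not code taken from Slime; the function name `grpo_advantages`, the `eps` value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for rollouts that share one prompt (hypothetical helper)."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std over the group; an assumption
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Four multi-turn rollouts of the same geometry problem: two solved, two not.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, 1.0, -1.0]
```

When every rollout in a group earns the same reward, all advantages are zero, so that group contributes no policy-gradient signal.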