Commit 99d7064

committed
deploy: 9457895
1 parent 78bc3da commit 99d7064

File tree

12 files changed

+36
-26
lines changed

1.31 KB
Binary file not shown.

.doctrees/environment.pickle

-38 Bytes
Binary file not shown.

_examples_synced/geo3k_vlm_multi_turn/README.html

Lines changed: 10 additions & 8 deletions
@@ -8,7 +8,7 @@
 <meta charset="utf-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>VLM Multi-Turn (FSDP backend, geo3k dataset) &#8212; slime</title>
+<title>VLM Multi-Turn (geo3k dataset) &#8212; slime</title>



@@ -431,7 +431,7 @@


 <div id="jb-print-docs-body" class="onlyprint">
-<h1>VLM Multi-Turn (FSDP backend, geo3k dataset)</h1>
+<h1>VLM Multi-Turn (geo3k dataset)</h1>
 <!-- Table of contents -->
 <div id="print-main-content">
 <div id="jb-print-toc">
@@ -454,11 +454,12 @@ <h2> Contents </h2>
 <div id="searchbox"></div>
 <article class="bd-article">

-<section class="tex2jax_ignore mathjax_ignore" id="vlm-multi-turn-fsdp-backend-geo3k-dataset">
-<h1>VLM Multi-Turn (FSDP backend, geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-fsdp-backend-geo3k-dataset" title="Link to this heading">#</a></h1>
-<p>Training VLM with FSDP on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
-<p>The multi-turn rollout is implemented through a custom generate function <code class="docutils literal notranslate"><span class="pre">examples.geo3k_vlm_multi_turn.rollout.generate</span></code>, overriding the original generate function.</p>
-<p>In terms of the environment interaction, this example initializes a custom interactive environment in <code class="docutils literal notranslate"><span class="pre">examples/geo3k_vlm_multi_turn/env_geo3k.py</span></code> with the APIs below.</p>
+<section class="tex2jax_ignore mathjax_ignore" id="vlm-multi-turn-geo3k-dataset">
+<h1>VLM Multi-Turn (geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-geo3k-dataset" title="Link to this heading">#</a></h1>
+<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
+<p>Thanks to Slime’s clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. FSDP/Megatron) can use it.</p>
+<p>The multi-turn rollout is implemented through a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom generate function</span></a>, overriding the original generate function.</p>
+<p>In terms of the environment interaction, this example initializes a <a class="reference internal" href="#env_geo3k.py"><span class="xref myst">custom interactive environment</span></a> with the APIs below.</p>
 <details>
 <summary>Environment API (geo3k)</summary>
 <ul class="simple">
@@ -469,7 +470,8 @@ <h1>VLM Multi-Turn (FSDP backend, geo3k dataset)<a class="headerlink" href="#vlm
 </ul>
 </details><br>
 <p>The reward model is the default math RM.</p>
-<p><img alt="VLM multi-turn geo3k reward" src="_examples_synced/geo3k_vlm_multi_turn/geo3k_vlm_multi_turn_reward.png" /></p>
+<p><img alt="VLM multi-turn geo3k reward" src="_examples_synced/geo3k_vlm_multi_turn/geo3k_vlm_multi_turn_reward.png" />
+<img alt="Rollout megatron" src="_examples_synced/geo3k_vlm_multi_turn/rollout_experiment_result_megatron.png" /></p>
 <section id="reproduce">
 <h2>Reproduce<a class="headerlink" href="#reproduce" title="Link to this heading">#</a></h2>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1) Set environment variable</span>

_sources/_examples_synced/geo3k_vlm_multi_turn/README.md

Lines changed: 7 additions & 4 deletions
@@ -1,9 +1,11 @@
-# VLM Multi-Turn (FSDP backend, geo3k dataset)
-Training VLM with FSDP on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
+# VLM Multi-Turn (geo3k dataset)
+Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).

-The multi-turn rollout is implemented through a custom generate function `examples.geo3k_vlm_multi_turn.rollout.generate`, overriding the original generate function.
+Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. FSDP/Megatron) can use it.

-In terms of the environment interaction, this example initializes a custom interactive environment in `examples/geo3k_vlm_multi_turn/env_geo3k.py` with the APIs below.
+The multi-turn rollout is implemented through a [custom generate function](rollout.py#L309), overriding the original generate function.
+
+In terms of the environment interaction, this example initializes a [custom interactive environment](env_geo3k.py) with the APIs below.
 <details>
 <summary>Environment API (geo3k)</summary>

@@ -17,6 +19,7 @@ In terms of the environment interaction, this example initializes a custom inter
 The reward model is the default math RM.

 ![VLM multi-turn geo3k reward](geo3k_vlm_multi_turn_reward.png)
+![Rollout megatron](rollout_experiment_result_megatron.png)

 ## Reproduce
 ```bash

objects.inv

-25 Bytes
Binary file not shown.

searchindex.js

Lines changed: 1 addition & 1 deletion
Binary file not shown.

zh/.doctrees/environment.pickle

-38 Bytes
Binary file not shown.

zh/_examples_synced/geo3k_vlm_multi_turn/README.html

Lines changed: 10 additions & 8 deletions
@@ -8,7 +8,7 @@
 <meta charset="utf-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

-<title>VLM Multi-Turn (FSDP backend, geo3k dataset) &#8212; slime</title>
+<title>VLM Multi-Turn (geo3k dataset) &#8212; slime</title>



@@ -427,7 +427,7 @@


 <div id="jb-print-docs-body" class="onlyprint">
-<h1>VLM Multi-Turn (FSDP backend, geo3k dataset)</h1>
+<h1>VLM Multi-Turn (geo3k dataset)</h1>
 <!-- Table of contents -->
 <div id="print-main-content">
 <div id="jb-print-toc">
@@ -450,11 +450,12 @@ <h2> 目录 </h2>
 <div id="searchbox"></div>
 <article class="bd-article">

-<section class="tex2jax_ignore mathjax_ignore" id="vlm-multi-turn-fsdp-backend-geo3k-dataset">
-<h1>VLM Multi-Turn (FSDP backend, geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-fsdp-backend-geo3k-dataset" title="Link to this heading">#</a></h1>
-<p>Training VLM with FSDP on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
-<p>The multi-turn rollout is implemented through a custom generate function <code class="docutils literal notranslate"><span class="pre">examples.geo3k_vlm_multi_turn.rollout.generate</span></code>, overriding the original generate function.</p>
-<p>In terms of the environment interaction, this example initializes a custom interactive environment in <code class="docutils literal notranslate"><span class="pre">examples/geo3k_vlm_multi_turn/env_geo3k.py</span></code> with the APIs below.</p>
+<section class="tex2jax_ignore mathjax_ignore" id="vlm-multi-turn-geo3k-dataset">
+<h1>VLM Multi-Turn (geo3k dataset)<a class="headerlink" href="#vlm-multi-turn-geo3k-dataset" title="Link to this heading">#</a></h1>
+<p>Training VLM on <a class="reference external" href="https://huggingface.co/datasets/hiyouga/geometry3k">geo3k dataset</a> with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the <a class="reference external" href="https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed">processed version</a>.</p>
+<p>Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom rollout function</span></a>, any training backend (e.g. FSDP/Megatron) can use it.</p>
+<p>The multi-turn rollout is implemented through a <a class="reference internal" href="#rollout.py#L309"><span class="xref myst">custom generate function</span></a>, overriding the original generate function.</p>
+<p>In terms of the environment interaction, this example initializes a <a class="reference internal" href="#env_geo3k.py"><span class="xref myst">custom interactive environment</span></a> with the APIs below.</p>
 <details>
 <summary>Environment API (geo3k)</summary>
 <ul class="simple">
@@ -465,7 +466,8 @@ <h1>VLM Multi-Turn (FSDP backend, geo3k dataset)<a class="headerlink" href="#vlm
 </ul>
 </details><br>
 <p>The reward model is the default math RM.</p>
-<p><img alt="VLM multi-turn geo3k reward" src="_examples_synced/geo3k_vlm_multi_turn/geo3k_vlm_multi_turn_reward.png" /></p>
+<p><img alt="VLM multi-turn geo3k reward" src="_examples_synced/geo3k_vlm_multi_turn/geo3k_vlm_multi_turn_reward.png" />
+<img alt="Rollout megatron" src="_examples_synced/geo3k_vlm_multi_turn/rollout_experiment_result_megatron.png" /></p>
 <section id="reproduce">
 <h2>Reproduce<a class="headerlink" href="#reproduce" title="Link to this heading">#</a></h2>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1) Set environment variable</span>

zh/_sources/_examples_synced/geo3k_vlm_multi_turn/README.md

Lines changed: 7 additions & 4 deletions
@@ -1,9 +1,11 @@
-# VLM Multi-Turn (FSDP backend, geo3k dataset)
-Training VLM with FSDP on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).
+# VLM Multi-Turn (geo3k dataset)
+Training VLM on [geo3k dataset](https://huggingface.co/datasets/hiyouga/geometry3k) with multi-turn reasoning with interactive environment feedback, using GRPO. For dataset, we used the [processed version](https://huggingface.co/datasets/VeraIsHere/geo3k_imgurl_processed).

-The multi-turn rollout is implemented through a custom generate function `examples.geo3k_vlm_multi_turn.rollout.generate`, overriding the original generate function.
+Thanks to Slime's clean design, multi-turn RL aligns with first principles: with a [custom rollout function](rollout.py#L309), any training backend (e.g. FSDP/Megatron) can use it.

-In terms of the environment interaction, this example initializes a custom interactive environment in `examples/geo3k_vlm_multi_turn/env_geo3k.py` with the APIs below.
+The multi-turn rollout is implemented through a [custom generate function](rollout.py#L309), overriding the original generate function.
+
+In terms of the environment interaction, this example initializes a [custom interactive environment](env_geo3k.py) with the APIs below.
 <details>
 <summary>Environment API (geo3k)</summary>

@@ -17,6 +19,7 @@ In terms of the environment interaction, this example initializes a custom inter
 The reward model is the default math RM.

 ![VLM multi-turn geo3k reward](geo3k_vlm_multi_turn_reward.png)
+![Rollout megatron](rollout_experiment_result_megatron.png)

 ## Reproduce
 ```bash
