Commit cd6c315

deploy: 16c2e01
1 parent 1e65e17 commit cd6c315

96 files changed, +1786 -386 lines changed

42 Bytes
Binary file not shown.
26.4 KB
Binary file not shown.

.doctrees/environment.pickle

7.43 KB
Binary file not shown.

.doctrees/index.doctree

-15 Bytes
Binary file not shown.

_examples_synced/eval_multi_task/README.html

Lines changed: 1 addition & 4 deletions
@@ -183,6 +183,7 @@
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Advanced Features</span></p>
 <ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/slime-router.html">Slime Router</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../advanced/on-policy-distillation.html">On-Policy Distillation</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/speculative-decoding.html">Speculative Decoding</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/low-precision.html">Low Precision Training</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/reproducibility.html">Reproducibility</a></li>
@@ -197,10 +198,6 @@
 <li class="toctree-l1"><a class="reference internal" href="../fully_async/README.html">Fully Asynchronous Rollout Example</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../retool/README.html">Retool: from SFT to RL</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../multi_agent/README.html">Multi-Agent RL</a></li>
-<li class="toctree-l1"><a class="reference internal" href="../on_policy_distillation/README.html">On-Policy Distillation Example</a></li>
-
-
-
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
 <ul class="nav bd-sidenav">

_examples_synced/fully_async/README.html

Lines changed: 1 addition & 4 deletions
@@ -183,6 +183,7 @@
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Advanced Features</span></p>
 <ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/slime-router.html">Slime Router</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../advanced/on-policy-distillation.html">On-Policy Distillation</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/speculative-decoding.html">Speculative Decoding</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/low-precision.html">Low Precision Training</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/reproducibility.html">Reproducibility</a></li>
@@ -197,10 +198,6 @@
 <li class="toctree-l1 current active"><a class="current reference internal" href="#">Fully Asynchronous Rollout Example</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../retool/README.html">Retool: from SFT to RL</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../multi_agent/README.html">Multi-Agent RL</a></li>
-<li class="toctree-l1"><a class="reference internal" href="../on_policy_distillation/README.html">On-Policy Distillation Example</a></li>
-
-
-
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
 <ul class="nav bd-sidenav">

_examples_synced/geo3k_vlm/README.html

Lines changed: 1 addition & 4 deletions
@@ -183,6 +183,7 @@
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Advanced Features</span></p>
 <ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/slime-router.html">Slime Router</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../advanced/on-policy-distillation.html">On-Policy Distillation</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/speculative-decoding.html">Speculative Decoding</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/low-precision.html">Low Precision Training</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/reproducibility.html">Reproducibility</a></li>
@@ -197,10 +198,6 @@
 <li class="toctree-l1"><a class="reference internal" href="../fully_async/README.html">Fully Asynchronous Rollout Example</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../retool/README.html">Retool: from SFT to RL</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../multi_agent/README.html">Multi-Agent RL</a></li>
-<li class="toctree-l1"><a class="reference internal" href="../on_policy_distillation/README.html">On-Policy Distillation Example</a></li>
-
-
-
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
 <ul class="nav bd-sidenav">

_examples_synced/geo3k_vlm_multi_turn/README.html

Lines changed: 1 addition & 4 deletions
@@ -183,6 +183,7 @@
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Advanced Features</span></p>
 <ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/slime-router.html">Slime Router</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../advanced/on-policy-distillation.html">On-Policy Distillation</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/speculative-decoding.html">Speculative Decoding</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/low-precision.html">Low Precision Training</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/reproducibility.html">Reproducibility</a></li>
@@ -197,10 +198,6 @@
 <li class="toctree-l1"><a class="reference internal" href="../fully_async/README.html">Fully Asynchronous Rollout Example</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../retool/README.html">Retool: from SFT to RL</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../multi_agent/README.html">Multi-Agent RL</a></li>
-<li class="toctree-l1"><a class="reference internal" href="../on_policy_distillation/README.html">On-Policy Distillation Example</a></li>
-
-
-
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
 <ul class="nav bd-sidenav">

_examples_synced/multi_agent/README.html

Lines changed: 4 additions & 7 deletions
@@ -50,7 +50,7 @@
 <link rel="icon" href="../../_static/logo.ico"/>
 <link rel="index" title="Index" href="../../genindex.html" />
 <link rel="search" title="Search" href="../../search.html" />
-<link rel="next" title="On-Policy Distillation Example" href="../on_policy_distillation/README.html" />
+<link rel="next" title="Debugging" href="../../developer_guide/debug.html" />
 <link rel="prev" title="Retool: from SFT to RL" href="../retool/README.html" />
 <meta name="viewport" content="width=device-width, initial-scale=1"/>
 <meta name="docsearch:language" content="en"/>
@@ -183,6 +183,7 @@
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Advanced Features</span></p>
 <ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/slime-router.html">Slime Router</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../advanced/on-policy-distillation.html">On-Policy Distillation</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/speculative-decoding.html">Speculative Decoding</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/low-precision.html">Low Precision Training</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/reproducibility.html">Reproducibility</a></li>
@@ -197,10 +198,6 @@
 <li class="toctree-l1"><a class="reference internal" href="../fully_async/README.html">Fully Asynchronous Rollout Example</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../retool/README.html">Retool: from SFT to RL</a></li>
 <li class="toctree-l1 current active"><a class="current reference internal" href="#">Multi-Agent RL</a></li>
-<li class="toctree-l1"><a class="reference internal" href="../on_policy_distillation/README.html">On-Policy Distillation Example</a></li>
-
-
-
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
 <ul class="nav bd-sidenav">
@@ -532,11 +529,11 @@ <h2>New Arguments<a class="headerlink" href="#new-arguments" title="Link to this
 </div>
 </a>
 <a class="right-next"
-href="../on_policy_distillation/README.html"
+href="../../developer_guide/debug.html"
 title="next page">
 <div class="prev-next-info">
 <p class="prev-next-subtitle">next</p>
-<p class="prev-next-title">On-Policy Distillation Example</p>
+<p class="prev-next-title">Debugging</p>
 </div>
 <i class="fa-solid fa-angle-right"></i>
 </a>

_examples_synced/on_policy_distillation/README.html

Lines changed: 5 additions & 26 deletions
@@ -50,8 +50,6 @@
 <link rel="icon" href="../../_static/logo.ico"/>
 <link rel="index" title="Index" href="../../genindex.html" />
 <link rel="search" title="Search" href="../../search.html" />
-<link rel="next" title="Debugging" href="../../developer_guide/debug.html" />
-<link rel="prev" title="Multi-Agent RL" href="../multi_agent/README.html" />
 <meta name="viewport" content="width=device-width, initial-scale=1"/>
 <meta name="docsearch:language" content="en"/>
 <meta name="docbuild:last-update" content="Feb 21, 2026"/>
@@ -115,6 +113,8 @@
 
 
 
+
+
 <div class="bd-sidebar-primary bd-sidebar">
 
 
@@ -183,6 +183,7 @@
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Advanced Features</span></p>
 <ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/slime-router.html">Slime Router</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../advanced/on-policy-distillation.html">On-Policy Distillation</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/speculative-decoding.html">Speculative Decoding</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/low-precision.html">Low Precision Training</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/reproducibility.html">Reproducibility</a></li>
@@ -191,16 +192,12 @@
 <li class="toctree-l1"><a class="reference internal" href="../../advanced/arch-support-beyond-megatron.html">Supporting Model Architectures Beyond Megatron-LM</a></li>
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Other Usage</span></p>
-<ul class="current nav bd-sidenav">
+<ul class="nav bd-sidenav">
 <li class="toctree-l1"><a class="reference internal" href="../../examples/qwen3-4b-base-openhermes.html">SFT Qwen3-4B-Base</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../search-r1/README.html">Search-R1 lite</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../fully_async/README.html">Fully Asynchronous Rollout Example</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../retool/README.html">Retool: from SFT to RL</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../multi_agent/README.html">Multi-Agent RL</a></li>
-<li class="toctree-l1 current active"><a class="current reference internal" href="#">On-Policy Distillation Example</a></li>
-
-
-
 </ul>
 <p aria-level="2" class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
 <ul class="nav bd-sidenav">
@@ -539,7 +536,7 @@ <h2>Mode Comparison<a class="headerlink" href="#mode-comparison" title="Link to
 <section id="components">
 <h2>Components<a class="headerlink" href="#components" title="Link to this heading">#</a></h2>
 <ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">on_policy_distillation.py</span></code> implements (for SGLang mode):</p>
+<li><p><code class="docutils literal notranslate"><span class="pre">slime/rollout/on_policy_distillation.py</span></code> implements (for SGLang mode):</p>
 <ul>
 <li><p><code class="docutils literal notranslate"><span class="pre">reward_func</span></code> calls the teacher server (via <code class="docutils literal notranslate"><span class="pre">args.rm_url</span></code>) with every sample to obtain token-level logprobs.</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">post_process_rewards</span></code> trims the teacher logprobs to the generated response span and writes the tensors back to each <code class="docutils literal notranslate"><span class="pre">Sample</span></code> to compute advantages.</p></li>
@@ -680,24 +677,6 @@ <h1>References<a class="headerlink" href="#references" title="Link to this headi
 <footer class="prev-next-footer d-print-none">
 
 <div class="prev-next-area">
-<a class="left-prev"
-href="../multi_agent/README.html"
-title="previous page">
-<i class="fa-solid fa-angle-left"></i>
-<div class="prev-next-info">
-<p class="prev-next-subtitle">previous</p>
-<p class="prev-next-title">Multi-Agent RL</p>
-</div>
-</a>
-<a class="right-next"
-href="../../developer_guide/debug.html"
-title="next page">
-<div class="prev-next-info">
-<p class="prev-next-subtitle">next</p>
-<p class="prev-next-title">Debugging</p>
-</div>
-<i class="fa-solid fa-angle-right"></i>
-</a>
 </div>
 </footer>
 