Skip to content

Commit 333c781

deploy: a770ee1
1 parent 2878090 commit 333c781

12 files changed: +20 / -12 lines changed

.doctrees/environment.pickle

0 Bytes
Binary file not shown.
798 Bytes
Binary file not shown.

_downloads/2d9502563ea3824049a204e1f0b564a2/run-qwen3-30B-A3B.sh

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ CKPT_ARGS=(
  --hf-checkpoint /root/Qwen3-30B-A3B
  #--hf-checkpoint /root/Qwen3-30B-A3B-FP8
  --ref-load /root/Qwen3-30B-A3B_torch_dist
- --load /root/Qwen3-4B_slime/
- --save /root/Qwen3-4B_slime/
+ --load /root/Qwen3-30B-A3B_slime/
+ --save /root/Qwen3-30B-A3B_slime/
  --save-interval 20
  )
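The hunk above corrects a copy-paste slip: the Qwen3-30B-A3B script was still loading from and saving to the Qwen3-4B directories. One way to make that kind of mismatch harder to introduce is to derive every checkpoint path from a single model name. This is a minimal sketch under that assumption; the `MODEL_NAME` and `CKPT_ROOT` variables are hypothetical and not part of slime's scripts, and only the flags themselves come from the diff.

```bash
#!/usr/bin/env bash
# Sketch: derive every checkpoint path from one (hypothetical) MODEL_NAME so
# the load/save directories cannot drift away from the model being trained.
set -euo pipefail

MODEL_NAME="${MODEL_NAME:-Qwen3-30B-A3B}"   # hypothetical variable
CKPT_ROOT="${CKPT_ROOT:-/root}"             # hypothetical root directory

CKPT_ARGS=(
   --hf-checkpoint "${CKPT_ROOT}/${MODEL_NAME}"
   --ref-load "${CKPT_ROOT}/${MODEL_NAME}_torch_dist"
   --load "${CKPT_ROOT}/${MODEL_NAME}_slime/"
   --save "${CKPT_ROOT}/${MODEL_NAME}_slime/"
   --save-interval 20
)

# Print each flag with its value on its own line so the resolved paths can be checked.
printf '%s %s\n' "${CKPT_ARGS[@]}"
```

With this layout, switching to another model (for example `MODEL_NAME=Qwen3-4B`) changes all four paths in one place instead of four.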

_sources/get_started/usage.md

Lines changed: 3 additions & 1 deletion
@@ -150,7 +150,8 @@ Currently, slime only supports loading files in `.jsonl` format, where each line
  "prompt": [
  {
  "content": "Solve the following math problem step by step. The last line of your response should be of the form Answer: \\boxed{$Answer} where $Answer is the answer to the problem.\n\nIn triangle $ABC$, $\\sin \\angle A = \\frac{4}{5}$ and $\\angle A < 90^\\circ$. Let $D$ be a point outside triangle $ABC$ such that $\\angle BAD = \\angle DAC$ and $\\angle BDC = 90^\\circ$. Suppose that $AD = 1$ and that $\\frac{BD}{CD} = \\frac{3}{2}$. If $AB + AC$ can be expressed in the form $\\frac{a\\sqrt{b}}{c}$ where $a, b, c$ are pairwise relatively prime integers, find $a + b + c$.\n\nRemember to put your answer on its own line after \"Answer:\".",
- "role": "user"
+ "role": "user",
+ "step_loss_mask": 1
  }
  ],
  "label": "34"
@@ -165,6 +166,7 @@ This corresponds to the following configuration:
  --apply-chat-template
  ```

+ Note that the `step_loss_mask` field (default: 1) is used in the SFT phase. If it is set to 0, the turn does not contribute to the final loss; if it is set to 1, slime uses the normal `loss_mask`.
  Additionally, we provide a `metadata_key`, which defaults to `"metadata"`. When reading the data, slime loads the metadata stored under this key, which can be helpful for custom data generation or custom reward models.

  ### Hyperparameters for RL Training
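The `step_loss_mask` rule can be read as gating the normal token-level `loss_mask` turn by turn. This minimal sketch assumes, without confirmation from slime's source, that the normal SFT `loss_mask` is 1 on assistant tokens and 0 elsewhere, and that the effective mask is the product of the two; the `effective_loss_mask` function and its `role:num_tokens[:step_loss_mask]` argument format are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (assumption, not slime's implementation):
#   effective token mask = normal loss_mask * step_loss_mask
# where the normal SFT loss_mask is assumed to be 1 on assistant tokens, 0 otherwise.
set -euo pipefail

# Each argument describes one turn as role:num_tokens[:step_loss_mask].
# A missing step_loss_mask falls back to 1, the documented default.
effective_loss_mask() {
  local out=() turn role n step normal i
  for turn in "$@"; do
    IFS=: read -r role n step <<< "$turn"
    step="${step:-1}"
    if [[ "$role" == "assistant" ]]; then normal=1; else normal=0; fi
    for ((i = 0; i < n; i++)); do
      out+=("$((normal * step))")
    done
  done
  echo "${out[*]}"
}

# The first assistant turn is excluded from the loss via step_loss_mask=0.
effective_loss_mask user:3 assistant:2:0 user:2 assistant:3
# prints: 0 0 0 0 0 0 0 1 1 1
```

Under this reading, a turn with `step_loss_mask` set to 0 contributes nothing regardless of its role, while a turn left at the default behaves exactly as it would without the field.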

get_started/usage.html

Lines changed: 4 additions & 2 deletions
@@ -616,7 +616,8 @@ <h3>Data Format<a class="headerlink" href="#data-format" title="Link to this hea
  <span class="w"> </span><span class="nt">&quot;prompt&quot;</span><span class="p">:</span><span class="w"> </span><span class="p">[</span>
  <span class="w"> </span><span class="p">{</span>
  <span class="w"> </span><span class="nt">&quot;content&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;Solve the following math problem step by step. The last line of your response should be of the form Answer: \\boxed{$Answer} where $Answer is the answer to the problem.\n\nIn triangle $ABC$, $\\sin \\angle A = \\frac{4}{5}$ and $\\angle A &lt; 90^\\circ$. Let $D$ be a point outside triangle $ABC$ such that $\\angle BAD = \\angle DAC$ and $\\angle BDC = 90^\\circ$. Suppose that $AD = 1$ and that $\\frac{BD}{CD} = \\frac{3}{2}$. If $AB + AC$ can be expressed in the form $\\frac{a\\sqrt{b}}{c}$ where $a, b, c$ are pairwise relatively prime integers, find $a + b + c$.\n\nRemember to put your answer on its own line after \&quot;Answer:\&quot;.&quot;</span><span class="p">,</span>
- <span class="w"> </span><span class="nt">&quot;role&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;user&quot;</span>
+ <span class="w"> </span><span class="nt">&quot;role&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;user&quot;</span><span class="p">,</span>
+ <span class="w"> </span><span class="nt">&quot;step_loss_mask&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span>
  <span class="w"> </span><span class="p">}</span>
  <span class="w"> </span><span class="p">],</span>
  <span class="w"> </span><span class="nt">&quot;label&quot;</span><span class="p">:</span><span class="w"> </span><span class="s2">&quot;34&quot;</span>
@@ -629,7 +630,8 @@ <h3>Data Format<a class="headerlink" href="#data-format" title="Link to this hea
  <span class="w"> </span>--apply-chat-template
  </pre></div>
  </div>
- <p>Additionally, we provide a <code class="docutils literal notranslate"><span class="pre">metadata_key</span></code>, which defaults to <code class="docutils literal notranslate"><span class="pre">&quot;metadata&quot;</span></code>. When reading the data, slime loads the metadata stored under this key, which can be helpful for custom data generation or custom reward models.</p>
+ <p>Note that the <code class="docutils literal notranslate"><span class="pre">step_loss_mask</span></code> field (default: 1) is used in the SFT phase. If it is set to 0, the turn does not contribute to the final loss; if it is set to 1, slime uses the normal <code class="docutils literal notranslate"><span class="pre">loss_mask</span></code>.
+ Additionally, we provide a <code class="docutils literal notranslate"><span class="pre">metadata_key</span></code>, which defaults to <code class="docutils literal notranslate"><span class="pre">&quot;metadata&quot;</span></code>. When reading the data, slime loads the metadata stored under this key, which can be helpful for custom data generation or custom reward models.</p>
  </section>
  <section id="hyperparameters-for-rl-training">
  <h3>Hyperparameters for RL Training<a class="headerlink" href="#hyperparameters-for-rl-training" title="Link to this heading">#</a></h3>

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

zh/.doctrees/environment.pickle

0 Bytes
Binary file not shown.
907 Bytes
Binary file not shown.

zh/_downloads/2d9502563ea3824049a204e1f0b564a2/run-qwen3-30B-A3B.sh

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ CKPT_ARGS=(
  --hf-checkpoint /root/Qwen3-30B-A3B
  #--hf-checkpoint /root/Qwen3-30B-A3B-FP8
  --ref-load /root/Qwen3-30B-A3B_torch_dist
- --load /root/Qwen3-4B_slime/
- --save /root/Qwen3-4B_slime/
+ --load /root/Qwen3-30B-A3B_slime/
+ --save /root/Qwen3-30B-A3B_slime/
  --save-interval 20
  )

zh/_sources/get_started/usage.md

Lines changed: 3 additions & 1 deletion
@@ -154,7 +154,8 @@ Loading with sglang is very simple; you only need:
  "prompt": [
  {
  "content": "Solve the following math problem step by step. The last line of your response should be of the form Answer: \\boxed{$Answer} where $Answer is the answer to the problem.\n\nIn triangle $ABC$, $\\sin \\angle A = \\frac{4}{5}$ and $\\angle A < 90^\\circ$. Let $D$ be a point outside triangle $ABC$ such that $\\angle BAD = \\angle DAC$ and $\\angle BDC = 90^\\circ$. Suppose that $AD = 1$ and that $\\frac{BD}{CD} = \\frac{3}{2}$. If $AB + AC$ can be expressed in the form $\\frac{a\\sqrt{b}}{c}$ where $a, b, c$ are pairwise relatively prime integers, find $a + b + c$.\n\nRemember to put your answer on its own line after \"Answer:\".",
- "role": "user"
+ "role": "user",
+ "step_loss_mask": 1
  }
  ],
  "label": "34"
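@@ -169,6 +170,7 @@ Loading with sglang is very simple; you only need:
  --apply-chat-template
  ```

+ Note that the `step_loss_mask` field (default: 1) here is provided for the SFT phase. If it is set to 0, the `loss_mask` of that turn is set to 0; if it is set to 1, the normal `loss_mask` is used.
  In addition, we provide a `metadata_key`, which defaults to `"metadata"`. After reading the data, we load its metadata into slime, which may help with custom data generation or custom reward models.

  ### Hyperparameters Required for RL Training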
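A minimal sketch of writing such a `.jsonl` file from the shell, with one JSON object per line and a `metadata` object under the default `metadata_key` (assumed here to sit at the top level, next to `prompt` and `label`). The output path, the multi-turn layout with an `assistant` role, and all field values are hypothetical placeholders; only the `prompt`, `label`, `step_loss_mask`, and `metadata` names come from the text above.

```bash
#!/usr/bin/env bash
# Sketch: write a tiny .jsonl file (one JSON object per line) in the format
# described above. Every value below is a made-up placeholder.
set -euo pipefail

out="${OUT:-sample.jsonl}"   # hypothetical output path

# The quoted 'EOF' delimiter disables shell expansion, so the $2 and $6 inside
# the LaTeX stay literal instead of being expanded as positional parameters.
cat > "$out" <<'EOF'
{"prompt": [{"role": "user", "content": "Compute $2+3$.", "step_loss_mask": 1}], "label": "5", "metadata": {"source": "toy"}}
{"prompt": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!", "step_loss_mask": 0}, {"role": "user", "content": "Compute $6 \\times 7$."}], "label": "42", "metadata": {"source": "toy", "turns": 3}}
EOF

# JSONL requires exactly one record per line.
echo "records: $(wc -l < "$out")"
```

Keeping each record on a single line matters more than pretty-printing: a JSONL reader treats every newline as a record boundary.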

0 commit comments
