File: _sources/get_started/usage.md (+3 -1: 3 additions & 1 deletion)
@@ -150,7 +150,8 @@ Currently, slime only supports loading files in `.jsonl` format, where each line
 "prompt": [
   {
     "content": "Solve the following math problem step by step. The last line of your response should be of the form Answer: \\boxed{$Answer} where $Answer is the answer to the problem.\n\nIn triangle $ABC$, $\\sin \\angle A = \\frac{4}{5}$ and $\\angle A < 90^\\circ$. Let $D$ be a point outside triangle $ABC$ such that $\\angle BAD = \\angle DAC$ and $\\angle BDC = 90^\\circ$. Suppose that $AD = 1$ and that $\\frac{BD}{CD} = \\frac{3}{2}$. If $AB + AC$ can be expressed in the form $\\frac{a\\sqrt{b}}{c}$ where $a, b, c$ are pairwise relatively prime integers, find $a + b + c$.\n\nRemember to put your answer on its own line after \"Answer:\".",
-    "role": "user"
+    "role": "user",
+    "step_loss_mask": 1
   }
 ],
 "label": "34"
@@ -165,6 +166,7 @@ This corresponds to the following configuration:
 --apply-chat-template
 ```
 
+Note that `step_loss_mask` (default: 1) applies to the SFT phase. If it is set to 0, the turn does not contribute to the final loss; if it is set to 1, slime uses the normal `loss_mask`.
 Additionally, we provide a `metadata_key`, which defaults to `"metadata"`. When reading the data, slime loads the metadata stored under this key, which can be helpful for custom data generation or creating custom reward models.
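
As a rough illustration of the `step_loss_mask` semantics described in this hunk, the following Python sketch reads one `.jsonl` record, takes each turn's `step_loss_mask` (falling back to the documented default of 1), and uses it to gate a per-token loss mask. This is not slime's implementation: the helper names, the idea of storing several turns under `"prompt"`, and the per-token masks are all assumptions made for the example.

```python
import json
from typing import List


def step_masks_from_record(line: str) -> List[int]:
    """Parse one JSONL line and return step_loss_mask per turn (1 if absent)."""
    record = json.loads(line)
    return [int(turn.get("step_loss_mask", 1)) for turn in record["prompt"]]


def effective_loss_mask(token_masks: List[List[int]], step_masks: List[int]) -> List[int]:
    """Flatten per-turn token masks, zeroing every token of a turn whose step mask is 0.

    A step mask of 1 leaves the turn's normal loss mask untouched.
    """
    if len(token_masks) != len(step_masks):
        raise ValueError("need exactly one step mask per turn")
    flat: List[int] = []
    for turn_mask, step in zip(token_masks, step_masks):
        if step == 0:
            flat.extend([0] * len(turn_mask))  # turn excluded from the loss
        else:
            flat.extend(turn_mask)  # normal loss_mask is used as-is
    return flat


# Hypothetical four-turn record; only the first assistant turn is masked out.
line = json.dumps({
    "prompt": [
        {"role": "user", "content": "What is 1 + 1?"},
        {"role": "assistant", "content": "Maybe 3?", "step_loss_mask": 0},
        {"role": "user", "content": "Try again."},
        {"role": "assistant", "content": "2", "step_loss_mask": 1},
    ],
    "label": "2",
})

step_masks = step_masks_from_record(line)
# Hypothetical normal loss masks: user tokens are 0, assistant tokens are 1.
token_masks = [[0, 0], [1, 1, 1], [0], [1, 1]]
final_mask = effective_loss_mask(token_masks, step_masks)
```

Under these assumptions, only the tokens of the last assistant turn remain in the loss; the masked assistant turn contributes nothing even though its normal token mask is all ones.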
<span class="w"></span><span class="nt">"content"</span><span class="p">:</span><span class="w"></span><span class="s2">"Solve the following math problem step by step. The last line of your response should be of the form Answer: \\boxed{$Answer} where $Answer is the answer to the problem.\n\nIn triangle $ABC$, $\\sin \\angle A = \\frac{4}{5}$ and $\\angle A < 90^\\circ$. Let $D$ be a point outside triangle $ABC$ such that $\\angle BAD = \\angle DAC$ and $\\angle BDC = 90^\\circ$. Suppose that $AD = 1$ and that $\\frac{BD}{CD} = \\frac{3}{2}$. If $AB + AC$ can be expressed in the form $\\frac{a\\sqrt{b}}{c}$ where $a, b, c$ are pairwise relatively prime integers, find $a + b + c$.\n\nRemember to put your answer on its own line after \"Answer:\"."</span><span class="p">,</span>
@@ -629,7 +630,8 @@ <h3>Data Format<a class="headerlink" href="#data-format" title="Link to this hea
 <span class="w"></span>--apply-chat-template
 </pre></div>
 </div>
-<p>Additionally, we provide a <code class="docutils literal notranslate"><span class="pre">metadata_key</span></code>, which defaults to <code class="docutils literal notranslate"><span class="pre">"metadata"</span></code>. When reading the data, slime loads the metadata stored under this key, which can be helpful for custom data generation or creating custom reward models.</p>
+<p>Note that <code class="docutils literal notranslate"><span class="pre">step_loss_mask</span></code> (default: 1) applies to the SFT phase. If it is set to 0, the turn does not contribute to the final loss; if it is set to 1, slime uses the normal <code class="docutils literal notranslate"><span class="pre">loss_mask</span></code>.
+Additionally, we provide a <code class="docutils literal notranslate"><span class="pre">metadata_key</span></code>, which defaults to <code class="docutils literal notranslate"><span class="pre">"metadata"</span></code>. When reading the data, slime loads the metadata stored under this key, which can be helpful for custom data generation or creating custom reward models.</p>
 </section>
 <section id="hyperparameters-for-rl-training">
 <h3>Hyperparameters for RL Training<a class="headerlink" href="#hyperparameters-for-rl-training" title="Link to this heading">#</a></h3>
File: zh/_sources/get_started/usage.md (+3 -1: 3 additions & 1 deletion)
@@ -154,7 +154,8 @@ Loading sglang is very simple; you only need:
 "prompt": [
   {
     "content": "Solve the following math problem step by step. The last line of your response should be of the form Answer: \\boxed{$Answer} where $Answer is the answer to the problem.\n\nIn triangle $ABC$, $\\sin \\angle A = \\frac{4}{5}$ and $\\angle A < 90^\\circ$. Let $D$ be a point outside triangle $ABC$ such that $\\angle BAD = \\angle DAC$ and $\\angle BDC = 90^\\circ$. Suppose that $AD = 1$ and that $\\frac{BD}{CD} = \\frac{3}{2}$. If $AB + AC$ can be expressed in the form $\\frac{a\\sqrt{b}}{c}$ where $a, b, c$ are pairwise relatively prime integers, find $a + b + c$.\n\nRemember to put your answer on its own line after \"Answer:\".",