
Commit 27d00d1

deploy: d2ec7ca
1 parent 183cadf commit 27d00d1

File tree

12 files changed

+364
-4
lines changed

14.6 KB
Binary file not shown.

.doctrees/environment.pickle

4.07 KB
Binary file not shown.

_examples_synced/low_precision/README.html

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -445,6 +445,16 @@ <h2> Contents </h2>
445445
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#quick-start">Quick Start</a></li>
446446
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#quick-explanation">Quick Explanation</a></li>
447447
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#todo">TODO</a></li>
448+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#int4-training-examples">INT4 Training Examples</a><ul class="visible nav section-nav flex-column">
449+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#id1">Files</a></li>
450+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#id2">Quick Start</a><ul class="nav section-nav flex-column">
451+
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#configure-training-arguments">1. Configure Training Arguments</a></li>
452+
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#convert-huggingface-weights-to-int4">2. Convert HuggingFace Weights to INT4</a></li>
453+
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#start-int4-training">3. Start INT4 Training</a></li>
454+
</ul>
455+
</li>
456+
</ul>
457+
</li>
448458
</ul>
449459
</nav>
450460
</div>
@@ -518,6 +528,85 @@ <h2>TODO<a class="headerlink" href="#todo" title="Link to this heading">#</a></h
518528
<li><p>FP8 weights (<code class="docutils literal notranslate"><span class="pre">--fp8-param-gather</span></code>) can provide memory savings benefits, but currently FP8 weights must be used with TransformerEngine’s FusedAdam, which conflicts with the commonly used Adam CPU offload technique in Megatron-LM.</p></li>
519529
</ul>
520530
<p>The slime team will continue to collaborate with the NVIDIA team to contribute more complete FP8 training infrastructure to the community.</p>
533+
</section>
534+
<hr class="docutils" />
535+
<section id="int4-training-examples">
536+
<h2>INT4 Training Examples<a class="headerlink" href="#int4-training-examples" title="Link to this heading">#</a></h2>
537+
<p>This guide provides examples for INT4 STE (Straight-Through Estimator) training and INT4 inference. Utilizing INT4 inference significantly improves throughput, thereby accelerating the training pipeline (specifically during the rollout generation phase).</p>
538+
<section id="id1">
539+
<h3>Files<a class="headerlink" href="#id1" title="Link to this heading">#</a></h3>
540+
<ul class="simple">
541+
<li><p><code class="docutils literal notranslate"><span class="pre">run-moonlight-16B-A3B-int4.sh</span></code>: Launch script for <strong>Moonlight-16B-A3B</strong> (INT4) on 4x H200 GPUs.</p></li>
542+
<li><p><code class="docutils literal notranslate"><span class="pre">run-qwen3-30B-A3B-int4.sh</span></code>: Launch script for <strong>Qwen3-30B-A3B</strong> (INT4) on 8x H200 GPUs.</p></li>
543+
<li><p><code class="docutils literal notranslate"><span class="pre">run-qwen3-235B-A22B-int4.sh</span></code>: Launch script for <strong>Qwen3-235B-A22B</strong> (INT4) on 64x H200 GPUs.</p></li>
544+
<li><p><code class="docutils literal notranslate"><span class="pre">run-kimi-k2-Thinking-int4.sh</span></code>: Launch script for <strong>Kimi-k2-Thinking</strong> (INT4) on 256x H200 GPUs.</p></li>
545+
</ul>
546+
</section>
547+
<section id="id2">
548+
<h3>Quick Start<a class="headerlink" href="#id2" title="Link to this heading">#</a></h3>
549+
<section id="configure-training-arguments">
550+
<h4>1. Configure Training Arguments<a class="headerlink" href="#configure-training-arguments" title="Link to this heading">#</a></h4>
551+
<p>Ensure your training script is properly configured. For training tasks, you must add the following flag to your launch arguments:</p>
552+
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>--int4-params-rollout
553+
</pre></div>
554+
</div>
555+
</section>
556+
<section id="convert-huggingface-weights-to-int4">
557+
<h4>2. Convert HuggingFace Weights to INT4<a class="headerlink" href="#convert-huggingface-weights-to-int4" title="Link to this heading">#</a></h4>
558+
<p>First, download the PTQ (Post-Training Quantization) calibration dataset from HuggingFace:
559+
<a class="reference external" href="https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1">https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1</a></p>
560+
<p>Next, use the <code class="docutils literal notranslate"><span class="pre">tools/convert_hf_to_hf_int4.py</span></code> script to convert BF16 weights to INT4 format. Ensure that the <code class="docutils literal notranslate"><span class="pre">--hf-checkpoint</span></code> parameter points to a directory where <code class="docutils literal notranslate"><span class="pre">config.json</span></code> contains the correct <code class="docutils literal notranslate"><span class="pre">quantization_config</span></code>. Slime will automatically utilize INT4 quantization during weight updates.</p>
561+
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>tools/convert_hf_to_hf_int4.py<span class="w"> </span><span class="se">\</span>
562+
<span class="w"> </span>--model_id<span class="w"> </span>/path/to/your/original/models<span class="w"> </span><span class="se">\</span>
563+
<span class="w"> </span>--output_dir<span class="w"> </span>/path/to/your/save/models<span class="w"> </span><span class="se">\</span>
564+
<span class="w"> </span>--local_data_path<span class="w"> </span>/path/to/your/wikitext
565+
</pre></div>
566+
</div>
567+
</section>
568+
<section id="start-int4-training">
569+
<h4>3. Start INT4 Training<a class="headerlink" href="#start-int4-training" title="Link to this heading">#</a></h4>
570+
<p>Configure the following environment variables to control the quantization settings.</p>
571+
<p><strong>Environment Variables:</strong></p>
572+
<ul class="simple">
573+
<li><p><strong><code class="docutils literal notranslate"><span class="pre">OPEN_TRAINING_INT4_FAKE_QAT_FLAG</span></code></strong>: Enables fake quantization operations for INT4 training.</p></li>
574+
<li><p><strong><code class="docutils literal notranslate"><span class="pre">OPEN_TRAINING_INT4_GROUP_SIZE</span></code></strong>: Specifies the block size (group size) for model quantization.</p>
575+
<ul>
576+
<li><p>Set to <strong>128</strong> for <code class="docutils literal notranslate"><span class="pre">moonlight-16B-A3B</span></code>, <code class="docutils literal notranslate"><span class="pre">qwen3-30B-A3B</span></code>, and <code class="docutils literal notranslate"><span class="pre">qwen3-235B-A22B-int4</span></code>.</p></li>
577+
<li><p>Set to <strong>32</strong> for <code class="docutils literal notranslate"><span class="pre">kimi-k2-Thinking-int4</span></code>.</p></li>
578+
</ul>
579+
</li>
580+
</ul>
581+
<p><strong>Configuration Example:</strong></p>
582+
<div class="highlight-json notranslate"><div class="highlight"><pre><span></span><span class="err">RUNTIME_ENV_JSON=</span><span class="s2">&quot;{</span>
583+
<span class="s2"> \&quot;env_vars\&quot;: {</span>
584+
<span class="s2"> ...</span>
585+
<span class="s2"> \&quot;OPEN_TRAINING_INT4_FAKE_QAT_FLAG\&quot;: \&quot;1\&quot;,</span>
586+
<span class="s2"> \&quot;OPEN_TRAINING_INT4_GROUP_SIZE\&quot;: \&quot;128\&quot;</span>
587+
<span class="s2"> }</span>
588+
<span class="s2">}&quot;</span>
589+
</pre></div>
590+
</div>
591+
<p><strong>Launch Commands:</strong></p>
592+
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Moonlight-16B-A3B Int4 training</span>
593+
bash<span class="w"> </span>examples/low_precision/run-moonlight-16B-A3B-int4.sh
594+
595+
<span class="c1"># Qwen3-30B-A3B Int4 training</span>
596+
bash<span class="w"> </span>examples/low_precision/run-qwen3-30B-A3B-int4.sh
597+
598+
<span class="c1"># Qwen3-235B-A22B Int4 training (8 nodes)</span>
599+
bash<span class="w"> </span>examples/low_precision/run-qwen3-235B-A22B-int4.sh
600+
601+
<span class="c1"># Kimi-k2-Thinking Int4 training (32 nodes)</span>
602+
bash<span class="w"> </span>examples/low_precision/run-kimi-k2-Thinking-int4.sh
603+
</pre></div>
604+
</div>
605+
<ul class="simple">
606+
<li><p>For multi-node environments, please start the Ray service according to your cluster configuration.</p></li>
607+
</ul>
608+
</section>
609+
</section>
521610
</section>
522611
</section>
523612

@@ -552,6 +641,16 @@ <h2>TODO<a class="headerlink" href="#todo" title="Link to this heading">#</a></h
552641
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#quick-start">Quick Start</a></li>
553642
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#quick-explanation">Quick Explanation</a></li>
554643
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#todo">TODO</a></li>
644+
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#int4-training-examples">INT4 Training Examples</a><ul class="visible nav section-nav flex-column">
645+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#id1">Files</a></li>
646+
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#id2">Quick Start</a><ul class="nav section-nav flex-column">
647+
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#configure-training-arguments">1. Configure Training Arguments</a></li>
648+
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#convert-huggingface-weights-to-int4">2. Convert HuggingFace Weights to INT4</a></li>
649+
<li class="toc-h4 nav-item toc-entry"><a class="reference internal nav-link" href="#start-int4-training">3. Start INT4 Training</a></li>
650+
</ul>
651+
</li>
652+
</ul>
653+
</li>
555654
</ul>
556655
</nav></div>
557656

_sources/_examples_synced/low_precision/README.md

Lines changed: 82 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,4 +63,85 @@ Currently, FP8 is far from being a complete feature and still has the following
6363

6464
- FP8 weights (`--fp8-param-gather`) can provide memory savings benefits, but currently FP8 weights must be used with TransformerEngine's FusedAdam, which conflicts with the commonly used Adam CPU offload technique in Megatron-LM.
6565

66-
The slime team will continue to collaborate with the NVIDIA team to contribute more complete FP8 training infrastructure to the community.
66+
The slime team will continue to collaborate with the NVIDIA team to contribute more complete FP8 training infrastructure to the community.
67+
68+
73+
***
74+
75+
## INT4 Training Examples
76+
77+
This guide provides examples for INT4 STE (Straight-Through Estimator) training and INT4 inference. Utilizing INT4 inference significantly improves throughput, thereby accelerating the training pipeline (specifically during the rollout generation phase).
78+
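To make the STE mechanism concrete, here is a minimal, self-contained sketch of the group-wise fake-quantization round trip that INT4 STE training performs in the forward pass. This is illustrative only — the function and the quantization scheme (symmetric, per-group scales) are assumptions for explanation, not slime's actual implementation; in STE training the backward pass treats this round trip as the identity function.

```python
# Illustrative group-wise INT4 fake quantization (quantize -> dequantize).
# Weights are split into groups of `group_size` values; each group gets its
# own scale, values are rounded and clamped to the signed INT4 range [-8, 7],
# then immediately dequantized back to floating point.

def fake_quant_int4(weights, group_size=128):
    """Return the INT4 quantize/dequantize round trip of `weights`."""
    assert len(weights) % group_size == 0, "pad weights to a group multiple"
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric scale: map the largest magnitude in the group to 7.
        scale = max(abs(v) for v in group) / 7 or 1.0
        for v in group:
            q = max(-8, min(7, round(v / scale)))  # clamp to INT4 range
            out.append(q * scale)                  # dequantize
    return out

# One group of 128 values; quantization error is bounded by scale / 2.
deq = fake_quant_int4([0.7, -0.35, 0.1, 0.02] * 32, group_size=128)
```

A smaller group size (e.g. 32) gives each scale fewer values to cover, which lowers quantization error at the cost of storing more scales.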
79+
### Files
80+
81+
* `run-moonlight-16B-A3B-int4.sh`: Launch script for **Moonlight-16B-A3B** (INT4) on 4x H200 GPUs.
82+
* `run-qwen3-30B-A3B-int4.sh`: Launch script for **Qwen3-30B-A3B** (INT4) on 8x H200 GPUs.
83+
* `run-qwen3-235B-A22B-int4.sh`: Launch script for **Qwen3-235B-A22B** (INT4) on 64x H200 GPUs.
84+
* `run-kimi-k2-Thinking-int4.sh`: Launch script for **Kimi-k2-Thinking** (INT4) on 256x H200 GPUs.
85+
86+
### Quick Start
87+
88+
#### 1. Configure Training Arguments
89+
Ensure your training script is properly configured. For training tasks, you must add the following flag to your launch arguments:
90+
91+
```bash
92+
--int4-params-rollout
93+
```
94+
95+
#### 2. Convert HuggingFace Weights to INT4
96+
First, download the PTQ (Post-Training Quantization) calibration dataset from HuggingFace:
97+
[https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1)
98+
99+
Next, use the `tools/convert_hf_to_hf_int4.py` script to convert BF16 weights to INT4 format. Ensure that the `--hf-checkpoint` parameter points to a directory where `config.json` contains the correct `quantization_config`. Slime will automatically utilize INT4 quantization during weight updates.
100+
101+
```bash
102+
python tools/convert_hf_to_hf_int4.py \
103+
--model_id /path/to/your/original/models \
104+
--output_dir /path/to/your/save/models \
105+
--local_data_path /path/to/your/wikitext
106+
```
107+
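The requirement that `config.json` carry a `quantization_config` entry can be verified before launching. Below is an illustrative helper (not part of slime's tooling; the function name is made up) for sanity-checking the converted directory you pass via `--hf-checkpoint`:

```python
# Hypothetical pre-flight check: confirm the converted checkpoint directory
# has a config.json containing a `quantization_config` entry.
import json
import os

def has_quantization_config(checkpoint_dir):
    """Return True if checkpoint_dir/config.json declares quantization_config."""
    config_path = os.path.join(checkpoint_dir, "config.json")
    if not os.path.exists(config_path):
        return False
    with open(config_path) as f:
        config = json.load(f)
    return "quantization_config" in config
```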
108+
#### 3. Start INT4 Training
109+
110+
Configure the following environment variables to control the quantization settings.
111+
112+
**Environment Variables:**
113+
114+
* **`OPEN_TRAINING_INT4_FAKE_QAT_FLAG`**: Enables fake quantization operations for INT4 training.
115+
* **`OPEN_TRAINING_INT4_GROUP_SIZE`**: Specifies the block size (group size) for model quantization.
116+
* Set to **128** for `moonlight-16B-A3B`, `qwen3-30B-A3B`, and `qwen3-235B-A22B-int4`.
117+
* Set to **32** for `kimi-k2-Thinking-int4`.
118+
119+
**Configuration Example:**
120+
121+
```bash
122+
RUNTIME_ENV_JSON="{
123+
\"env_vars\": {
124+
...
125+
\"OPEN_TRAINING_INT4_FAKE_QAT_FLAG\": \"1\",
126+
\"OPEN_TRAINING_INT4_GROUP_SIZE\": \"128\"
127+
}
128+
}"
129+
```
130+
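As a rough mental model of how such variables are typically consumed (an assumption for illustration, not slime's actual implementation), the training process would read them roughly like this; the fallback group size of 128 when the variable is unset is likewise illustrative:

```python
# Illustrative sketch of reading the INT4 environment variables inside the
# training process: the flag toggles fake quantization, and the group size
# falls back to an assumed default of 128 when unset.
import os

def read_int4_env():
    fake_qat = os.environ.get("OPEN_TRAINING_INT4_FAKE_QAT_FLAG", "0") == "1"
    group_size = int(os.environ.get("OPEN_TRAINING_INT4_GROUP_SIZE", "128"))
    return fake_qat, group_size
```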
131+
**Launch Commands:**
132+
133+
```bash
134+
# Moonlight-16B-A3B Int4 training
135+
bash examples/low_precision/run-moonlight-16B-A3B-int4.sh
136+
137+
# Qwen3-30B-A3B Int4 training
138+
bash examples/low_precision/run-qwen3-30B-A3B-int4.sh
139+
140+
# Qwen3-235B-A22B Int4 training (8 nodes)
141+
bash examples/low_precision/run-qwen3-235B-A22B-int4.sh
142+
143+
# Kimi-k2-Thinking Int4 training (32 nodes)
144+
bash examples/low_precision/run-kimi-k2-Thinking-int4.sh
145+
```
146+
147+
- For multi-node environments, please start the Ray service according to your cluster configuration.

objects.inv

262 Bytes
Binary file not shown.

searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.
14.5 KB
Binary file not shown.

zh/.doctrees/environment.pickle

4.07 KB
Binary file not shown.
