<h2>TODO<a class="headerlink" href="#todo" title="Link to this heading">#</a></h2>
<ul class="simple">
<li><p>FP8 weights (<code class="docutils literal notranslate"><span class="pre">--fp8-param-gather</span></code>) can provide memory savings, but FP8 weights currently must be used with TransformerEngine’s FusedAdam, which conflicts with the Adam CPU-offload technique commonly used in Megatron-LM.</p></li>
</ul>
<p>The slime team will continue to collaborate with the NVIDIA team to contribute more complete FP8 training infrastructure to the community.</p>
</section>
<hr class="docutils" />
<section id="int4-training-examples">
<h2>INT4 Training Examples<a class="headerlink" href="#int4-training-examples" title="Link to this heading">#</a></h2>
<p>This guide provides examples of INT4 STE (Straight-Through Estimator) training and INT4 inference. Using INT4 inference significantly improves throughput, accelerating the training pipeline during the rollout generation phase.</p>
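<p>The fake-quantization step behind STE training can be sketched in plain Python. This is a minimal, framework-free illustration assuming symmetric per-group scales over the INT4 range; the helper name is hypothetical and this is not the implementation slime uses:</p>

```python
def fake_quant_int4(weights, group_size=128):
    """Group-wise symmetric INT4 fake quantization (illustrative).

    Splits the weights into groups of `group_size`, scales each group so
    its max magnitude maps to the INT4 range, rounds to integers, then
    dequantizes back to floats. In STE training the forward pass sees
    these fake-quantized weights while gradients flow through unchanged.
    """
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group) or 1.0  # avoid div-by-zero on all-zero groups
        scale = max_abs / 7.0                        # symmetric 4-bit range: [-8, 7]
        for w in group:
            q = max(-8, min(7, round(w / scale)))    # quantize + clamp to INT4
            out.append(q * scale)                    # dequantize
    return out
```

A larger <code class="docutils literal notranslate"><span class="pre">group_size</span></code> shares one scale across more weights (less metadata, coarser quantization), which is why it must match between the converted checkpoint and training.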
<section id="id1">
<h3>Files<a class="headerlink" href="#id1" title="Link to this heading">#</a></h3>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">run-moonlight-16B-A3B-int4.sh</span></code>: Launch script for <strong>Moonlight-16B-A3B</strong> (INT4) on 4x H200 GPUs.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">run-qwen3-30B-A3B-int4.sh</span></code>: Launch script for <strong>Qwen3-30B-A3B</strong> (INT4) on 8x H200 GPUs.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">run-qwen3-235B-A22B-int4.sh</span></code>: Launch script for <strong>Qwen3-235B-A22B</strong> (INT4) on 64x H200 GPUs.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">run-kimi-k2-Thinking-int4.sh</span></code>: Launch script for <strong>Kimi-k2-Thinking</strong> (INT4) on 256x H200 GPUs.</p></li>
</ul>
</section>
<section id="id2">
<h3>Quick Start<a class="headerlink" href="#id2" title="Link to this heading">#</a></h3>
<section id="configure-training-arguments">
<h4>1. Configure Training Arguments<a class="headerlink" href="#configure-training-arguments" title="Link to this heading">#</a></h4>
<p>Ensure your training script is properly configured. For training tasks, you must add the following flag to your launch arguments:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>--int4-params-rollout
</pre></div></div>
<h4>2. Convert HuggingFace Weights to INT4<a class="headerlink" href="#convert-huggingface-weights-to-int4" title="Link to this heading">#</a></h4>
<p>First, download the PTQ (Post-Training Quantization) calibration dataset from HuggingFace. Next, use the <code class="docutils literal notranslate"><span class="pre">tools/convert_hf_to_hf_int4.py</span></code> script to convert BF16 weights to INT4 format. Ensure that the <code class="docutils literal notranslate"><span class="pre">--hf-checkpoint</span></code> parameter points to a directory whose <code class="docutils literal notranslate"><span class="pre">config.json</span></code> contains the correct <code class="docutils literal notranslate"><span class="pre">quantization_config</span></code>; slime will then automatically apply INT4 quantization during weight updates.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python tools/convert_hf_to_hf_int4.py \
    --model_id /path/to/your/original/models \
    --output_dir /path/to/your/save/models \
    --local_data_path /path/to/your/wikitext
</pre></div></div>
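<p>Before launching training, it is worth a quick sanity check that the converted checkpoint’s <code class="docutils literal notranslate"><span class="pre">config.json</span></code> really declares the quantization you expect. A minimal sketch; the key names (<code class="docutils literal notranslate"><span class="pre">quant_method</span></code>, <code class="docutils literal notranslate"><span class="pre">group_size</span></code>) are illustrative assumptions and may differ from what the converter actually writes:</p>

```python
def check_quantization_config(config, expected_group_size=128):
    """Sanity-check that a converted checkpoint's config declares INT4
    quantization with the expected group size (illustrative key names)."""
    qcfg = config.get("quantization_config")
    if qcfg is None:
        raise ValueError("no quantization_config found; INT4 weight "
                         "updates would not be applied")
    if qcfg.get("group_size") != expected_group_size:
        raise ValueError("group_size %r does not match expected %r"
                         % (qcfg.get("group_size"), expected_group_size))
    return qcfg

# In practice the config would come from the converter's output directory,
# e.g. json.load(open("/path/to/your/save/models/config.json"))
cfg = {"quantization_config": {"quant_method": "int4", "group_size": 128}}
assert check_quantization_config(cfg)["quant_method"] == "int4"
```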
<h4>3. Start INT4 Training<a class="headerlink" href="#start-int4-training" title="Link to this heading">#</a></h4>
<p>Configure the following environment variables for your quantization settings.</p>
<p><strong>Environment Variables:</strong></p>
<ul class="simple">
<li><p><strong><code class="docutils literal notranslate"><span class="pre">OPEN_TRAINING_INT4_FAKE_QAT_FLAG</span></code></strong>: Enables fake-quantization operations for INT4 training.</p></li>
<li><p><strong><code class="docutils literal notranslate"><span class="pre">OPEN_TRAINING_INT4_GROUP_SIZE</span></code></strong>: Specifies the block size (group size) for model quantization.</p>
<ul>
<li><p>Set to <strong>128</strong> for <code class="docutils literal notranslate"><span class="pre">moonlight-16B-A3B</span></code>, <code class="docutils literal notranslate"><span class="pre">qwen3-30B-A3B</span></code>, and <code class="docutils literal notranslate"><span class="pre">qwen3-235B-A22B-int4</span></code>.</p></li>
<li><p>Set to <strong>32</strong> for <code class="docutils literal notranslate"><span class="pre">kimi-k2-Thinking-int4</span></code>.</p></li>
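<li><p>For example, a launch script for <code class="docutils literal notranslate"><span class="pre">qwen3-235B-A22B-int4</span></code> might export the variables as shown below. This is an illustrative sketch: the value <code class="docutils literal notranslate"><span class="pre">1</span></code> used to enable the flag is an assumption, so check your launch scripts for the exact convention:</p>

```shell
# Enable fake-quantization (QAT) ops in the INT4 training pass
# (assumed: setting the flag to 1 enables it)
export OPEN_TRAINING_INT4_FAKE_QAT_FLAG=1
# Group size must match the converted checkpoint: 128 for qwen3-235B-A22B-int4
export OPEN_TRAINING_INT4_GROUP_SIZE=128
echo "fake QAT=${OPEN_TRAINING_INT4_FAKE_QAT_FLAG} group=${OPEN_TRAINING_INT4_GROUP_SIZE}"
```
</li>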