This is an example of FP8 training and FP8 inference. With FP8 training and inference, you can achieve higher inference throughput and a lower training-inference mismatch, resulting in more stable training. More details can be found in [this blog](https://lmsys.org/blog/2025-11-25-fp8-rl/).
- [FP8 rollout and FP8 training](#FP8-rollout-and-FP8-training)
- [INT4 QAT Training](#INT4-QAT-Training)
## Files

- `run-qwen3-4b-fp8.sh`: example launch script for Qwen3-4B in FP8.
- `run-qwen3-30b-a3b-fp8-two-nodes.sh`: example launch script for running Qwen3-30B-A3B in FP8 across two nodes.

## FP8 rollout and BF16 training

You can run FP8 rollout simply by setting `--hf-checkpoint` to a blockwise-quantized HuggingFace checkpoint, which can be converted by:
```bash
python tools/convert_hf_to_fp8.py \
    --model-dir $BF16_MODEL \
    --save-dir $FP8_MODEL \
    --strategy block --block-size 128 128 \
    --max-workers 4
```
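The `--strategy block --block-size 128 128` flags compute one scale per 128x128 weight tile, derived from the tile's absolute maximum and FP8 E4M3's maximum representable value (448). A minimal numpy sketch of that idea (illustrative only; the real converter also handles padding, dtypes, and safetensors I/O, which are omitted here):

```python
import numpy as np

def block_scales(w, block=128, fp8_max=448.0):
    # One absmax-derived scale per (block x block) tile of the weight matrix.
    rows, cols = w.shape
    scales = np.empty((rows // block, cols // block), dtype=np.float32)
    for i in range(scales.shape[0]):
        for j in range(scales.shape[1]):
            tile = w[i * block:(i + 1) * block, j * block:(j + 1) * block]
            scales[i, j] = np.abs(tile).max() / fp8_max
    return scales

w = np.random.randn(256, 256).astype(np.float32)
s = block_scales(w)  # shape (2, 2): one scale per 128x128 block
```

Dividing each tile by its scale maps it into the E4M3 range before the cast to FP8; the scales are stored alongside the quantized weights so inference kernels can dequantize per block.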
Please ensure that the converted checkpoint directory's `config.json` contains the correct `quantization_config`, so that slime can automatically use FP8 quantization during weight updates.
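For reference, a blockwise FP8 `quantization_config` in the DeepSeek-V3 style typically looks like the sketch below. The exact field names your converter emits may differ, so inspect the generated `config.json` rather than copying this verbatim:

```python
import json, os, tempfile

# Assumed field layout for a blockwise FP8 checkpoint config (verify against
# the config.json that tools/convert_hf_to_fp8.py actually produces).
config = {
    "quantization_config": {
        "quant_method": "fp8",
        "fmt": "e4m3",
        "activation_scheme": "dynamic",
        "weight_block_size": [128, 128],
    },
}

ckpt_dir = tempfile.mkdtemp()
with open(os.path.join(ckpt_dir, "config.json"), "w") as f:
    json.dump(config, f, indent=2)

# Quick sanity check before launching training:
with open(os.path.join(ckpt_dir, "config.json")) as f:
    loaded = json.load(f)
assert "quantization_config" in loaded
```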
## FP8 rollout and FP8 training
We also observed that FP8 training combined with FP8 inference achieves higher inference throughput and a lower training-inference mismatch, resulting in more stable training. More details can be found in [this blog](https://lmsys.org/blog/2025-11-25-fp8-rl/).
### Quick Start
1. Convert your HuggingFace model weights to FP8 format using `tools/convert_hf_to_fp8.py`, as shown above.
2. Set up the running script and check that it is properly configured:
For training tasks, we need to add these flags:
```bash
--fp8-format e4m3
--fp8-recipe blockwise
# --fp8-param-gather # [optional] Currently incompatible with CPU Adam
```
Then ensure the `NVTE_FP8_BLOCK_SCALING_FP32_SCALES` environment variable is enabled.
Note that only `Linear` and `GroupLinear` layers in TransformerEngine use the FP8 format; `embedding` and `lm_head` remain in their original precision. If `--fp8-param-gather` is not enabled, weights in TransformerEngine remain in bf16 and are only cast to FP8 during `GEMM` or `GroupGEMM` operations.
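The cast-at-GEMM-time behavior can be sketched as follows (a toy illustration, not TransformerEngine's implementation; `float16` stands in for FP8, which numpy does not provide):

```python
import numpy as np

def gemm_lowprec(w_hp, x_hp):
    # A low-precision copy exists only for the duration of the matmul;
    # the persistent weight tensor is never modified.
    return w_hp.astype(np.float16) @ x_hp.astype(np.float16)

w = np.random.randn(8, 8).astype(np.float32)  # stays high-precision between steps
x = np.random.randn(8, 4).astype(np.float32)
y = gemm_lowprec(w, x)  # compute happens in the reduced precision
```

This is why, without `--fp8-param-gather`, FP8 saves compute but not weight memory: the bf16 master copy is always retained.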
Following the above command will launch FP8 training.
4. Use the saved checkpoint for evaluation.
Note that TransformerEngine does not specifically save FP8-quantized weights; the saved `torch_dist` checkpoint remains in the original precision (usually bf16). If you want to evaluate under FP8, you need to convert the checkpoint from `torch_dist` to HuggingFace format, and then convert that to an FP8 HuggingFace checkpoint.
### Quick Explanation
Here's a quick explanation of how FP8 training is currently implemented in slime:
4. Save checkpoint: Similar to weight updates, if checkpoints need to be saved from the training engine, they will also be dequantized back to bf16 and saved to `torch_dist` format checkpoints.
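The dequantize-before-save step can be illustrated with a quantize/dequantize roundtrip (a toy sketch with a per-tensor scale; `float16` stands in for FP8, and the real path uses blockwise scales):

```python
import numpy as np

w = np.random.randn(128, 128).astype(np.float32)
scale = np.float32(np.abs(w).max() / 448.0)  # absmax scale against E4M3 max
q = (w / scale).astype(np.float16)           # "quantized" form held in training
w_saved = q.astype(np.float32) * scale       # dequantized for the checkpoint
max_err = float(np.abs(w - w_saved).max())   # bounded by the quantization step
```

The saved tensor therefore matches the original weights up to quantization error, which is why the `torch_dist` checkpoint can be treated as an ordinary bf16 checkpoint downstream.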
### TODO
Currently, FP8 support is far from complete and still has open issues, for example:
- FP8 weights (`--fp8-param-gather`) can reduce memory usage, but currently they must be used with TransformerEngine's FusedAdam, which conflicts with the Adam CPU offload technique commonly used in Megatron-LM.
The slime team will continue to collaborate with the NVIDIA team to contribute more complete FP8 training infrastructure to the community.
## INT4 QAT Training
This guide provides examples for INT4 STE (Straight-Through Estimator) training and INT4 inference. Utilizing INT4 inference significantly improves throughput, thereby accelerating the training pipeline (specifically during the rollout generation phase).
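The STE idea can be sketched in a few lines: the forward pass sees INT4-quantized weights, while the backward pass treats the quantizer as identity so gradients update the underlying full-precision weights. A minimal numpy illustration (symmetric INT4 with an assumed fixed scale; real QAT learns or calibrates scales per channel or block):

```python
import numpy as np

def int4_fake_quant(w, scale):
    # Round to the symmetric INT4 range [-8, 7], then dequantize.
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

# In autograd frameworks the straight-through trick is typically written as
#   w_q = w + stop_gradient(int4_fake_quant(w, s) - w)
# so the forward uses w_q while gradients flow to w unchanged.
w = np.array([0.03, -0.11, 0.27], dtype=np.float32)
w_q = int4_fake_quant(w, scale=0.05)  # -> [0.05, -0.10, 0.25]
```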
### Files
- `run-moonlight-16B-A3B-int4.sh`: Launch script for **Moonlight-16B-A3B** (INT4) on 4x H200 GPUs.
- `run-qwen3-30B-A3B-int4.sh`: Launch script for **Qwen3-30B-A3B** (INT4) on 8x H200 GPUs.
- `run-qwen3-235B-A22B-int4.sh`: Launch script for **Qwen3-235B-A22B** (INT4) on 64x H200 GPUs.
- `run-kimi-k2-Thinking-int4.sh`: Launch script for **Kimi-k2-Thinking** (INT4) on 256x H200 GPUs.
### Quick Start
1. Convert HuggingFace Weights to INT4
First, download the PTQ (Post-Training Quantization) calibration dataset from HuggingFace:
Next, use the `tools/convert_hf_to_int4.py` script to convert BF16 weights to INT4 format. Ensure that the `--hf-checkpoint` parameter points to a directory where `config.json` contains the correct `quantization_config`. slime will automatically utilize INT4 quantization during weight updates.
```bash
python tools/convert_hf_to_int4.py \
    --input-dir /path/to/your/original/models \
    --output-dir /path/to/your/save/models \
    --data-dir /path/to/your/wikitext
```
Note: If you only want to run INT4 rollout, you only need to set `--hf-checkpoint` to the converted INT4 checkpoint.
2. Start INT4 QAT Training
You need to configure the quantization-specific environment variables for the training run.