3 changes: 1 addition & 2 deletions bionemo-recipes/models/esm2/README.md
@@ -142,8 +142,7 @@ You can also mix FP8 and FP4 layers by providing both recipes and a mixed `layer

When `use_quantized_model_init=True` is set in the config, layers are created inside a
`te.quantized_model_init` context. This tells TransformerEngine to initialize weights directly in
-the target quantized format, avoiding a separate quantization step after initialization. This is
-primarily useful when loading pre-quantized checkpoints.
+the target quantized format, avoiding a separate quantization step after initialization.

```python
config = NVEsmConfig.from_pretrained(
3 changes: 1 addition & 2 deletions bionemo-recipes/models/mixtral/README.md
@@ -125,8 +125,7 @@ You can also mix FP8 and FP4 layers by providing both recipes and a mixed `layer

When `use_quantized_model_init=True` is set in the config, layers are created inside a
`te.quantized_model_init` context. This tells TransformerEngine to initialize weights directly in
-the target quantized format, avoiding a separate quantization step after initialization. This is
-primarily useful when loading pre-quantized checkpoints.
+the target quantized format, avoiding a separate quantization step after initialization.

```python
config = NVMixtralConfig(
3 changes: 1 addition & 2 deletions bionemo-recipes/models/qwen/README.md
@@ -158,8 +158,7 @@ The same pattern applies to Qwen2.5 models using `NVQwen2Config` and `NVQwen2For

When `use_quantized_model_init=True` is set in the config, layers are created inside a
`te.quantized_model_init` context. This tells TransformerEngine to initialize weights directly in
-the target quantized format, avoiding a separate quantization step after initialization. This is
-primarily useful when loading pre-quantized checkpoints.
+the target quantized format, avoiding a separate quantization step after initialization.

```python
config = NVQwen3Config.from_pretrained(
20 changes: 20 additions & 0 deletions bionemo-recipes/recipes/esm2_native_te/README.md
@@ -170,6 +170,26 @@ When both `fp8_config` and `fp4_config` are enabled but only one layer list is provided, the other format
claims the remaining layers. For example, if `fp8_layers=[1,2,3]` is set and `fp4_config.enabled=true` with no
`fp4_layers`, then layers 4 through N will default to FP4.
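The layer-defaulting rule described above can be sketched in plain Python. This is a hypothetical helper written for illustration only (the function name and signature are not part of the recipe):

```python
def assign_quant_layers(num_layers, fp8_layers=None, fp4_layers=None,
                        fp8_enabled=False, fp4_enabled=False):
    """Partition 1-indexed layer ids between FP8 and FP4.

    When both formats are enabled but only one layer list is provided,
    the other format claims all remaining layers.
    """
    all_layers = set(range(1, num_layers + 1))
    if fp8_enabled and fp4_enabled:
        if fp8_layers is not None and fp4_layers is None:
            fp4_layers = sorted(all_layers - set(fp8_layers))
        elif fp4_layers is not None and fp8_layers is None:
            fp8_layers = sorted(all_layers - set(fp4_layers))
    return fp8_layers, fp4_layers

# With fp8_layers=[1, 2, 3] set and fp4 enabled but no fp4_layers,
# the remaining layers 4..N default to FP4:
fp8, fp4 = assign_quant_layers(6, fp8_layers=[1, 2, 3],
                               fp8_enabled=True, fp4_enabled=True)
```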

#### Quantized Model Initialization

When training with FP8 or FP4, you can initialize model weights directly in the target quantized format by setting
`config_kwargs.use_quantized_model_init=true`. This tells TransformerEngine to create weights inside a
`te.quantized_model_init` context, avoiding a separate quantization step after initialization.

```bash
python train_fsdp2.py --config-name L0_sanity \
fp8_config.enabled=true \
+config_kwargs.use_quantized_model_init=true
```

This also works with NVFP4:

```bash
python train_fsdp2.py --config-name L0_sanity \
fp4_config.enabled=true \
+config_kwargs.use_quantized_model_init=true
```

#### Quantization Stats Debugging

We provide a mechanism to log tensor statistics (activations, weights, gradients) for quantized layers during training.
@@ -14,7 +14,7 @@ cp_size: 1

use_sequence_packing: false
dataset:
-  tokenizer_name: ???
+  tokenizer_name: ${config_name_or_path}
  micro_batch_size: ???
  num_workers: 1
> **Collaborator:** are you able to add `use_quantized_weights: null`?

> **Collaborator (author):** This will then get passed to the non-TE models, but we could. We'd just have to show that training with the HF models requires deleting this key.

  max_seq_length: 1024
12 changes: 12 additions & 0 deletions bionemo-recipes/recipes/llama3_native_te/README.md
@@ -136,6 +136,18 @@ configuration parameters, including switching to `MXFP8BlockScaling`, can be set
python train_fsdp2.py --config-name L0_sanity fp8_config.enabled=true
```

#### Quantized Model Initialization

When training with FP8, you can initialize model weights directly in the target quantized format by setting
`config_kwargs.use_quantized_model_init=true`. This tells TransformerEngine to create weights inside a
`te.quantized_model_init` context, avoiding a separate quantization step after initialization.

```bash
python train_fsdp2.py --config-name L0_sanity \
fp8_config.enabled=true \
+config_kwargs.use_quantized_model_init=true
```

#### FP8 Debugging

We also provide a mechanism to capture tensor data for FP8 layers during training, which may include activations, weights, and gradients.