3 changes: 1 addition & 2 deletions bionemo-recipes/models/esm2/README.md
@@ -142,8 +142,7 @@ You can also mix FP8 and FP4 layers by providing both recipes and a mixed `layer

When `use_quantized_model_init=True` is set in the config, layers are created inside a
`te.quantized_model_init` context. This tells TransformerEngine to initialize weights directly in
-the target quantized format, avoiding a separate quantization step after initialization. This is
-primarily useful when loading pre-quantized checkpoints.
+the target quantized format, avoiding a separate quantization step after initialization.

```python
config = NVEsmConfig.from_pretrained(
3 changes: 1 addition & 2 deletions bionemo-recipes/models/mixtral/README.md
@@ -125,8 +125,7 @@ You can also mix FP8 and FP4 layers by providing both recipes and a mixed `layer

When `use_quantized_model_init=True` is set in the config, layers are created inside a
`te.quantized_model_init` context. This tells TransformerEngine to initialize weights directly in
-the target quantized format, avoiding a separate quantization step after initialization. This is
-primarily useful when loading pre-quantized checkpoints.
+the target quantized format, avoiding a separate quantization step after initialization.

```python
config = NVMixtralConfig(
3 changes: 1 addition & 2 deletions bionemo-recipes/models/qwen/README.md
@@ -158,8 +158,7 @@ The same pattern applies to Qwen2.5 models using `NVQwen2Config` and `NVQwen2For

When `use_quantized_model_init=True` is set in the config, layers are created inside a
`te.quantized_model_init` context. This tells TransformerEngine to initialize weights directly in
-the target quantized format, avoiding a separate quantization step after initialization. This is
-primarily useful when loading pre-quantized checkpoints.
+the target quantized format, avoiding a separate quantization step after initialization.

```python
config = NVQwen3Config.from_pretrained(
20 changes: 20 additions & 0 deletions bionemo-recipes/recipes/esm2_native_te/README.md
@@ -170,6 +170,26 @@ When both `fp8_config` and `fp4_config` are enabled but only one layer list is provided, the other format
claims the remaining layers. For example, if `fp8_layers=[1,2,3]` is set and `fp4_config.enabled=true` with no
`fp4_layers`, then layers 4 through N will default to FP4.
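The layer-defaulting rule described above can be sketched in plain Python. This is a hypothetical helper written for illustration only (the function name and signature are not part of the recipe):

```python
def assign_quant_layers(num_layers, fp8_layers=None, fp4_layers=None,
                        fp8_enabled=False, fp4_enabled=False):
    """Partition 1-indexed layer ids between FP8 and FP4.

    When both formats are enabled but only one layer list is provided,
    the other format claims all remaining layers.
    """
    all_layers = set(range(1, num_layers + 1))
    if fp8_enabled and fp4_enabled:
        if fp8_layers is not None and fp4_layers is None:
            fp4_layers = sorted(all_layers - set(fp8_layers))
        elif fp4_layers is not None and fp8_layers is None:
            fp8_layers = sorted(all_layers - set(fp4_layers))
    return fp8_layers, fp4_layers

# With fp8_layers=[1, 2, 3] set and fp4 enabled but no fp4_layers,
# the remaining layers 4..N default to FP4:
fp8, fp4 = assign_quant_layers(6, fp8_layers=[1, 2, 3],
                               fp8_enabled=True, fp4_enabled=True)
```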

#### Quantized Model Initialization

When training with FP8 or FP4, you can initialize model weights directly in the target quantized format by setting
`config_kwargs.use_quantized_model_init=true`. This tells TransformerEngine to create weights inside a
`te.quantized_model_init` context, avoiding a separate quantization step after initialization.

```bash
python train_fsdp2.py --config-name L0_sanity \
fp8_config.enabled=true \
+config_kwargs.use_quantized_model_init=true
```

This also works with NVFP4:

```bash
python train_fsdp2.py --config-name L0_sanity \
fp4_config.enabled=true \
+config_kwargs.use_quantized_model_init=true
```

#### Quantization Stats Debugging

We provide a mechanism to log tensor statistics (activations, weights, gradients) for quantized layers during training.
@@ -14,7 +14,7 @@ cp_size: 1

use_sequence_packing: false
dataset:
-  tokenizer_name: ???
+  tokenizer_name: ${config_name_or_path}
  micro_batch_size: ???
  num_workers: 1
> **Collaborator:** are you able to add `use_quantized_weights: null`?

> **Collaborator (author):** This will then get passed to the non-TE models, but we could. We'd just have to show that training with the HF models requires deleting this key.

  max_seq_length: 1024
12 changes: 12 additions & 0 deletions bionemo-recipes/recipes/llama3_native_te/README.md
@@ -136,6 +136,18 @@ configuration parameters, including switching to `MXFP8BlockScaling`, can be set
python train_fsdp2.py --config-name L0_sanity fp8_config.enabled=true
```

#### Quantized Model Initialization

When training with FP8, you can initialize model weights directly in the target quantized format by setting
`config_kwargs.use_quantized_model_init=true`. This tells TransformerEngine to create weights inside a
`te.quantized_model_init` context, avoiding a separate quantization step after initialization.

```bash
python train_fsdp2.py --config-name L0_sanity \
fp8_config.enabled=true \
+config_kwargs.use_quantized_model_init=true
```

#### FP8 Debugging

We also provide a mechanism to capture tensor data for FP8 layers during training, which may include activations, weights, and gradients.