primus/README_patch.md
Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ These arguments are introduced in the Megatron module logic (e.g., training loop
|`disable_last_saving`|`false`| v0.1.0 | Skip saving the final checkpoint at the last iteration. | NA | Useful for profiling or benchmarking runs. |
|`no_fp8_weight_transpose_cache`|`false`| v0.2.0 | Disable the FP8 weight transpose cache to save memory. |`megatron.core.extensions.transformer_engine.TELinear`, `megatron.core.extensions.transformer_engine.TELayerNormColumnParallelLinear`, `megatron.core.extensions.transformer_engine.TEDelayedScaling`| May affect performance but reduces memory use. |
|`decoder_pipeline_manual_split_list`|`null`| v0.2.0 | Enable manual pipeline split in (interleaved) 1F1B pipeline parallelism. |`megatron.core.transformer.transformer_block.get_num_layers_to_build`, `megatron.core.transformer.transformer_layer.get_transformer_layer_offset`| May be deprecated when Megatron is updated. |
-|`attn_warmup`|`false`| v0.2.0 | Add attention fwd/bwd warmup to save iteration 1's time when pipeline parallelism is used. | NA | Can save significant time when debugging pipeline schedules. |
+|`pp_warmup`|`false`| v0.2.0 | Add attention/MLP fwd/bwd warmup to save iteration 1's time when the pipeline-parallel degree is large. | NA | Can save significant time when debugging pipeline schedules. |
|`dump_pp_data`|`false`| v0.2.0 | Enable dumping pipeline-parallel schedule data for visualization. |`megatron.core.pipeline_parallel.schedules.forward_step`, `megatron.core.pipeline_parallel.schedules.backward_step`, `megatron.core.pipeline_parallel.schedules.forward_backward_pipelining_with_interleaving`, `megatron.core.pipeline_parallel.schedules.forward_backward_pipelining_without_interleaving`| Useful for pipeline schedule visualization. |
|`disable_profiler_activity_cpu`|`false`| v0.2.0 | Disable CPU activity in torch profiling. | NA | Enable this option if you only want to trace CUDA kernels and prefer a smaller trace JSON file. However, do not enable it if you plan to run with TraceLen. |
|`use_rocm_mem_info`|`false`| v0.2.0 | Log ROCm memory information in the Megatron-LM Trainer. | NA | If `use_rocm_mem_info = True`, ROCm memory information is collected with `rocm-smi` at every iteration (see the sketch after this table). |
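
For reference, the `use_rocm_mem_info` note above says memory information is gathered with `rocm-smi` at every iteration. The snippet below is a minimal sketch of that idea, not the actual Primus/Megatron hook: it assumes `rocm-smi` is on `PATH`, and the helper names (`collect_rocm_mem_info`, `train_step_with_mem_logging`) are illustrative only.

```python
# Minimal sketch (not Primus's implementation): collect the ROCm VRAM report
# with `rocm-smi` once per training iteration, roughly what the table
# describes `use_rocm_mem_info` as doing. Helper names are hypothetical.
import subprocess


def collect_rocm_mem_info() -> str:
    """Return the raw `rocm-smi` VRAM report as text, or "" on failure."""
    result = subprocess.run(
        ["rocm-smi", "--showmeminfo", "vram"],
        capture_output=True,
        text=True,
        check=False,  # don't abort training if rocm-smi is unavailable
    )
    return result.stdout if result.returncode == 0 else ""


def train_step_with_mem_logging(iteration: int, train_step) -> None:
    """Run one training step, then log VRAM usage for this iteration."""
    train_step()
    report = collect_rocm_mem_info()
    if report:
        print(f"[iter {iteration}] rocm-smi VRAM report:\n{report}")
```

In practice such a helper would more likely be called from the trainer's logging hook and have its output parsed (or requested in machine-readable form) rather than printed as raw text.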