What
Add a parameter to wrapper_util.py, e.g. use_fully_parallel_wrapper, defaulting to False. When True, wrap the save_strategy and load_strategy with FullyParallelSaveStrategyWrapper and FullyParallelLoadStrategyWrapper respectively (from megatron.core.dist_checkpointing.strategies.fully_parallel).
Add thorough tests accordingly.
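A minimal sketch of the proposed change. The function name build_strategies and the stand-in wrapper classes are assumptions for illustration only; the real implementation would import the wrappers from megatron.core.dist_checkpointing.strategies.fully_parallel and fit the flag into whatever entry point wrapper_util.py already exposes.

```python
# Stand-ins that mimic the Megatron-Core fully-parallel wrappers
# (assumption for illustration; real code imports the actual classes).
class FullyParallelSaveStrategyWrapper:
    def __init__(self, strategy):
        self.base_strategy = strategy


class FullyParallelLoadStrategyWrapper:
    def __init__(self, strategy):
        self.base_strategy = strategy


def build_strategies(save_strategy, load_strategy,
                     use_fully_parallel_wrapper=False):
    """Hypothetical wrapper_util.py helper: optionally wrap both
    strategies for fully parallel checkpoint save/load."""
    if use_fully_parallel_wrapper:
        save_strategy = FullyParallelSaveStrategyWrapper(save_strategy)
        load_strategy = FullyParallelLoadStrategyWrapper(load_strategy)
    return save_strategy, load_strategy


# Default (False): strategies pass through unchanged.
save, load = build_strategies("base_save", "base_load")
print(save, load)  # base_save base_load

# Opt-in (True): strategies are wrapped.
save, load = build_strategies("base_save", "base_load",
                              use_fully_parallel_wrapper=True)
print(type(save).__name__)  # FullyParallelSaveStrategyWrapper
```

Tests would cover both branches: the default path must return the strategies untouched, and the opt-in path must return wrappers whose base strategy is the original object.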
Why
Gives users the option of distributing checkpoint save/load work evenly across ranks, without extra reliance on rank 0, similar to the recommended settings in NeMo 2.0. Without this, overall training time may be the same or better at smaller cluster sizes, but can degrade at larger sizes where rank 0 becomes a bottleneck.