Integrate FullyParallel{Save|Load}Wrapper as an optional wrapper in NeMo layer #48

@g-husam

Description

What

Add a parameter to wrapper_util.py, e.g. use_fully_parallel_wrapper, that defaults to False. When True, it wraps save_strategy with FullyParallelSaveWrapper and load_strategy with FullyParallelLoadWrapper (both from megatron.core.dist_checkpointing.strategies.fully_parallel).

Add thorough tests accordingly.
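
A minimal sketch of what the opt-in wrapping could look like. The helper name `maybe_wrap_strategies` and the injectable `save_wrapper_cls` / `load_wrapper_cls` arguments are hypothetical (they are not part of wrapper_util.py today) and exist only so the logic can be unit-tested without Megatron-Core installed. The sketch also assumes each wrapper can be constructed from the wrapped strategy alone.

```python
from typing import Any, Callable, Optional, Tuple


def maybe_wrap_strategies(
    save_strategy: Any,
    load_strategy: Any,
    use_fully_parallel_wrapper: bool = False,
    save_wrapper_cls: Optional[Callable[[Any], Any]] = None,  # hypothetical test hook
    load_wrapper_cls: Optional[Callable[[Any], Any]] = None,  # hypothetical test hook
) -> Tuple[Any, Any]:
    """Return (save_strategy, load_strategy), wrapped only when the flag is True."""
    if not use_fully_parallel_wrapper:
        # Default path: the strategies are returned unchanged.
        return save_strategy, load_strategy

    if save_wrapper_cls is None or load_wrapper_cls is None:
        # Imported lazily so the default path does not require Megatron-Core.
        from megatron.core.dist_checkpointing.strategies.fully_parallel import (
            FullyParallelLoadWrapper,
            FullyParallelSaveWrapper,
        )

        save_wrapper_cls = save_wrapper_cls or FullyParallelSaveWrapper
        load_wrapper_cls = load_wrapper_cls or FullyParallelLoadWrapper

    # Assumption: each wrapper accepts the wrapped strategy as its first argument.
    return save_wrapper_cls(save_strategy), load_wrapper_cls(load_strategy)
```

Tests could then cover both paths by passing stand-in wrapper classes: the flag left at False must return the original objects, and the flag set to True must return instances of the wrappers that hold the original strategies.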

Why

Gives users the option to distribute data evenly across ranks instead of leaning heavily on rank 0, similar to the recommended settings in NeMo 2.0. Without the wrappers, overall training time may be the same or better at smaller cluster sizes, but may be worse at larger sizes.
