
DeepSpeed ZeRO MiCS support #20378

Open
@hehepig4


Description & Motivation

Since version 0.9.2, DeepSpeed provides MiCS support, which lets users specify how parameters are sharded across devices (reference). It would be great if Lightning's DeepSpeed strategy supported this feature.

Pitch

The strategy could decide whether to use MiCS by checking whether the DeepSpeed config contains the keys 'mics_shard_size' or 'mics_hierarchical_params_gather'.
If it does, the model must be initialized with 'deepspeed.zero.MiCS_Init()' instead of 'deepspeed.zero.Init()'.
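A minimal sketch of the proposed detection, assuming the MiCS keys live under the 'zero_optimization' section of the DeepSpeed config dict (next to the ZeRO stage). 'EXAMPLE_MICS_CONFIG', 'MICS_KEYS' and 'uses_mics' are hypothetical names for illustration, not part of Lightning or DeepSpeed.

```python
from typing import Any, Dict

# Hypothetical example config requesting MiCS. Assumption: the MiCS keys sit
# under "zero_optimization", alongside the ZeRO stage.
EXAMPLE_MICS_CONFIG: Dict[str, Any] = {
    "zero_optimization": {
        "stage": 3,
        "mics_shard_size": 8,  # number of devices each parameter shard group spans
        "mics_hierarchical_params_gather": False,
    },
}

# Keys whose mere presence signals that MiCS was requested (as proposed above).
MICS_KEYS = ("mics_shard_size", "mics_hierarchical_params_gather")


def uses_mics(config: Dict[str, Any]) -> bool:
    """Return True if any MiCS key appears in the config's ZeRO section."""
    zero_cfg = config.get("zero_optimization") or {}
    return any(key in zero_cfg for key in MICS_KEYS)
```

The check is purely presence-based, exactly as the pitch describes, so it does not inspect the values themselves. A dict like this could be handed to Lightning through the strategy's 'config' argument.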

Alternatives

No response

Additional context

I simply overrode 'DeepSpeedStrategy.model_sharded_context' (line 519 in 'lightning/pytorch/strategies/deepspeed.py') with the code shown in a screenshot (not preserved here), and it works. Could we add a condition there so that the alternative context manager is used when MiCS is requested?
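Since the screenshot is lost, here is a sketch of what such a conditional context manager might look like. 'mics_aware_sharded_context' and 'MiCSDeepSpeedStrategy' are hypothetical names; the keyword arguments assume the 'deepspeed.zero.Init' signature (which 'MiCS_Init' subclasses), and the commented Lightning wiring assumes the attribute names 'config', 'zero_stage_3' and 'remote_device' used by the existing method, which may differ between versions.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional


@contextmanager
def mics_aware_sharded_context(
    config: Dict[str, Any],
    enabled: bool = True,
    remote_device: Optional[str] = None,
    zero: Any = None,
) -> Iterator[None]:
    """Enter deepspeed.zero.MiCS_Init if the config requests MiCS, else deepspeed.zero.Init.

    `zero` defaults to the real `deepspeed.zero` module; it is a parameter only so
    the selection logic can be exercised without DeepSpeed installed.
    """
    if zero is None:
        import deepspeed  # lazy import: only needed when actually building a model

        zero = deepspeed.zero
    # Assumption: MiCS keys live under "zero_optimization"; presence alone selects MiCS.
    zero_cfg = config.get("zero_optimization") or {}
    wants_mics = any(k in zero_cfg for k in ("mics_shard_size", "mics_hierarchical_params_gather"))
    init_cls = zero.MiCS_Init if wants_mics else zero.Init
    with init_cls(enabled=enabled, remote_device=remote_device, config_dict_or_path=config):
        yield


# Hypothetical wiring into Lightning (needs lightning and deepspeed installed):
#
# from lightning.pytorch.strategies import DeepSpeedStrategy
#
# class MiCSDeepSpeedStrategy(DeepSpeedStrategy):
#     @contextmanager
#     def model_sharded_context(self):
#         with mics_aware_sharded_context(
#             self.config, enabled=self.zero_stage_3, remote_device=self.remote_device
#         ):
#             yield
```

Keeping the selection in a standalone context manager means the condition can be tested in isolation, and the strategy override shrinks to a single delegating 'with' block.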

cc @Borda @awaelchli
