Description & Motivation
Since version 0.9.2, DeepSpeed provides MiCS support, which lets users specify how parameters are sharded across devices (reference). It would be great if the Lightning DeepSpeed strategy supported this feature.
Pitch
By detecting whether the keys 'mics_shard_size' or 'mics_hierarchical_params_gather' appear in the DeepSpeed config, the strategy can decide whether to use MiCS.
This requires calling 'deepspeed.zero.MiCS_Init()' instead of 'deepspeed.zero.Init()' when initializing the model.
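As a minimal sketch of the detection step: in DeepSpeed's config schema these keys live under "zero_optimization", so the check could look like the following (wants_mics is a hypothetical helper name, not part of any existing API):

```python
def wants_mics(ds_config: dict) -> bool:
    # Hypothetical helper: return True when the DeepSpeed config asks
    # for MiCS partitioning via either of the two MiCS-specific keys.
    zero_cfg = ds_config.get("zero_optimization", {})
    return (
        zero_cfg.get("mics_shard_size", -1) > 0
        or zero_cfg.get("mics_hierarchical_params_gather", False)
    )


config = {
    "zero_optimization": {
        "stage": 3,
        "mics_shard_size": 4,
        "mics_hierarchical_params_gather": True,
    }
}
print(wants_mics(config))  # True
```

The defaults (-1 and False) mirror DeepSpeed's own defaults for these fields, so a plain ZeRO-3 config without MiCS keys would return False.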
Alternatives
No response
Additional context
I simply overrode DeepSpeedStrategy.model_sharded_context (line 519 in lightning/pytorch/strategies/deepspeed.py) with a version that uses MiCS_Init, and it works. So, can we add a condition here to select an alternative context manager?
cc @Borda @awaelchli