Description & Motivation
Related: #12820
Actually, the description is pretty much the same as that issue:
.. a few more parameters have become available for DeepSpeed including ignore_unused_parameters (the opposite of find_unused_parameters for DDP).
https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training
Integrate ignore_unused_parameters, round_robin_gradients and stage3_gather_16bit_weights_on_model_save into the Strategy.
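For reference, these are the corresponding keys in the DeepSpeed config linked above (a minimal sketch of the `zero_optimization` section; values are illustrative, not recommended defaults):

```python
# Sketch of a DeepSpeed config fragment containing the three parameters this
# issue asks the Strategy to surface. Key names come from the DeepSpeed
# config-json docs; the values here are just examples.
deepspeed_config = {
    "zero_optimization": {
        "stage": 3,
        # Skip the unused-parameter check (opposite of DDP's find_unused_parameters)
        "ignore_unused_parameters": True,
        # Parallelize gradient copies to CPU memory across ranks
        "round_robin_gradients": True,
        # Consolidate fp16 weights on rank 0 when saving a checkpoint
        "stage3_gather_16bit_weights_on_model_save": True,
    }
}
```

Today such a dict can already be passed through the strategy's `config` argument; the request here is to expose these as first-class Strategy parameters as well.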
Pitch
An ultimate solution might be to allow .json-based configuration as done in DeepSpeed, but adding more parameters would be a great remedy, at least in the short term. Personally I'm interested in ignore_unused_parameters, which might become more and more useful as people train multimodal LLMs (related: huggingface/accelerate#2194)
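To illustrate the .json-based route mentioned above, here is a minimal sketch (assuming a config written to disk and handed to the strategy by path; the `DeepSpeedStrategy(config=...)` usage is shown commented out, not exercised here):

```python
import json
import tempfile

# Write a DeepSpeed config file containing the parameter of interest.
cfg = {"zero_optimization": {"stage": 3, "ignore_unused_parameters": True}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(cfg, f)
    config_path = f.name

# Hypothetical usage with Lightning (not run here):
# from lightning.pytorch.strategies import DeepSpeedStrategy
# strategy = DeepSpeedStrategy(config=config_path)

# Round-trip check that the option survives in the file.
loaded = json.load(open(config_path))
print(loaded["zero_optimization"]["ignore_unused_parameters"])
```

This works, but exposing the individual parameters as Strategy keyword arguments would avoid the extra config file for the common cases.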
Alternatives
Implementing this on my Lightning fork and using it.
Additional context
No response
cc @Borda @awaelchli