
Add additional parameters to the DeepSpeedStrategy #19278

@keunwoochoi

Description & Motivation

Related: #12820

The description is essentially the same as in that issue:

.. a few more parameters have become available for DeepSpeed including ignore_unused_parameters (the opposite of find_unused_parameters for DDP).

https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training

Integrate ignore_unused_parameters, round_robin_gradients, and stage3_gather_16bit_weights_on_model_save into the DeepSpeedStrategy (a sketch of what this could look like is below).
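
For concreteness, here is a rough sketch of the requested interface. These keyword arguments are hypothetical (they do not exist on DeepSpeedStrategy today) and would simply be written into the zero_optimization section of the DeepSpeed config the strategy builds.

```python
# Hypothetical sketch of the requested keyword arguments; none of these
# exist on the current DeepSpeedStrategy. Each would be forwarded into the
# "zero_optimization" section of the generated DeepSpeed config.
from lightning.pytorch.strategies import DeepSpeedStrategy

strategy = DeepSpeedStrategy(
    stage=3,
    ignore_unused_parameters=True,                   # proposed
    stage3_gather_16bit_weights_on_model_save=True,  # proposed (ZeRO stage 3)
    # round_robin_gradients=True,                    # proposed (ZeRO stages 1/2 with CPU offload)
)
```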

Pitch

An ultimate solution might be to allow fully .json-based configuration, as DeepSpeed itself does, but adding more parameters would be a good remedy in the short term. Personally, I'm most interested in ignore_unused_parameters, which is likely to become increasingly useful as people train multimodal LLMs (related: huggingface/accelerate#2194).
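
For reference, something close to the .json route is already reachable by handing a full DeepSpeed config dict (or a path to a config file) to the existing config argument of DeepSpeedStrategy. A minimal sketch, assuming ZeRO stage 3; the batch-size entry and device count are placeholders:

```python
# Minimal sketch of the existing config-dict route (the `config` argument is
# part of the current DeepSpeedStrategy API); values below are placeholders.
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DeepSpeedStrategy

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder; should match the DataLoader
    "zero_optimization": {
        "stage": 3,
        "ignore_unused_parameters": True,
        "stage3_gather_16bit_weights_on_model_save": True,
        # "round_robin_gradients": True,  # applies to ZeRO stages 1/2 instead
    },
}

trainer = Trainer(
    accelerator="gpu",
    devices=4,
    strategy=DeepSpeedStrategy(config=ds_config),
)
```

The drawback, as far as I can tell, is that supplying a raw config takes precedence over the strategy's other keyword arguments and their defaults, which is why exposing these options as first-class arguments would still be valuable.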

Alternatives

Implementing this on my Lightning fork and using it.

Additional context

No response

cc @Borda @awaelchli
