Description & Motivation
I have been a long-time Lightning user, but the DeepSpeed integration has made it unusable for me, and since DeepSpeed is used for all of my model training this is a big problem.
I propose simply porting over the HF Trainer DeepSpeed checkpoint-saving logic (https://github.com/huggingface/transformers/blob/main/src/transformers/trainer.py#L2352), as I have found that the Lightning logic doesn't work.
The HF Trainer logic is more battle-tested and works for every ZeRO stage with various model sizes, whereas in my experience the Lightning logic quite often fails, making the whole training run useless. HF Trainer also always saves a pytorch_model.bin with the checkpoint, plus a global_step folder containing the DeepSpeed optimizer states. This makes a lot more sense: you don't have to faff about converting optimizer states if you just want to use the PyTorch model, which is often around 10% of the checkpoint size anyway, so saving it each time is negligible. A sketch of what I mean is below.
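To illustrate, here is a minimal sketch of the kind of saving behaviour I mean, assuming direct access to the underlying `deepspeed.DeepSpeedEngine` (the wrapper function itself is hypothetical; how it would hook into Lightning's checkpoint IO is exactly what would need designing):

```python
import os


def save_deepspeed_checkpoint(engine, output_dir: str) -> None:
    """Sketch: save both a consolidated model file and the full DeepSpeed
    checkpoint, similar to what HF Trainer produces.

    `engine` is assumed to be a deepspeed.DeepSpeedEngine.
    """
    os.makedirs(output_dir, exist_ok=True)

    # Consolidated 16-bit weights -> pytorch_model.bin, usable without DeepSpeed.
    # For ZeRO stage 3 this requires stage3_gather_16bit_weights_on_model_save
    # to be enabled in the DeepSpeed config.
    engine.save_16bit_model(output_dir, "pytorch_model.bin")

    # Full DeepSpeed checkpoint (sharded optimizer / ZeRO partitions),
    # written under a global_step* tag directory inside output_dir.
    engine.save_checkpoint(output_dir)
```

With this layout, resuming training uses the global_step folder, while inference or export only needs the self-contained pytorch_model.bin.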
I would also like to be able to define the optimizer and scheduler in the DeepSpeed config without breaking the Lightning logic; there should be a default that invalidates `configure_optimizers` when these are defined in the config. Most people training models like to use their own DS config with the optimizer and scheduler defined, and don't want to have to faff about with `configure_optimizers` when it can be handled by DeepSpeed. A sketch of the usage I have in mind follows.
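For example, something like the following ought to work without also defining `configure_optimizers` on the LightningModule (the config values here are purely illustrative; whether Lightning currently respects the in-config optimizer/scheduler is the point of this request):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy

# Illustrative DeepSpeed config with the optimizer and scheduler defined in-config,
# so DeepSpeed (not configure_optimizers) is responsible for building them.
ds_config = {
    "zero_optimization": {"stage": 2},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5, "weight_decay": 0.01},
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {"warmup_num_steps": 500, "total_num_steps": 10000},
    },
    "train_micro_batch_size_per_gpu": 8,
}

trainer = Trainer(
    accelerator="gpu",
    devices=4,
    precision=16,
    strategy=DeepSpeedStrategy(config=ds_config),
)
```

The ask is that when `optimizer`/`scheduler` keys are present in the config passed to `DeepSpeedStrategy`, Lightning defers to DeepSpeed for them by default rather than requiring (or conflicting with) `configure_optimizers`.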
Pitch
No response
Alternatives
No response
Additional context
No response
cc @Borda @awaelchli