
Expose FSDP2 MixedPrecisionPolicy params #2267

Open
@EugenHotaj

Description

It would be a good user-experience improvement to expose FSDP2's MixedPrecisionPolicy through the config, at least param_dtype and reduce_dtype. These are important parameters when training in low precision (e.g. bf16), and right now they can only be changed by hardcoding them in training.fully_shard. See #2254 for why these parameters matter. A sketch of what this could look like is below.
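As a minimal sketch (not the actual recipe plumbing): the config keys and dtype mapping below are hypothetical, and the imports assume the public FSDP2 API (fully_shard / MixedPrecisionPolicy under torch.distributed.fsdp in recent PyTorch; older releases expose them under torch.distributed._composable.fsdp).

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

# Hypothetical config keys; in practice these would come from the recipe YAML.
cfg = {"param_dtype": "bf16", "reduce_dtype": "fp32"}
_DTYPES = {"bf16": torch.bfloat16, "fp32": torch.float32}

mp_policy = MixedPrecisionPolicy(
    param_dtype=_DTYPES[cfg["param_dtype"]],    # dtype of unsharded params in forward/backward
    reduce_dtype=_DTYPES[cfg["reduce_dtype"]],  # dtype of the gradient reduce-scatter
)

model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))

# Requires an initialized process group (e.g. via torchrun). Shard each
# submodule, then the root, passing the same policy throughout.
for layer in model:
    fully_shard(layer, mp_policy=mp_policy)
fully_shard(model, mp_policy=mp_policy)
```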

As a suggestion, we may want to default to reduce_dtype=torch.float32. I don't think it reduces training speed at all (though we should verify), and it helps with convergence/stability.
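As a toy illustration of the stability point (not FSDP code): accumulating many small bf16 values step by step drifts from the true sum, while upcasting to fp32 before accumulating stays accurate, which is what reduce_dtype=torch.float32 buys for the gradient reduce-scatter.

```python
import torch

# Step-by-step accumulation in bf16 vs. fp32. Each bf16 add rounds to
# 8 mantissa bits, so the running sum drifts; upcasting first does not.
acc_bf16 = torch.tensor(0.0, dtype=torch.bfloat16)
acc_fp32 = torch.tensor(0.0, dtype=torch.float32)
g = torch.tensor(1e-3, dtype=torch.bfloat16)  # a small "gradient"

for _ in range(1000):
    acc_bf16 = acc_bf16 + g          # rounds on every step
    acc_fp32 = acc_fp32 + g.float()  # upcast, then accumulate

print(acc_bf16.item())  # noticeably off from ~1.0
print(acc_fp32.item())  # ~1.0 (up to bf16 rounding of g itself)
```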

Metadata

Labels

enhancement (New feature or request), triaged (This issue has been assigned an owner and appropriate label)
