Abstract torch.distributed APIs in CheckpointLoader #30

@g-husam

Description

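CheckpointLoader should consume injected getter functions for get_rank/get_local_rank, and a protocol for the collective communication APIs (such as broadcast_object), instead of calling torch.distributed directly. This makes the class easier to test and lets implementations be swapped via dependency injection.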

See DefaultMLFlashpointCheckpointSaver for an example of this pattern; the same should be done for other classes, such as DefaultMLFlashpointCheckpointLoader.
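
For discussion, here is a minimal sketch of what the injected shape could look like. It is not the existing saver's code: `CollectiveComms`, `SingleProcessComms`, `ExampleCheckpointLoader`, and `resolve_latest_checkpoint` are hypothetical names made up for illustration, and the real constructor arguments may differ. A production implementation of the protocol would presumably delegate to torch.distributed (for example `torch.distributed.broadcast_object_list`); the sketch avoids importing torch so it runs anywhere.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Protocol, Sequence


class CollectiveComms(Protocol):
    """Hypothetical protocol covering the collective calls a loader needs."""

    def broadcast_object(self, obj: Optional[Any], src: int = 0) -> Any:
        ...


class SingleProcessComms:
    """Test double: with a single process, broadcasting returns the object as-is."""

    def broadcast_object(self, obj: Optional[Any], src: int = 0) -> Any:
        return obj


@dataclass
class ExampleCheckpointLoader:
    """Illustrative loader; all distributed behavior is injected, none is imported."""

    get_rank: Callable[[], int]
    get_local_rank: Callable[[], int]
    comms: CollectiveComms

    def resolve_latest_checkpoint(self, candidates: Sequence[str]) -> Optional[str]:
        # Only rank 0 picks a candidate; other ranks contribute None and
        # receive rank 0's choice through the injected comms object.
        chosen = max(candidates, default=None) if self.get_rank() == 0 else None
        return self.comms.broadcast_object(chosen, src=0)


# In a unit test, plain lambdas and the test double replace torch.distributed.
loader = ExampleCheckpointLoader(
    get_rank=lambda: 0,
    get_local_rank=lambda: 0,
    comms=SingleProcessComms(),
)
latest = loader.resolve_latest_checkpoint(["step-100", "step-200"])
```

Because the loader only depends on two callables and a structural protocol, tests need no process group, and a different backend can be substituted without touching the loader itself.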
