add option to skip initial sync in Manager #117

@d4l3k

We currently always heal on step 0 to avoid synchronization issues. We want an option to skip this sync for users who set the PyTorch seed so that all ranks are initialized with the same values.

The option should be named init_sync, matching pytorch/pytorch#142824.
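To make the requested behavior concrete, here is a minimal sketch of the decision an init_sync flag would control. It is not the existing Manager code: the function `should_heal` and its parameters `local_step`, `max_step`, and `is_primary` are hypothetical names chosen for illustration, and it assumes that one replica per quorum acts as the source of truth for the initial weights.

```python
def should_heal(
    local_step: int,
    max_step: int,
    is_primary: bool,
    init_sync: bool = True,
) -> bool:
    """Decide whether this replica should fetch state from a peer.

    All names here are hypothetical; this only illustrates the proposal.
    """
    # A replica that is behind the quorum must always recover,
    # regardless of init_sync.
    if local_step < max_step:
        return True
    # At step 0 nobody has trained yet. Syncing is only needed when the
    # initial weights may differ, which is what init_sync=True assumes.
    # The primary is the source of the weights, so it never heals here.
    if max_step == 0 and init_sync and not is_primary:
        return True
    return False
```

With `init_sync=False`, a non-primary replica at step 0 skips the transfer and relies on the user having seeded every rank identically, while a replica that has fallen behind still recovers as before.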

A bonus would be to randomly initialize a value in Manager so we can detect whether ranks are seeded identically, and throw an error on the first quorum if there's a mismatch.
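The detection idea could look roughly like the sketch below. It uses Python's `random.Random` as a stand-in for the PyTorch RNG so that it runs without dependencies, and the helpers `draw_init_token` and `check_init_tokens` are hypothetical, as is the assumption that each replica's token is exchanged as part of the first quorum.

```python
import random
from typing import Dict


def draw_init_token(rng: random.Random) -> int:
    """Draw one value from the (supposedly seeded) RNG at construction time."""
    return rng.getrandbits(64)


def check_init_tokens(tokens: Dict[str, int]) -> None:
    """Raise if replicas drew different tokens, i.e. were seeded differently.

    `tokens` maps a replica id to the token that replica drew.
    """
    if len(set(tokens.values())) > 1:
        raise RuntimeError(
            f"init_sync is disabled but replicas drew different tokens: {tokens}"
        )


# Identically seeded replicas draw the same token, so the check passes.
same = {
    "replica_0": draw_init_token(random.Random(1234)),
    "replica_1": draw_init_token(random.Random(1234)),
}
check_init_tokens(same)
```

Note that drawing the token advances the RNG state. That is harmless as long as every rank draws at the same point, but it does change which values later initialization code sees compared to a run without the check.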

Relevant code:
