Add profiling to Manager #137

Closed
@d4l3k

This is related to #116 but focused on Manager timing.

We don't have any profiling for Manager operations. It would be great to add record_function annotations to Manager so we can track torchft overhead via the PyTorch profiler.
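As a rough illustration of what such annotations could look like, here is a minimal sketch. `ToyManager`, its method bodies, and the `toy_manager::...` region names are hypothetical stand-ins invented for this example, not the torchft Manager or its final labels; only `record_function` and `profile` come from `torch.profiler`.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function


class ToyManager:
    """Hypothetical stand-in for a manager; NOT the torchft Manager."""

    def start_quorum(self) -> None:
        # Everything inside the block shows up as one named region.
        with record_function("toy_manager::start_quorum"):
            pass  # real quorum logic would run here

    def allreduce(self, tensor: torch.Tensor) -> torch.Tensor:
        with record_function("toy_manager::allreduce"):
            return tensor.clone()  # placeholder for the real collective

    def should_commit(self) -> bool:
        with record_function("toy_manager::should_commit"):
            return True  # placeholder for the real commit decision


def profiled_step() -> set:
    """Run one toy step under the profiler and return the recorded names."""
    manager = ToyManager()
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        manager.start_quorum()
        manager.allreduce(torch.ones(4))
        manager.should_commit()
    return {event.key for event in prof.key_averages()}
```

Calling `profiled_step()` returns the set of region and operator names seen by the profiler, which should include the three annotated regions alongside the low-level ops they triggered.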

There are a few key points we want to track:

Relevant code in PT:

Testing:

To test, we should add a new mocked test in manager_test.py that enables the profiler, runs through a step, and checks that all the relevant areas are logged.

Example manager mocked test: https://github.com/pytorch/torchft/blob/main/torchft/manager_test.py#L130-L164
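A test along these lines might look roughly like the sketch below. It does not use the real Manager or the fixtures in manager_test.py: `run_step`, the mocked `client`, and the `example::...` region names are all assumptions made up for illustration, and the actual test would substitute the real annotated regions.

```python
import unittest
from unittest.mock import MagicMock

from torch.profiler import ProfilerActivity, profile, record_function

# Hypothetical region names; the issue does not list the final ones.
EXPECTED_REGIONS = {"example::start_quorum", "example::should_commit"}


def run_step(client) -> bool:
    """Hypothetical annotated step that talks to a mocked client."""
    with record_function("example::start_quorum"):
        client.quorum()
    with record_function("example::should_commit"):
        return bool(client.should_commit())


class ProfilingTest(unittest.TestCase):
    def test_step_regions_are_logged(self) -> None:
        client = MagicMock()
        client.should_commit.return_value = True

        # Enable the profiler only around the step under test.
        with profile(activities=[ProfilerActivity.CPU]) as prof:
            committed = run_step(client)

        logged = {event.key for event in prof.key_averages()}
        self.assertTrue(committed)
        # Report any missing regions in the failure message.
        self.assertTrue(EXPECTED_REGIONS <= logged, EXPECTED_REGIONS - logged)
```

Comparing against a set of expected names keeps the test independent of timing values, which vary from run to run.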

You can also run the train_ddp.py example with torchx:

torchx run -- --replicas 2

Labels: enhancement, manager, process_group
