Add profiling to Manager #137

Open
@d4l3k

Description

This is related to #116 but focused on Manager timings.

We don't have any profiling for Manager operations. It would be great to add record_function annotations to Manager so we can track torchft overhead via the PyTorch profiler.
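As a rough illustration of what such annotations could look like, here is a minimal sketch. `ToyManager` is a hypothetical stand-in and not the real torchft `Manager`, the method bodies are placeholders, and the `torchft::manager::...` label names are an assumed naming scheme that this issue does not specify.

```python
import torch
from torch.profiler import record_function


class ToyManager:
    """Hypothetical stand-in for torchft's Manager; not the real class."""

    def start_quorum(self) -> None:
        # Label name is an assumption, not something torchft defines.
        with record_function("torchft::manager::start_quorum"):
            self._quorum_id = 1  # placeholder for the real quorum work

    def allreduce(self, tensor: torch.Tensor) -> torch.Tensor:
        with record_function("torchft::manager::allreduce"):
            return tensor  # placeholder: single replica, nothing to reduce

    def should_commit(self) -> bool:
        with record_function("torchft::manager::should_commit"):
            return True  # placeholder for the real commit decision
```

Outside an active profiler session the `record_function` blocks only add a small overhead, so annotations like these can stay in place permanently.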

There are a few key points we want to track:

Relevant code in PT:

Testing:

To test, we should add a new mocked test in manager_test.py that enables the profiler, runs through a step, and checks that all the relevant areas are logged.

Example manager mocked test: https://github.com/pytorch/torchft/blob/main/torchft/manager_test.py#L130-L164
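The check described above could be sketched roughly as follows. This is not taken from manager_test.py: `run_step` is a hypothetical placeholder for one mocked Manager step, and the labels in `EXPECTED_LABELS` are assumed names, since the issue does not list the final ones.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

# Hypothetical labels; replace with whatever names Manager ends up using.
EXPECTED_LABELS = {
    "torchft::manager::start_quorum",
    "torchft::manager::allreduce",
    "torchft::manager::should_commit",
}


def run_step() -> None:
    """Placeholder for one mocked Manager step."""
    with record_function("torchft::manager::start_quorum"):
        torch.zeros(1)
    with record_function("torchft::manager::allreduce"):
        torch.ones(4).sum()
    with record_function("torchft::manager::should_commit"):
        torch.zeros(1)


def profiled_labels() -> set:
    # Profile on CPU only so the sketch runs without a GPU.
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        run_step()
    # key_averages() groups events by name; user annotations keep their label.
    return {evt.key for evt in prof.key_averages()}


def test_manager_step_is_profiled() -> None:
    missing = EXPECTED_LABELS - profiled_labels()
    assert not missing, f"missing profiler ranges: {missing}"
```

In the real test, `run_step` would be replaced by calls on a Manager built with the mocks from the linked example.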

You can also run the train_ddp.py example with torchx:

torchx run -- --replicas 2
