add profiling to ProcessGroupBaby #116

Open
@d4l3k

ProcessGroupBaby currently doesn't support any profiling, because the record_function calls run in the subprocess and so never reach the parent process's profiler.

We either need to figure out some way to forward that profiling information from the child process, or we can just add RecordFunction support to _run_func and _BabyWork in the parent.

We probably want to be able to track async completion (i.e. via get_future), but that may be pretty expensive and also isn't very accurate, since for NCCL the future returned by get_future completes immediately.
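To make the async-completion idea concrete, here is a rough sketch of timing an op from issue to future completion with a done callback. It uses `concurrent.futures.Future` from the standard library as a stand-in for the future returned by get_future, and `track_completion` and `COMPLETION_LATENCY` are hypothetical names, not anything in the current code.

```python
import time
from concurrent.futures import Future
from typing import Dict

# Hypothetical store: op name -> seconds from issue to future completion.
COMPLETION_LATENCY: Dict[str, float] = {}


def track_completion(name: str, fut: Future) -> Future:
    """Attach a callback that records how long `fut` took to complete.

    Assumption: `fut` is a stand-in for the future returned by get_future.
    """
    issued = time.perf_counter()

    def _on_done(_fut: Future) -> None:
        # Runs when the future completes (or immediately if it already has).
        COMPLETION_LATENCY[name] = time.perf_counter() - issued

    fut.add_done_callback(_on_done)
    return fut


# Usage: register the op when it is issued, complete it later.
fut: Future = Future()
track_completion("allreduce", fut)
fut.set_result(None)
```

As noted above, if the future completes immediately (the NCCL case), this only measures host-side latency, so the recorded number would be close to zero regardless of how long the collective actually runs.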

It may be better to also track blocking time (i.e. the time spent in _run_func/wait).
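A minimal sketch of the parent-side option, assuming the blocking calls can simply be wrapped in a profiler range. `profiled_blocking_call`, `BabyWorkSketch`, `BLOCKING_LOG`, and the range names are all hypothetical; the real _run_func and _BabyWork have their own signatures, which this does not try to reproduce. The fallback context manager only exists so the sketch runs without PyTorch installed.

```python
import contextlib
import time
from typing import Any, Callable, Iterator, List, Tuple

try:
    # Real profiler range when PyTorch is available.
    from torch.profiler import record_function
except ImportError:
    # Fallback so the sketch runs without PyTorch; it records nothing.
    @contextlib.contextmanager
    def record_function(name: str) -> Iterator[None]:
        yield


# Hypothetical log of (range name, seconds blocked in the parent).
BLOCKING_LOG: List[Tuple[str, float]] = []


def profiled_blocking_call(name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Run `fn` in the parent under a profiler range and record blocking time."""
    start = time.perf_counter()
    with record_function(name):
        try:
            return fn(*args, **kwargs)
        finally:
            BLOCKING_LOG.append((name, time.perf_counter() - start))


class BabyWorkSketch:
    """Hypothetical stand-in for _BabyWork: wait() blocks on the child's reply."""

    def __init__(self, op_name: str, wait_fn: Callable[[], bool]) -> None:
        self._op_name = op_name
        self._wait_fn = wait_fn  # e.g. a blocking read from the child process

    def wait(self) -> bool:
        # The range is opened in the parent, so the parent's profiler sees it.
        return profiled_blocking_call(
            f"ProcessGroupBaby::{self._op_name}::wait", self._wait_fn
        )
```

This only captures how long the parent is blocked, not what the child is doing during that time; forwarding events from the subprocess would still be needed for a full picture.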

Relevant code:


Labels

process_group (related to ProcessGroups and collectives), python
