Open
Description
Currently ProcessGroupBaby doesn't support any profiling, because the record_function calls run in the subprocess.
We either need to figure out some way to forward that profiling information from the child process, or we can just add RecordFunction support to _run_func and _BabyWork.
We probably want to be able to track the async completion (i.e. via get_future), but that may be pretty expensive and also isn't very accurate, since the future returned by get_future completes immediately for NCCL.
It may be better to track blocking time (i.e. _run_func/wait) as well.
Relevant code: