Description
When using on-demand profiling via dynolog and kineto, we noticed that, for a profiling request configured with a fixed number of iterations, the last profiled iteration takes noticeably longer than the others. The training process is blocked at `optimizer.step()`, which calls `step` in kineto; ultimately, inside `performRunLoop`, the call to `libkineto::api().client()->stop()` accounts for most of that time.
By contrast, `processTraceInternal` is executed asynchronously in `performRunLoop`, so it does not block the torch training process.
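
For reference, this is roughly how the stall shows up on our side: a simple per-iteration timer around `optimizer.step()` spikes on the last profiled iteration while an on-demand trace request is active. The sketch below is a minimal stand-in, not our actual workload; the model, data, iteration count, and the `KINETO_USE_DAEMON=1` setup detail are assumptions for illustration, and the trace request itself is issued externally via dynolog while the loop runs.

```python
import time
import torch
import torch.nn as nn

# Minimal stand-in training loop; the real workload is larger, but the timing
# pattern is the same. An on-demand trace is triggered externally via dynolog
# while this loop is running (assumes the process was launched with the kineto
# daemon integration enabled, e.g. KINETO_USE_DAEMON=1).
model = nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(64, 1024)
target = torch.randn(64, 1024)

for it in range(1000):
    t0 = time.perf_counter()

    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    # With an active on-demand profiling request, the last profiled iteration
    # stalls here: kineto's step()/performRunLoop ends up waiting on
    # libkineto::api().client()->stop().
    opt.step()

    print(f"iter {it}: {(time.perf_counter() - t0) * 1000:.2f} ms")
```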
I'm wondering whether there is a plan to address this performance issue so that on-demand profiling adds minimal overhead to the PyTorch training process. It would be great if a plan or proposal already exists; if not, I'd like to put one together later.