Enabling dynamic profiler plugins (Feature Proposal)
Authors: @yisitu, @zli669, @briancoutinho (NVIDIA)
TLDR
- We would like to contribute a feature to Kineto that lets a new profiler for NVIDIA GPUs be plugged in. The profiler provides streaming performance metrics and low-overhead tracing over long durations.
- We propose a Dynamic plugin capability that enables Kineto to load the new profiler module. The Dynamic interface is an extension of the statically compiled ActivityProfilerPlugin API. Dynamic loading makes Kineto more extensible and decouples plugin development from Kineto/PyTorch mainline.
Motivation
Why a dynamic plugin?
- Dynamic plugins allow new profiler capabilities to be developed and extended before they are ready to be published. Without them, development requires full compile-time source integration into PyTorch and Kineto, which makes iteration slow.
- Extending the ActivityProfiler plugin interface to enable dynamic plugins has the added benefit of making Kineto more extensible. Closed-source plugins from other vendors or components can be integrated into Kineto without complex build or source hacks. It also permits independent development and prototyping of plugins.
Overview of Changes
We will be posting a PR shortly; the changes primarily include:
- Dynamic plugin C interface and implementation.
- A fully C-style interface for Activity Profiler plugins, plus the necessary shim.
- Basic versioning and forward/backward compatibility support.
- A runtime flag to disable CUPTI so that it does not conflict with the new profiler plugin.
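To make the C-style interface and versioning items more concrete, here is a minimal sketch of one way such an interface could look. This is not the interface from the upcoming PR. Every name in it (`ExamplePluginInterface`, `isCompatible`, `hasFlush`, `PluginShim`, the `flush` callback) is hypothetical and only illustrates the general technique: a table of C function pointers that carries its own size and version, with a thin C++ shim on the host side.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch only; none of these names come from Kineto or the PR.
// The table holds plain C types and function pointers, so the plugin and the
// host do not need to share a C++ ABI.
struct ExamplePluginInterface {
  uint32_t struct_size;    // sizeof(ExamplePluginInterface) when the plugin was built
  uint32_t version_major;  // bumped on incompatible changes
  uint32_t version_minor;  // bumped when fields are appended at the end
  int (*start)(void* ctx); // returns 0 on success
  int (*stop)(void* ctx);  // returns 0 on success
  // Assumed to be appended in a later minor version; older plugins lack it.
  int (*flush)(void* ctx);
};

constexpr uint32_t kHostMajor = 1;

// The host accepts a plugin if the major version matches and the table is
// large enough to contain every field up to and including `stop`.
inline bool isCompatible(const ExamplePluginInterface* p) {
  if (p == nullptr || p->version_major != kHostMajor) {
    return false;
  }
  return p->struct_size >= offsetof(ExamplePluginInterface, flush);
}

// Optional fields are only read when the plugin's table actually contains them.
inline bool hasFlush(const ExamplePluginInterface* p) {
  const std::size_t needed =
      offsetof(ExamplePluginInterface, flush) + sizeof(p->flush);
  return p->struct_size >= needed && p->flush != nullptr;
}

// Host-side shim: presents the C table as a small C++ object.
class PluginShim {
 public:
  PluginShim(const ExamplePluginInterface* p, void* ctx) : p_(p), ctx_(ctx) {}
  bool start() { return p_->start(ctx_) == 0; }
  bool stop() { return p_->stop(ctx_) == 0; }
  // Older plugins without `flush` are treated as a successful no-op.
  bool flush() { return hasFlush(p_) ? p_->flush(ctx_) == 0 : true; }

 private:
  const ExamplePluginInterface* p_;
  void* ctx_;
};
```

In this sketch, the size field gives backward compatibility (a newer host never reads past the end of an older plugin's table), and the major version rejects incompatible plugins outright. The actual design may make different choices.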
Prototype Results
We have a working prototype with Kineto. You can download the trace attached below.
- CUDA kernel events and correlation arrows work as usual.
- Also see the “GPU PM Counter” rows. Potentially any valid set of PM counters can be sampled.
Here is a sample trace JSON that can be viewed in Perfetto or the Chrome trace viewer.
resnet_training.json.gz
Please let us know if you have any questions or concerns.
cc @sraikund16 , @bertmaher , @valentinandrei , @nadavrot