Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable init for daemon cases, remove event profiler code #1035

Closed
wants to merge 1 commit into from

Conversation

briancoutinho
Copy link
Contributor

@briancoutinho briancoutinho commented Feb 7, 2025

Summary:
Fix #1032
Cleans up the initialization for kineto

  • The current method to init kineto for CUDA builds is to add a callback on CUDA context. But this leads to CUPTI being enabled right from the start.
  • For the case where profiling daemon is enabled (dynolog), this PR initialized the profiler and config loader always, for both CPU and CUDA builds. This should be safe to do as kineto_init now happens when torch is imported.
  • Still leaving the CUPTI callback approach above for non dynolog/daemon use cases, this leave behavior inside Meta unchanged.
  • Let's start cleaning up Even profiler. From here it is offiically turned off. Will start nuking the files soon.

Differential Revision: D69285243

Test

Clone change in pytorch repo and build it.

Run dynolog in one shell

  ../dynolog/usr/local/bin/dynolog --enable_ipc_monitor

Run test python program

export KINETO_USE_DAEMON=1
python3 third_party/kineto/libkineto/third_party/dynolog/scripts/pytorch/linear_model_example.py

Without this change we daemon loader registry but no connection with dynolog

INFO:2025-02-07 00:42:44 2893200:2893200 init.cpp:136] Registering daemon config loader, cpuOnly =  0
cuda:0
99 170.32142639160156
10099 8.817169189453125

With the change we see dynolog getting connected to

INFO:2025-02-07 00:48:52 2955842:2955842 init.cpp:140] Registering daemon config loader, cpuOnly =  0
INFO:2025-02-07 00:48:52 2955842:2955842 CuptiActivityProfiler.cpp:264] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2025-02-07 00:48:52 2955842:2955842 DaemonConfigLoader.cpp:66] Setting communication fabric enabled = 1
INFO:2025-02-07 00:48:52 2955842:2955842 IpcFabricConfigClient.cpp:88] Setting up IPC Fabric at endpoint: dynoconfigclientfa3396c8-b168-4802-bd20-264cfbe514d2 status = initialized
INFO:2025-02-07 00:48:52 2955842:2955842 DaemonConfigLoader.cpp:66] Setting communication fabric enabled = 1
INFO:2025-02-07 00:48:52 2955842:2955842 DaemonConfigLoader.cpp:66] Setting communication fabric enabled = 1

We also find prints in dynolog shell

I20250207 00:48:57.009657 2955379 LibkinetoConfigManager.cpp:171] Registered process (2955842, 3411461, 2540747) for job 0.
I20250207 00:48:57.009698 2955379 IPCMonitor.cpp:84] getLibkinetoOnDemandRequest() : job id 0 pids = 2955842

Last, we can now trigger and collect trace, use a third shell

$> ../dynolog/usr/local/bin/dyno gputrace --log-file /tmp/test_trace.json
...
response length = 147
response = {"activityProfilersBusy":0,"activityProfilersTriggered":[2955842],"eventProfilersBusy":0,"eventProfilersTriggered":[],"processesMatched":[2955842]}
Matched 1 processes

Summary:
Cleans up the initialization for kineto
* The current method to init kineto for CUDA builds is to add a callback on CUDA context. But this leads to CUPTI being enabled right from the start.
* For the case where profiling daemon is enabled (dynolog), this PR initialized the profiler and config loader always, for both CPU and CUDA builds. This should be safe to do as kineto_init now happens when torch is imported.
* Still leaving the CUPTI callback approach above for non dynolog/daemon use cases, this leave behavior inside Meta unchanged.
* Let's start cleaning up Even profiler. From here it is offiically turned off. Will start nuking the files soon.

Differential Revision: D69285243
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D69285243

Copy link
Contributor

@sanrise sanrise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving on OSS to unblock. added a minor question on the internal diff D69285243.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 3c3fa42.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a way to initialize connection to dynolog without triggering cupti API
3 participants