Skip to content

Commit

Permalink
Auto-init when CUDA_INJECTION64_PATH=none is set (#979)
Browse files Browse the repository at this point in the history
Summary:
Pull Request resolved: #979

CUDA_INJECTION64_PATH being set causes Kineto to skip auto-init. However CUDA_INJECTION64_PATH=none is a commonly used config option, and we should not skip auto-init since the path is set to none.

This diff fixes the scenario where `CUDA_INJECTION64_PATH=none` is set and on-demand trace should be enabled.

Test Plan:
CI

Ran locally with `CUDA_INJECTION64_PATH=none` and on-demand profiling works!
```
$ CUDA_INJECTION64_PATH=none buck run mode/dev-nosan kineto/libkineto/fb/integration_tests:pytorch_resnet_integration_test
Buck UI: https://www.internalfb.com/buck2/282115c8-6c29-414a-9d39-c4afb73d4e52
Network: Up: 0B  Down: 0B
Jobs completed: 19751. Time elapsed: 1.5s.
BUILD SUCCEEDED
[INFO: args.py:  113]: Setting master address to localhost and port to 62421
[INFO: pytorch_resnet_integration_test.py:  256]: Start: ready to do work 2114906
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0819 09:07:08.277374 2116870 C10dScubaLogger.cpp:32] Registering C10dScubaLogger...
I0819 09:07:08.584887 2116870 ProcessGroupNCCL.cpp:865] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL initialization options: size: 1, global rank: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, SPLIT_FROM: 0, SPLIT_COLOR: 0, PG Name: 0
I0819 09:07:08.584949 2116870 ProcessGroupNCCL.cpp:874] [PG ID 0 PG GUID 0 Rank 0] ProcessGroupNCCL environments: NCCL version: 2.18.3, TORCH_NCCL_ASYNC_ERROR_HANDLING: 3, TORCH_NCCL_DUMP_ON_TIMEOUT: 0, TORCH_NCCL_WAIT_TIMEOUT_DUMP_MILSEC: 60000, TORCH_NCCL_DESYNC_DEBUG: 0, TORCH_NCCL_ENABLE_TIMING: 0, TORCH_NCCL_BLOCKING_WAIT: 0, TORCH_DISTRIBUTED_DEBUG: OFF, TORCH_NCCL_ENABLE_MONITORING: 0, TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC: 480, TORCH_NCCL_TRACE_BUFFER_SIZE: 2000, TORCH_NCCL_COORD_CHECK_MILSEC: 1000, TORCH_NCCL_NAN_CHECK: 0, TORCH_NCCL_LOG_CPP_STACK_ON_UNCLEAN_SHUTDOWN: 1
INFO:2024-08-19 09:07:11 2116870:2116870 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1
INFO:2024-08-19 09:07:11 2116870:2116870 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2024-08-19 09:07:11 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 09:07:12 2116870:2119791 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
I0819 09:07:15.780548 2116870 ProcessGroupNCCL.cpp:2130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL broadcast unique ID through store took 0.059698 ms
NCCL version 2.18.3+cudaCUDA_MAJOR.CUDA_MINOR
I0819 09:07:15.851341 2116870 ProcessGroupNCCL.cpp:2240] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL created ncclComm_ 0x7f67ce8ec000 on CUDA device:
I0819 09:07:15.851400 2116870 ProcessGroupNCCL.cpp:2245] [PG ID 0 PG GUID 0(default_pg) Rank 0] NCCL_DEBUG: WARN
[INFO: pytorch_resnet_integration_test.py:  199]: step: 0, peak allocated GPU mem: 2.97GB, peak active GPU mem: 2.97GB, peak reserved GPU mem: 3.07GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 20, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 40, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 60, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 80, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 100, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 120, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 140, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 160, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 180, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 200, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 220, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
INFO:2024-08-19 09:07:36 2116870:2119791 ConfigLoader.cpp:271] Received config from dyno:

    ACTIVITIES_COMPRESSION_ALGORITHM=GZIP
    REQUEST_GROUP_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96
      REQUEST_TRACE_ID=E54247FF9C32A885E23EC0A21BC9DD96
    ACTIVITIES_DURATION_MSECS=500
    PROFILE_REPORT_INPUT_SHAPES=true
    PROFILE_PROFILE_MEMORY=false
    PROFILE_WITH_STACK=false
    PROFILE_WITH_FLOPS=false
    PROFILE_WITH_MODULES=false
    ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724083651/localhost/
    REQUEST_TRACE_ID=5172367202475739244

[INFO: pytorch_resnet_integration_test.py:  199]: step: 240, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
INFO:2024-08-19 09:07:37 2116870:2121591 ActivityProfilerController.cpp:147] Received on-demand activity trace request by  profile timestamp = 1724083663089205388
INFO:2024-08-19 09:07:37 2116870:2121591 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_2116870.json
  Log file: /tmp/libkineto_activities_2116870.json
  Trace start time: 2024-08-19 09:07:43  Trace duration: 500ms
  Warmup duration: 5s
  Max GPU buffer size: 128MB
  Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync
  Manifold bucket: gpu_traces
  Manifold object: tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json
  Trace compression enabled: 1
  TTL in seconds: 31536000 (365 days)
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s
INFO:2024-08-19 09:07:37 2116870:2121591 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s
[INFO: pytorch_resnet_integration_test.py:  199]: step: 260, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 280, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 300, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 320, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
STAGE:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up
INFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1194] Tracing started
INFO:2024-08-19 09:07:43 2116870:2121591 CuptiActivityProfiler.cpp:1219] Tracing complete.
STAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection
[INFO: pytorch_resnet_integration_test.py:  199]: step: 340, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers
INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:307] Processed 690712 GPU records (34396720 bytes)
INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 172266, Blocklisted runtime = 388516, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1269] Traces Recorded:
INFO:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations
INFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_2116870.json
INFO:2024-08-19 09:07:44 2116870:2121591 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_2116870.json
STAGE:2024-08-19 09:07:44 2116870:2121591 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing
I0819 09:07:45.141360 2122707 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
[INFO: pytorch_resnet_integration_test.py:  199]: step: 360, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 380, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz
INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:
INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724083651/localhost/libkineto_activities_2116870.json.gz&bucket=gpu_traces
INFO:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 3486626 us
STAGE:2024-08-19 09:07:48 2116870:2122676 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push
[INFO: pytorch_resnet_integration_test.py:  199]: step: 400, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 420, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 440, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 460, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 480, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 500, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 520, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 540, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 560, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 580, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 600, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 620, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 640, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 660, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 680, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 700, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 720, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 740, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 760, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 780, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 800, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 820, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 840, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 860, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 880, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 900, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 920, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 940, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 960, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  199]: step: 980, peak allocated GPU mem: 3.17GB, peak active GPU mem: 3.17GB, peak reserved GPU mem: 3.39GB.
[INFO: pytorch_resnet_integration_test.py:  216]: running function took 83.87466025352478 seconds to complete
I0819 09:08:26.611253 2116870 ProcessGroupNCCL.cpp:1164] [PG ID 0 PG GUID 0(default_pg) Rank 0] Launching ProcessGroupNCCL abort asynchrounously.
I0819 09:08:26.612347 2125072 ProcessGroupNCCL.cpp:1111] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroying ncclComm_ 0x7f67ce8ec000 on CUDA device: 0
I0819 09:08:26.612396 2125072 NCCLUtils.hpp:407] Aborting ncclComm_ 0x7f67ce8ec000 with reason: No abort reason provided.
I0819 09:08:27.125118 2125072 ProcessGroupNCCL.cpp:1130] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destroyed  communicator on CUDA device: 0 with stream: 3
I0819 09:08:27.125389 2116870 ProcessGroupNCCL.cpp:1065] [PG ID 0 PG GUID 0(default_pg) Rank 0] future is successfully executed for: ProcessGroup abort
I0819 09:08:27.125425 2116870 ProcessGroupNCCL.cpp:1170] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL aborts successfully.
I0819 09:08:27.127446 2116870 ProcessGroupNCCL.cpp:1179] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL destructor entered.
I0819 09:08:27.127477 2116870 ProcessGroupNCCL.cpp:1200] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL watchdog thread joined.
I0819 09:08:27.127542 2116870 ProcessGroupNCCL.cpp:1204] [PG ID 0 PG GUID 0(default_pg) Rank 0] ProcessGroupNCCL heart beat monitor thread joined.
/usr/local/fbcode/platform010/lib/python3.10/tempfile.py:869: ResourceWarning:

Implicitly cleaning up <TemporaryDirectory '/tmp/tmpzjstec6w'>

[INFO: pytorch_resnet_integration_test.py:  267]: Finish: Completed test workload
```

Also, ran dyno/rust/kineto_integration_test:
```
$ buck2 run //dyno/rust/kineto_integration_test:dyno_gputrace_test -- --skip-trace-validation -j dacluster/dauser/daname/2 --collect-iters
File changed: fbsource//xplat/kineto/libkineto/fb/init.cpp
File changed: fbcode//kineto/libkineto/fb/init.cpp
Buck UI: https://www.internalfb.com/buck2/9bee8d11-3d2d-4295-bd12-a699a50ed8dd
Network: Up: 0B  Down: 2.9MiB  (reSessionID-8f330210-f582-4fed-bc14-cb4004759ccb)
Jobs completed: 76046. Time elapsed: 24.5s.
Cache hits: 100%. Commands: 3512 (cached: 3512, remote: 0, local: 0)
BUILD SUCCEEDED
----------------------------------------
Trace collector program = common-on-demand-tracing-library
----------------------------------------
Autodetected job type Tw for job id dacluster/dauser/daname/2
Building trace tester
Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester
Soft Error: source_directory_includes_subpackage: Directory `v2.17.1-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.17.1-1/src/tests`.
Soft Error: source_directory_includes_subpackage: Directory `v2.18.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.18.3-1/src/tests`.
Soft Error: source_directory_includes_subpackage: Directory `v2.19.3-1` of package `fbsource//third-party/nccl` may not cover any subpackages, but includes subpackage `v2.19.3-1/src/tests`.
Buck UI: https://www.internalfb.com/buck2/9bbcd130-82f9-4e51-982d-c9d88ed8686d
Network: Up: 17KiB  Down: 2.9MiB  (reSessionID-695fce22-aaa1-4cde-9364-bf07ee49f27e)
Jobs completed: 182380. Time elapsed: 1:10.3s.
Cache hits: 99%. Commands: 3516 (cached: 3514, remote: 1, local: 1)
BUILD SUCCEEDED
Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based

Waiting 20 for the trace tester to start up...
----------------------------------------
Collecting gpu trace on application with
                pid = n/a,
                job_id = dacluster/dauser/daname/2,
                collect iters = true
----------------------------------------
I0819 12:39:03.595689 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity)
I0819 12:39:03.595943 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es)
Trace Urls: [PidTraceUrlPair { pid: 3175601, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces" }]

Trace tester is running in background, please be patient...
Trace tester application stdout:
 ---

 ---
Trace tester application stderr:
 ---
I0819 12:38:32.286183 3175601 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.
INFO:2024-08-19 12:38:32 3175601:3175601 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2024-08-19 12:38:32 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:38:33 3175601:3175844 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:38:34 3175601:3175601 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1
W0819 12:38:57.471963 3175601 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())
INFO:2024-08-19 12:39:07 3175601:3175844 ConfigLoader.cpp:271] Received config from dyno:

    ACTIVITIES_COMPRESSION_ALGORITHM=GZIP
    REQUEST_GROUP_TRACE_ID=test_group_trace_id
    ACTIVITIES_DURATION_MSECS=500
    PROFILE_REPORT_INPUT_SHAPES=true
    PROFILE_PROFILE_MEMORY=false
    PROFILE_WITH_STACK=false
    PROFILE_WITH_FLOPS=false
    PROFILE_WITH_MODULES=false
    ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/
    REQUEST_TRACE_ID=17832852696324168180

INFO:2024-08-19 12:39:08 3175601:3178719 ActivityProfilerController.cpp:147] Received on-demand activity trace request by  profile timestamp = 1724096354388906844
INFO:2024-08-19 12:39:08 3175601:3178719 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3175601.json
  Log file: /tmp/libkineto_activities_3175601.json
  Trace start time: 2024-08-19 12:39:14  Trace duration: 500ms
  Warmup duration: 5s
  Max GPU buffer size: 128MB
  Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync
  Manifold bucket: gpu_traces
  Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json
  Trace compression enabled: 1
  TTL in seconds: 31536000 (365 days)
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s
INFO:2024-08-19 12:39:08 3175601:3178719 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s
STAGE:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up
INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1194] Tracing started
INFO:2024-08-19 12:39:14 3175601:3178719 CuptiActivityProfiler.cpp:1219] Tracing complete.
STAGE:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection
INFO:2024-08-19 12:39:15 3175601:3178719 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers
INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:307] Processed 933012 GPU records (44806976 bytes)
INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 104243, Blocklisted runtime = 514615, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1269] Traces Recorded:
INFO:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations
INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3175601.json
INFO:2024-08-19 12:39:16 3175601:3178719 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3175601.json
STAGE:2024-08-19 12:39:16 3175601:3178719 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing
I0819 12:39:16.931658 3179183 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
W0819 12:39:20.854436 3179426 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier
E0819 12:39:20.919210 3179426 api.cpp:491] The specified logical config name() is not valid
W0819 12:39:20.919265 3179426 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''
I0819 12:39:20.919397 3179426 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'
INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz
INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:
INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096328/localhost/libkineto_activities_3175601.json.gz&bucket=gpu_traces
INFO:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6317931 us
STAGE:2024-08-19 12:39:22 3175601:3179132 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push
WARNING:2024-08-19 12:40:12 3175601:3175844 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client

 ---
Autodetected job type Tw for job id dacluster/dauser/daname/2
Building trace tester
Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester
Buck UI: https://www.internalfb.com/buck2/bf79fa5f-e636-4d39-ab0c-cd237b3189d1
Network: Up: 0B  Down: 0B
Jobs completed: 5. Time elapsed: 0.1s.
BUILD SUCCEEDED
Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based

Waiting 20 for the trace tester to start up...
----------------------------------------
Collecting gpu trace on application with
                pid = 3181985,
                job_id = dacluster/dauser/daname/2,
                collect iters = true
----------------------------------------
I0819 12:40:33.999593 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity)
I0819 12:40:33.999762 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es)
Trace Urls: [PidTraceUrlPair { pid: 3181985, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces" }]

Trace tester is running in background, please be patient...
Trace tester application stdout:
 ---

 ---
Trace tester application stderr:
 ---
I0819 12:40:16.687461 3181985 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.
INFO:2024-08-19 12:40:16 3181985:3181985 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2024-08-19 12:40:16 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:40:18 3181985:3182373 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:40:18 3181985:3181985 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1
INFO:2024-08-19 12:40:36 3181985:3182373 ConfigLoader.cpp:271] Received config from dyno:

    ACTIVITIES_COMPRESSION_ALGORITHM=GZIP
    REQUEST_GROUP_TRACE_ID=test_group_trace_id
    ACTIVITIES_DURATION_MSECS=500
    PROFILE_REPORT_INPUT_SHAPES=true
    PROFILE_PROFILE_MEMORY=false
    PROFILE_WITH_STACK=false
    PROFILE_WITH_FLOPS=false
    PROFILE_WITH_MODULES=false
    ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/
    REQUEST_TRACE_ID=17968765238600337472

INFO:2024-08-19 12:40:37 3181985:3183119 ActivityProfilerController.cpp:147] Received on-demand activity trace request by  profile timestamp = 1724096443817688761
INFO:2024-08-19 12:40:37 3181985:3183119 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3181985.json
  Log file: /tmp/libkineto_activities_3181985.json
  Trace start time: 2024-08-19 12:40:43  Trace duration: 500ms
  Warmup duration: 5s
  Max GPU buffer size: 128MB
  Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync
  Manifold bucket: gpu_traces
  Manifold object: tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json
  Trace compression enabled: 1
  TTL in seconds: 31536000 (365 days)
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s
INFO:2024-08-19 12:40:37 3181985:3183119 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s
W0819 12:40:41.175002 3181985 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())
STAGE:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up
INFO:2024-08-19 12:40:43 3181985:3183119 CuptiActivityProfiler.cpp:1194] Tracing started
INFO:2024-08-19 12:40:44 3181985:3183119 CuptiActivityProfiler.cpp:1219] Tracing complete.
STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection
INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers
INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:307] Processed 942895 GPU records (44635768 bytes)
INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 100281, Blocklisted runtime = 505241, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1269] Traces Recorded:
INFO:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations
INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3181985.json
INFO:2024-08-19 12:40:45 3181985:3183119 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3181985.json
STAGE:2024-08-19 12:40:45 3181985:3183119 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing
I0819 12:40:46.369321 3183880 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
W0819 12:40:49.872421 3184196 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier
E0819 12:40:49.927807 3184196 api.cpp:491] The specified logical config name() is not valid
W0819 12:40:49.927846 3184196 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''
I0819 12:40:49.927968 3184196 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'
INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz
INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:
INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/dacluster_dauser_daname_2/1724096433/localhost/libkineto_activities_3181985.json.gz&bucket=gpu_traces
INFO:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 5519986 us
STAGE:2024-08-19 12:40:51 3181985:3183827 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push
WARNING:2024-08-19 12:41:51 3181985:3182373 DynoConfigLoader.cpp:35] (x1) Failed to read config: No dyno config client

 ---
Building trace tester
Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester
Buck UI: https://www.internalfb.com/buck2/40dd7a0b-6f22-4732-89ca-84d22a4e62f5
Network: Up: 0B  Down: 0B
Jobs completed: 5. Time elapsed: 0.2s.
BUILD SUCCEEDED
Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based

Waiting 20 for the trace tester to start up...
----------------------------------------
Collecting gpu trace on application with
                pid = 3190209,
                job_id = n/a,
                collect iters = true
----------------------------------------
I0819 12:42:13.637505 3167910 OnDemandTracingCommon.cpp:99] Found 1 matching PIDs (Busy: 0 activity)
I0819 12:42:13.637663 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 1 process(es)
Trace Urls: [PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }]

Trace tester is running in background, please be patient...
Trace tester application stdout:
 ---

 ---
Trace tester application stderr:
 ---
I0819 12:41:56.295006 3190209 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.
INFO:2024-08-19 12:41:56 3190209:3190209 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2024-08-19 12:41:56 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:41:57 3190209:3190382 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:41:57 3190209:3190209 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1
INFO:2024-08-19 12:42:16 3190209:3190382 ConfigLoader.cpp:271] Received config from dyno:

    ACTIVITIES_COMPRESSION_ALGORITHM=GZIP
    REQUEST_GROUP_TRACE_ID=test_group_trace_id
    ACTIVITIES_DURATION_MSECS=500
    PROFILE_REPORT_INPUT_SHAPES=true
    PROFILE_PROFILE_MEMORY=false
    PROFILE_WITH_STACK=false
    PROFILE_WITH_FLOPS=false
    PROFILE_WITH_MODULES=false
    ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096533/localhost/
    REQUEST_TRACE_ID=3582354066664069411

INFO:2024-08-19 12:42:17 3190209:3191748 ActivityProfilerController.cpp:147] Received on-demand activity trace request by  profile timestamp = 1724096543391277741
INFO:2024-08-19 12:42:17 3190209:3191748 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3190209.json
  Log file: /tmp/libkineto_activities_3190209.json
  Trace start time: 2024-08-19 12:42:23  Trace duration: 500ms
  Warmup duration: 5s
  Max GPU buffer size: 128MB
  Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync
  Manifold bucket: gpu_traces
  Manifold object: tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json
  Trace compression enabled: 1
  TTL in seconds: 31536000 (365 days)
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s
INFO:2024-08-19 12:42:17 3190209:3191748 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s
W0819 12:42:19.165912 3190209 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())
STAGE:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up
INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1194] Tracing started
INFO:2024-08-19 12:42:23 3190209:3191748 CuptiActivityProfiler.cpp:1219] Tracing complete.
STAGE:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection
INFO:2024-08-19 12:42:24 3190209:3191748 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers
INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:307] Processed 981694 GPU records (47235736 bytes)
INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 109007, Blocklisted runtime = 543555, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1269] Traces Recorded:
INFO:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations
INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3190209.json
INFO:2024-08-19 12:42:25 3190209:3191748 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3190209.json
STAGE:2024-08-19 12:42:25 3190209:3191748 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing
I0819 12:42:25.831236 3192298 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
W0819 12:42:29.783102 3192507 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier
E0819 12:42:29.839179 3192507 api.cpp:491] The specified logical config name() is not valid
W0819 12:42:29.839231 3192507 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''
I0819 12:42:29.839358 3192507 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'
INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz
INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:
INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096533/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces
INFO:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 6102519 us
STAGE:2024-08-19 12:42:31 3190209:3192246 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push

 ---
Building trace tester
Running /usr/local/bin/buck2 build mode/dev-nosan //kineto/libkineto/fb/integration_tests:trace_tester
Buck UI: https://www.internalfb.com/buck2/b5cfc916-b247-432d-98a2-7b289b5552d7
Network: Up: 0B  Down: 0B
Jobs completed: 5. Time elapsed: 0.2s.
BUILD SUCCEEDED
Running ../buck-out/v2/gen/fbcode/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 300000 --iteration_based

Waiting 20 for the trace tester to start up...
----------------------------------------
Collecting gpu trace on application with
                pid = n/a,
                job_id = n/a,
                collect iters = true
----------------------------------------
I0819 12:43:48.965962 3167910 OnDemandTracingCommon.cpp:99] Found 2 matching PIDs (Busy: 0 activity)
I0819 12:43:48.966220 3167910 OnDemandTracingCommon.cpp:115] Triggered activity profiling for 2 process(es)
Trace Urls: [PidTraceUrlPair { pid: 3196469, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces" }, PidTraceUrlPair { pid: 3190209, trace_url: "https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3190209.json.gz&bucket=gpu_traces" }]

Trace tester is running in background, please be patient...
Trace tester application stdout:
 ---

 ---
Trace tester application stderr:
 ---
I0819 12:43:31.657953 3196469 trace_tester.cpp:213] Running in on-demand mode. Not running embedded profilers.
INFO:2024-08-19 12:43:31 3196469:3196469 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2024-08-19 12:43:31 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:43:32 3196469:3196920 DynoConfigLoader.cpp:61] Setting communication fabric enabled = 0
INFO:2024-08-19 12:43:33 3196469:3196469 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1
INFO:2024-08-19 12:43:51 3196469:3196920 ConfigLoader.cpp:271] Received config from dyno:

    ACTIVITIES_COMPRESSION_ALGORITHM=GZIP
    REQUEST_GROUP_TRACE_ID=test_group_trace_id
    ACTIVITIES_DURATION_MSECS=500
    PROFILE_REPORT_INPUT_SHAPES=true
    PROFILE_PROFILE_MEMORY=false
    PROFILE_WITH_STACK=false
    PROFILE_WITH_FLOPS=false
    PROFILE_WITH_MODULES=false
    ACTIVITIES_MANIFOLD_PATH=gpu_traces/tree/traces/dynocli/0/1724096628/localhost/
    REQUEST_TRACE_ID=6007547554162055103

INFO:2024-08-19 12:43:52 3196469:3198770 ActivityProfilerController.cpp:147] Received on-demand activity trace request by  profile timestamp = 1724096638759081016
INFO:2024-08-19 12:43:52 3196469:3198770 output_json.cpp:119] Tracing to temporary file /tmp/libkineto_activities_3196469.json
  Log file: /tmp/libkineto_activities_3196469.json
  Trace start time: 2024-08-19 12:43:58  Trace duration: 500ms
  Warmup duration: 5s
  Max GPU buffer size: 128MB
  Enabled activities: cpu_op,user_annotation,gpu_user_annotation,gpu_memcpy,gpu_memset,kernel,external_correlation,cuda_runtime,cuda_driver,cpu_instant_event,python_function,overhead,mtia_runtime,mtia_ccp_events,cuda_sync
  Manifold bucket: gpu_traces
  Manifold object: tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json
  Trace compression enabled: 1
  TTL in seconds: 31536000 (365 days)
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1004] Enabling GPU tracing with max CUPTI buffer size 128MB)
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = NcclProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = NcclProfiler] Not running child profiler.
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:929] [Profiler = CuptiRangeProfiler] Evaluating whether to run child profiler.
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:942] [Profiler = CuptiRangeProfiler] Not running child profiler.
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1060] Tracing starting in 5s
INFO:2024-08-19 12:43:52 3196469:3198770 CuptiActivityProfiler.cpp:1065] Tracing will end in 6s
W0819 12:43:58.627271 3196469 interface.cpp:20] Warning: torch::jit::fuser::cuda::isEnabled() is deprecated (function operator())
STAGE:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1185] Completed Stage: Warm Up
INFO:2024-08-19 12:43:58 3196469:3198770 CuptiActivityProfiler.cpp:1194] Tracing started
INFO:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1219] Tracing complete.
STAGE:2024-08-19 12:43:59 3196469:3198770 CuptiActivityProfiler.cpp:1236] Completed Stage: Collection
INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:274] Processing 1 CPU buffers
INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:307] Processed 816296 GPU records (38657456 bytes)
INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:349] Record counts: Out-of-range = 86644, Blocklisted runtime = 437744, Invalid ext correlations = 0, CPU GPU out-of-order = 0, Unexpected CUDA events = 0, CUPTI stopped early? = 0
INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1269] Traces Recorded:
INFO:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1272] PyTorch Profiler: 1 iterations
INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:563] Chrome Trace written to /tmp/libkineto_activities_3196469.json
INFO:2024-08-19 12:44:00 3196469:3198770 output_json.cpp:619] Renamed the trace file to /tmp/libkineto_activities_3196469.json
STAGE:2024-08-19 12:44:00 3196469:3198770 CuptiActivityProfiler.cpp:1259] Completed Stage: Post Processing
I0819 12:44:01.344978 3199372 RoutingDecider.cpp:227] Ping Request to proxy failed with: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused)
W0819 12:44:04.267980 3199473 api.cpp:821] config context_lib/conf/contexts_map possesses a signature but this api instance has not been initialized with a ConfigSignatureVerifier object to verify it. Please initialize this api instance with an appropriate ConfigSignatureVerifier
E0819 12:44:04.327500 3199473 api.cpp:491] The specified logical config name() is not valid
W0819 12:44:04.327549 3199473 ConfigeratorOverride.cpp:98] Failed to read Config Overrides File ''
I0819 12:44:04.327661 3199473 EverstoreConfigHandler-inl.h:100] Loading config from configerator: 'everstore/common/fbtypes_to_clientid'
INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:140] Uploaded the trace file to Manifold: gpu_traces/tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz
INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:142] Check the trace by opening the below URI in your Chrome web browser:
INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:144] https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/0/1724096628/localhost/libkineto_activities_3196469.json.gz&bucket=gpu_traces
INFO:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:154] Trace upload time: 4973015 us
STAGE:2024-08-19 12:44:05 3196469:3199296 ManifoldChromeTraceLogger.cpp:215] Completed Stage: Manifold Push

 ---
```

Reviewed By: briancoutinho

Differential Revision: D61478101

Pulled By: aaronenyeshi

fbshipit-source-id: d122f1098301d3acddefed4aa08bc6863358a8cf
  • Loading branch information
aaronenyeshi authored and facebook-github-bot committed Aug 20, 2024
1 parent 7d5e58f commit 120cbc2
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion libkineto/src/init.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,7 @@ void libkineto_init(bool cpuOnly, bool logOnError) {
}

// The cuda driver calls this function if the CUDA_INJECTION64_PATH environment
// variable is set
// variable is set. Should be skipped if unset or CUDA_INJECTION64_PATH=none.
int InitializeInjection(void) {
LOG(INFO) << "Injection mode: Initializing libkineto";
libkineto_init(false /*cpuOnly*/, true /*logOnError*/);
Expand Down

0 comments on commit 120cbc2

Please sign in to comment.