Avoid marking every profile loop stop as Collection stage, use data available to mark errored stages. (#977)

Summary:
Pull Request resolved: #977

We already collect the data needed to tell whether collection stopped due to `collectionDone` or `stopCollection`; the latter is only set when CUPTI stops abruptly, for example when it cannot find buffers.

In fact, we also record this in the internal Error Counters, so leverage that functionality within UST logging as well to denote a terminal stage within Kineto.
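
The distinction described above can be sketched in isolation. This is a minimal illustration, not Kineto's actual code: only the flag names `collectionDone` and `stopCollection` come from the commit message, while `CollectionFlags`, `StopReason`, `classifyStop`, and `describe` are hypothetical names introduced here. Giving the abrupt stop precedence is also an assumption of this sketch.

```cpp
#include <string>

// Hypothetical stand-in for the two flags named in the commit message.
struct CollectionFlags {
  bool collectionDone = false;  // collection ran to its planned end
  bool stopCollection = false;  // CUPTI stopped abruptly (e.g. no buffers found)
};

// Hypothetical classification of why the profile loop left a stage.
enum class StopReason { StillRunning, Completed, StoppedEarly };

// Assumption: an abrupt stop takes precedence, so an errored run is
// never reported as a clean completion.
inline StopReason classifyStop(const CollectionFlags& flags) {
  if (flags.stopCollection) {
    return StopReason::StoppedEarly;
  }
  if (flags.collectionDone) {
    return StopReason::Completed;
  }
  return StopReason::StillRunning;
}

// Human-readable label, e.g. for a log line.
inline std::string describe(StopReason reason) {
  switch (reason) {
    case StopReason::StoppedEarly:
      return "stopped early by CUPTI";
    case StopReason::Completed:
      return "completed";
    case StopReason::StillRunning:
      return "still running";
  }
  return "unknown";
}
```

With a classification like this, a caller could mark a stage as errored only for `StopReason::StoppedEarly` rather than for every stop of the loop.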

Reviewed By: aaronenyeshi

Differential Revision: D61226939

fbshipit-source-id: a4d5fa525d4457d44f0b959e4761b82de160152c
sanrise authored and facebook-github-bot committed Aug 16, 2024
1 parent d975313 commit 7d5e58f
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions libkineto/src/CuptiActivityProfiler.cpp
@@ -1001,7 +1001,7 @@ void CuptiActivityProfiler::configure(
 // presumably because structures are allocated and initialized, callbacks
 // are activated etc. After a while the overhead decreases and stabilizes.
 // It's therefore useful to perform some warmup before starting recording.
-LOG(INFO) << "Enabling GPU tracing";
+LOG(INFO) << "Enabling GPU tracing with max CUPTI buffer size " << config_->activitiesMaxGpuBufferSize() / 1024 / 1024 << "MB)";
 cupti_.setMaxBufferSize(config_->activitiesMaxGpuBufferSize());
 time_point<system_clock> timestamp;
 if (VLOG_IS_ON(1)) {
@@ -1174,6 +1174,8 @@ const time_point<system_clock> CuptiActivityProfiler::performRunLoopStep(
 std::lock_guard<std::mutex> guard(mutex_);
 stopTraceInternal(now);
 resetInternal();
+LOG(ERROR) << "State: Warmup stopped by CUPTI. (Buffer size configured is " << config_->activitiesMaxGpuBufferSize() / 1024 / 1024 << "MB)";
+UST_LOGGER_MARK_COMPLETED(kWarmUpStage);
 VLOG(0) << "Warmup -> WaitForRequest";
 break;
 }
@@ -1222,7 +1224,10 @@ const time_point<system_clock> CuptiActivityProfiler::performRunLoopStep(
 }

 #if defined(HAS_CUPTI) || defined(HAS_ROCTRACER)
-ecs_.cupti_stopped_early = cupti_.stopCollection;
+if (cupti_.stopCollection) {
+ecs_.cupti_stopped_early = cupti_.stopCollection;
+LOG(ERROR) << "State: CollectTrace stopped by CUPTI. (Buffer size configured is " << config_->activitiesMaxGpuBufferSize() / 1024 / 1024 << "MB)";
+}
 #endif // HAS_CUPTI || HAS_ROCTRACER

 std::lock_guard<std::mutex> guard(mutex_);
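
The log lines in this diff repeat the same bytes-to-megabytes expression (`/ 1024 / 1024`). As a rough sketch of how that formatting could be factored out, the helpers below build the same message text. `bytesToMB` and `stoppedByCuptiMessage` are hypothetical names that do not exist in Kineto, and the `std::size_t` parameter type is an assumption about what `activitiesMaxGpuBufferSize()` returns.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: whole megabytes via integer division, matching the
// `/ 1024 / 1024` expression used in the diff (any remainder is truncated).
constexpr std::size_t bytesToMB(std::size_t bytes) {
  return bytes / 1024 / 1024;
}

// Hypothetical helper: builds the "stopped by CUPTI" message for a given
// state name such as "Warmup" or "CollectTrace".
inline std::string stoppedByCuptiMessage(
    const std::string& state, std::size_t maxBufferBytes) {
  std::ostringstream os;
  os << "State: " << state
     << " stopped by CUPTI. (Buffer size configured is "
     << bytesToMB(maxBufferBytes) << "MB)";
  return os.str();
}
```

Because the division truncates, a configured size below 1 MB would be reported as 0MB under this sketch, which is the same behavior as the inline expression in the diff.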