Fix Kineto Stress Test (#971)
Summary:
Pull Request resolved: #971

The Kineto Stress test was failing for two reasons:

1) The test itself never set the converter for the TSC timestamp, so every timestamp value was out of range.

2) Kineto itself had a bug involving child profilers. When these profilers are set up, their spans are initialized to all zeros. However, when Kineto sets the record start, it uses the start time of the CPU trace span as the start of the trace itself. In the future we should probably decouple these two things, but for now I added an edge-case check to ensure that the profile is not malformed.
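The second failure mode can be illustrated with a minimal sketch. The `Span` struct below is a hypothetical stand-in for Kineto's trace span (its field types are an assumption), and the helper names are made up for illustration; only the two guards mirror what this commit adds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a trace span; field types are an assumption.
struct Span {
  int64_t startTime{0};
  int64_t endTime{0};
};

// Mirrors the first guard: a child span whose start is still 0 inherits
// the start of the capture window.
void defaultStartTime(Span& s, int64_t captureWindowStart) {
  if (s.startTime == 0) {
    s.startTime = captureWindowStart;
  }
}

// Duration without a guard: if endTime is 0 and startTime is not, the
// subtraction is negative and wraps to a huge unsigned value.
uint64_t unguardedDuration(const Span& s) {
  return static_cast<uint64_t>(s.endTime - s.startTime);
}

// Mirrors the second guard: an unset endTime yields a duration of 0.
uint64_t guardedDuration(const Span& s) {
  return (s.endTime == 0) ? 0 : static_cast<uint64_t>(s.endTime - s.startTime);
}

// Small self-check exercising both guards on a zero-initialized child span.
void checkSpanGuards() {
  Span child;                    // all zeros, like a child profiler span
  defaultStartTime(child, 1000); // start now non-zero, end still 0
  assert(child.startTime == 1000);
  assert(unguardedDuration(child) > (1ULL << 63)); // wrapped around
  assert(guardedDuration(child) == 0);
  child.endTime = 1500;
  assert(guardedDuration(child) == 500);
}
```

Note that fixing the start time alone makes the duration guard necessary: once the start is non-zero and the end is still 0, the unguarded subtraction wraps.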

Reviewed By: aaronenyeshi

Differential Revision: D60415474

fbshipit-source-id: e7b268aa0d41532448a28e1c1b4fe2ba81fda8da
sraikund16 authored and facebook-github-bot committed Jul 31, 2024
1 parent 188c5f5 commit 18606c9
Showing 3 changed files with 14 additions and 2 deletions.
4 changes: 4 additions & 0 deletions libkineto/src/CuptiActivityProfiler.cpp
@@ -1298,6 +1298,10 @@ void CuptiActivityProfiler::finalizeTrace(const Config& config, ActivityLogger&
   for (auto& session : sessions_){
     auto trace_buffer = session->getTraceBuffer();
     if (trace_buffer) {
+      // Set child start time to profiling start time if not set
+      if (trace_buffer->span.startTime == 0) {
+        trace_buffer->span.startTime = captureWindowStartTime_;
+      }
       traceBuffers_->cpu.push_back(std::move(trace_buffer));
     }
   }
6 changes: 4 additions & 2 deletions libkineto/src/output_json.cpp
@@ -233,8 +233,10 @@ void ChromeTraceLogger::handleTraceSpan(const TraceSpan& span) {
   }

   uint64_t start = transToRelativeTime(span.startTime);
-  uint64_t dur = span.endTime - span.startTime;
-
+
+  // If endTime is 0 and start time is non-zero, dur can overflow. Add
+  // a guard to prevent this.
+  uint64_t dur = (span.endTime == 0) ? 0 : span.endTime - span.startTime;
   // clang-format off
   traceOf_ << fmt::format(R"JSON(
   {{
6 changes: 6 additions & 0 deletions libkineto/stress_test/kineto_stress_test.cpp
@@ -29,6 +29,8 @@
 #include "kineto/libkineto/stress_test/tensor_cache.cuh"
 #include "kineto/libkineto/stress_test/utils.h"
 #include "kineto/libkineto/fb/nccl_profiler/NcclProfiler.h"
+#include <c10/util/ApproximateClock.h>
+#include <ApproximateClock.h>

 using namespace kineto_stress_test;

@@ -89,6 +91,8 @@ void read_inputs_from_json(std::string sJsonFile, stress_test_args *test_args,

 void trace_collection_thread(uint32_t trace_length_us,
                              uint32_t cupti_buffer_mb) {
+  c10::ApproximateClockToUnixTimeConverter clockConverter;
+

   if (cupti_buffer_mb > 0) {
     // Configure CUPTI buffer sizes
@@ -111,6 +115,8 @@ void trace_collection_thread(uint32_t trace_length_us,
   auto& profiler = libkineto::api().activityProfiler();
   libkineto::api().initProfilerIfRegistered();
   profiler.prepareTrace(types);
+  auto converter = clockConverter.makeConverter();
+  libkineto::get_time_converter() = converter;

   // Collect the trace
   profiler.startTrace();
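
The stress-test change above installs a tick-to-Unix-time converter before the trace starts. As a rough illustration of why a missing converter puts every timestamp out of range, here is a minimal sketch using a made-up linear converter. `TimeConverter`, `makeLinearConverter`, and `inWindow` are hypothetical names, not Kineto or c10 APIs, and the sketch assumes that unconverted counter ticks would be compared directly against wall-clock nanoseconds.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical converter type: raw counter ticks -> Unix time in nanoseconds.
using TimeConverter = std::function<int64_t(uint64_t)>;

// Builds a linear converter anchored at a (tick, wall-clock) pair sampled
// at the same moment. Assumes ticks >= tick0 for every input.
TimeConverter makeLinearConverter(uint64_t tick0, int64_t unixNs0, double nsPerTick) {
  return [=](uint64_t ticks) {
    double elapsedNs = static_cast<double>(ticks - tick0) * nsPerTick;
    return unixNs0 + static_cast<int64_t>(elapsedNs);
  };
}

// True if ts falls inside the capture window [start, end].
bool inWindow(int64_t ts, int64_t start, int64_t end) {
  return ts >= start && ts <= end;
}
```

With an anchor of, say, tick 5000000 at some Unix time in nanoseconds, a raw tick value such as 5000200 is nowhere near the capture window, while the converted value lands inside it.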