
Commit 2ab76a8

Include host activity of benched fn in CPU time when blocking kernel is used
Based on the findings of #249, m_cpu_timer.start() is now called from the kernel_launch_timer::start() method, regardless of whether the blocking kernel is used. Previously, when the blocking kernel was enabled, it was called from kernel_launch_timer::stop(), just before the unblock_stream() call, with the intention of homing in on the time taken to execute the GPU work. However, this excluded any host work performed by the benched function from the CPU time.
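The effect of moving the CPU timer start can be illustrated with a small host-only model. This is a minimal sketch, not nvbench code: `run_trial`, `trial_result`, and the `ms` alias are hypothetical names, host work done by the benched function is simulated with `std::this_thread::sleep_for`, and the GPU work released by `unblock_stream()` is likewise stood in for by a sleep. Timestamp `start_new` marks where the fixed code starts the CPU timer (in `start()`), and `start_old` marks where the old code started it (in `stop()`, just before unblocking).

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical model of the two CPU-timer placements; not part of nvbench.
using clock_type = std::chrono::steady_clock;
using ms         = std::chrono::milliseconds;

struct trial_result
{
  clock_type::duration cpu_time_old; // timer started in stop(), before unblock (pre-fix)
  clock_type::duration cpu_time_new; // timer started in start() (post-fix)
};

// host_work: simulated host-side activity of the benched function.
// gpu_work:  simulated time for the released GPU work to finish.
inline trial_result run_trial(ms host_work, ms gpu_work)
{
  assert(host_work >= ms::zero() && gpu_work >= ms::zero());

  // kernel_launch_timer::start(): the fixed code starts the CPU timer here.
  const auto start_new = clock_type::now();

  // Benched function body: host work performed while enqueuing GPU work.
  std::this_thread::sleep_for(host_work);

  // kernel_launch_timer::stop(): the old code started the CPU timer here,
  // just before unblock_stream(), so the host work above was not counted.
  const auto start_old = clock_type::now();

  // Stand-in for waiting until the unblocked GPU work completes.
  std::this_thread::sleep_for(gpu_work);
  const auto end = clock_type::now();

  return {end - start_old, end - start_new};
}
```

For example, `run_trial(ms(20), ms(5))` reports an old-placement CPU time of roughly 5 ms, while the new placement reports at least 25 ms because it also covers the 20 ms of simulated host work.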
Parent: 0c24f02

1 file changed, 3 additions, 5 deletions

nvbench/detail/measure_cold.cuh

@@ -153,18 +153,16 @@ struct measure_cold_base::kernel_launch_timer
       m_measure.gpu_frequency_start();
     }
     m_measure.m_cuda_timer.start(m_measure.m_launch.get_stream());
-    if (m_disable_blocking_kernel)
-    {
-      m_measure.m_cpu_timer.start();
-    }
+    // start CPU timer irrespective of use of blocking kernel
+    // Ref: https://github.com/NVIDIA/nvbench/issues/249
+    m_measure.m_cpu_timer.start();
   }
 
   __forceinline__ void stop()
   {
     m_measure.m_cuda_timer.stop(m_measure.m_launch.get_stream());
     if (!m_disable_blocking_kernel)
     {
-      m_measure.m_cpu_timer.start();
       m_measure.unblock_stream();
     }
     if (m_measure.m_check_throttling)
