docs/best_practices.md
+11 -11 (11 additions & 11 deletions)
@@ -18,13 +18,13 @@ Let’s begin with a simple example for users who are new to NVBench and want to
```cpp
 void sequence_bench(nvbench::state& state) {
   auto data = thrust::device_vector<int>(10);
-  state.exec([](nvbench::launch& launch) {
+  state.exec([](nvbench::launch&) {
     thrust::sequence(data.begin(), data.end());
   });
 }
 NVBENCH_BENCH(sequence_bench);
```
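Review note: the only code change in this hunk drops the parameter name from the lambda passed to `state.exec`. A minimal sketch of why that matters follows. It assumes a compiler invoked with flags such as `-Wall -Wextra`, and it uses hypothetical stand-ins (`fake_launch`, `fake_state`) so that it compiles without NVBench or CUDA; it is not NVBench's real API. The sketch also captures its locals explicitly, since a lambda with an empty capture list cannot refer to a local variable of the enclosing function.

```cpp
#include <cassert>

// Hypothetical stand-in for nvbench::launch (illustration only).
struct fake_launch {
  int stream_id = 7;
};

// Hypothetical stand-in for nvbench::state: exec() calls the callable once
// with a launch object, loosely mimicking the shape of state.exec.
struct fake_state {
  template <typename F>
  void exec(F&& f) {
    fake_launch launch;
    f(launch);
  }
};

// Parameter left unnamed: the body cannot refer to it, so the compiler has
// nothing to report as an unused parameter.
int run_unnamed() {
  fake_state state;
  int calls = 0;
  state.exec([&calls](fake_launch&) { ++calls; });
  return calls;
}

// Parameter named and actually read: no warning either, because it is used.
int run_named() {
  fake_state state;
  int seen = -1;
  state.exec([&seen](fake_launch& launch) { seen = launch.stream_id; });
  return seen;
}
```

Writing `(fake_launch& launch)` without reading `launch` in the body is the case that typically triggers an unused-parameter warning under those flags. Leaving the name out silences the warning, but it does not make the benchmark use the launch object.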
-Will this code work as-is? Depending on the build system configuration, compilation may succeed but generate warnings indicating that `launch` is an unused parameter. The code may or may not execute correctly. This often occurs when users, accustomed to a sequential programming mindset, overlook the fact that GPU architectures are highly parallel. Proper use of streams and synchronization is essential for accurately measuring performance in benchmark code.
+Will this code run correctly as written? While it may compile successfully, runtime behavior isn’t guaranteed. This is a common pitfall for developers used to sequential programming, who may overlook the massively parallel nature of GPU architectures. To ensure accurate performance measurement in benchmark code, proper use of streams and synchronization is crucial.

A common mistake in this context is neglecting stream specification: NVBench requires knowledge of the exact CUDA stream being targeted to correctly trace kernel execution and measure performance. Therefore, users must explicitly provide the stream to be benchmarked. For example, passing the NVBench launch stream ensures correct execution and accurate measurement:
@@ -74,7 +74,7 @@ NVBENCH_BENCH(sequence_bench);
When the benchmark is executed, results are displayed without issues. However, users, particularly in a multi-GPU environment, may observe that more results are collected than expected: