
Commit 344878e

Allow users to control iteration via the concept of iteration spaces.
Changes in this work include:

- [x] Internally use linear_space for iterating
- [x] Simplify type and value iteration in `state_iterator::build_axis_configs`
- [x] Store the iteration space in `axes_metadata`
- [x] Expose `tie` and `user` spaces to the user
- [x] Add tests for `linear`, `tie`, and `user`
- [x] Add examples for `tie` and `user`
1 parent 9eed5ab commit 344878e

18 files changed, +1373 −179 lines changed

docs/benchmarks.md

Lines changed: 31 additions & 6 deletions
@@ -237,9 +237,34 @@ NVBENCH_BENCH_TYPES(benchmark, NVBENCH_TYPE_AXES(input_types, output_types))
 ```
 
 This would generate a total of 36 configurations and instantiate the benchmark 6
-times. Keep the rapid growth of these combinations in mind when choosing the
-number of values in an axis. See the section about combinatorial explosion for
-more examples and information.
+times.
+
+Keep the rapid growth of combinations due to multiple parameter axes in mind
+when choosing the number of values in an axis. See the section about
+combinatorial explosion for more examples and information.
+
+## Zipped/Tied Iteration of Value Axes
+
+At times, multiple value axes need to be iterated as if they were zipped
+together into a single tuple. To enable this behavior, you can request that
+axes be "tied" together.
+
+```cpp
+// InputTypes: {char, int, unsigned int}
+// OutputTypes: {float, double}
+// NumInputs: {1000, 10000, 100000, 200000, 200000, 200000}
+// Quality: {0.05, 0.1, 0.25, 0.5, 0.75, 1.0}
+
+using input_types = nvbench::type_list<char, int, unsigned int>;
+using output_types = nvbench::type_list<float, double>;
+NVBENCH_BENCH_TYPES(benchmark, NVBENCH_TYPE_AXES(input_types, output_types))
+  .set_type_axes_names({"InputType", "OutputType"})
+  .add_int64_axis("NumInputs", {1000, 10000, 100000, 200000, 200000, 200000})
+  .add_float64_axis("Quality", {0.05, 0.1, 0.25, 0.5, 0.75, 1.});
+```
+
+Tying reduces the number of value-axis combinations from 36 (6 × 6) to 6,
+mitigating the combinatorial explosion.
 
 # Throughput Measurements
 
@@ -426,9 +451,9 @@ NVBENCH_BENCH_TYPES(my_benchmark,
 ```
 
 For large configuration spaces like this, pruning some of the less useful
-combinations (e.g. `sizeof(init_type) < sizeof(output)`) using the techniques
-described in the "Skip Uninteresting / Invalid Benchmarks" section can help
-immensely with keeping compile / run times manageable.
+combinations using the techniques described in the "Zipped/Tied Iteration of
+Value Axes" or "Skip Uninteresting / Invalid Benchmarks" sections can help
+immensely with keeping compile / run times manageable.
 
 Splitting a single large configuration space into multiple, more focused
 benchmarks with reduced dimensionality will likely be worth the effort as well.
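The effect of tying can be modeled outside NVBench with a short, self-contained sketch: a Cartesian product of two value axes yields `a.size() * b.size()` configurations, while tied (zipped) axes advance in lockstep and yield only `a.size()`. The function names here are illustrative, not NVBench API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Configurations when two value axes combine as a Cartesian product:
std::size_t untied_config_count(const std::vector<long> &a,
                                const std::vector<double> &b)
{
  return a.size() * b.size();
}

// Configurations when the same axes are tied (zipped) together.
// Tied axes must have the same length; they advance in lockstep.
std::size_t tied_config_count(const std::vector<long> &a,
                              const std::vector<double> &b)
{
  return a.size() == b.size() ? a.size() : 0;
}
```

With six values per axis, this reproduces the 36-to-6 reduction described above.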

examples/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ set(example_srcs
   stream.cu
   throughput.cu
   auto_throughput.cu
   custom_iteration_spaces.cu
 )
 
 # Metatarget for all examples:
examples/custom_iteration_spaces.cu (new file)

Lines changed: 247 additions & 0 deletions

```cpp
/*
 * Copyright 2021 NVIDIA Corporation
 *
 * Licensed under the Apache License, Version 2.0 with the LLVM exception
 * (the "License"); you may not use this file except in compliance with
 * the License.
 *
 * You may obtain a copy of the License at
 *
 *     http://llvm.org/foundation/relicensing/LICENSE.txt
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <nvbench/nvbench.cuh>

// Grab some testing kernels from NVBench:
#include <nvbench/test_kernels.cuh>

// Thrust vectors simplify memory management:
#include <thrust/device_vector.h>

#include <random>

//==============================================================================
// copy_sweep_grid_shape:
// Varies block_size and num_blocks while invoking a naive copy of 256 MiB worth
// of int32_t.
void copy_sweep_grid_shape(nvbench::state &state)
{
  // Get current parameters:
  const int block_size = static_cast<int>(state.get_int64("BlockSize"));
  const int num_blocks = static_cast<int>(state.get_int64("NumBlocks"));

  // Number of int32s in 256 MiB:
  const std::size_t num_values = 256 * 1024 * 1024 / sizeof(nvbench::int32_t);

  // Report throughput stats:
  state.add_element_count(num_values);
  state.add_global_memory_reads<nvbench::int32_t>(num_values);
  state.add_global_memory_writes<nvbench::int32_t>(num_values);

  // Allocate device memory:
  thrust::device_vector<nvbench::int32_t> in(num_values, 0);
  thrust::device_vector<nvbench::int32_t> out(num_values, 0);

  state.exec(
    [block_size,
     num_blocks,
     num_values,
     in_ptr  = thrust::raw_pointer_cast(in.data()),
     out_ptr = thrust::raw_pointer_cast(out.data())](nvbench::launch &launch) {
      nvbench::copy_kernel<<<num_blocks, block_size, 0, launch.get_stream()>>>(
        in_ptr,
        out_ptr,
        num_values);
    });
}

//==============================================================================
// A tied iteration space iterates two or more axes at the same time, allowing
// sparse exploration of the search space -- for example, testing only the
// diagonal of a square parameter matrix.
//
void tied_copy_sweep_grid_shape(nvbench::state &state)
{
  copy_sweep_grid_shape(state);
}
NVBENCH_BENCH(tied_copy_sweep_grid_shape)
  // Tied axes advance together, producing 4 configurations instead of 16:
  .add_int64_axis("BlockSize", {32, 64, 128, 256})
  .add_int64_axis("NumBlocks", {1024, 512, 256, 128})
  .tie_axes({"BlockSize", "NumBlocks"});

//==============================================================================
// under_diag:
// Custom iteration space that only visits the `X` locations of two axes:
// [- - - - X]
// [- - - X X]
// [- - X X X]
// [- X X X X]
// [X X X X X]
//
struct under_diag final : nvbench::user_axis_space
{
  under_diag(std::vector<std::size_t> input_indices,
             std::vector<std::size_t> output_indices)
      : nvbench::user_axis_space(std::move(input_indices),
                                 std::move(output_indices))
  {}

  mutable std::size_t x_pos   = 0;
  mutable std::size_t y_pos   = 0;
  mutable std::size_t x_start = 0;

  nvbench::detail::axis_space_iterator do_iter(axes_info info) const
  {
    // Generate our increment function:
    auto adv_func = [&, info](std::size_t &inc_index,
                              std::size_t /*len*/) -> bool {
      inc_index++;
      x_pos++;
      if (x_pos == info[0].size)
      {
        x_pos = ++x_start;
        y_pos = x_start;
        return true;
      }
      return false;
    };

    // Our update function:
    std::vector<std::size_t> locs = m_output_indices;
    auto diag_under =
      [&, locs, info](std::size_t,
                      std::vector<nvbench::detail::axis_index> &indices) {
        nvbench::detail::axis_index temp = info[0];
        temp.index                       = x_pos;
        indices[locs[0]]                 = temp;

        temp             = info[1];
        temp.index       = y_pos;
        indices[locs[1]] = temp;
      };

    const std::size_t iteration_length = (info[0].size * (info[1].size + 1)) /
                                         2;
    return nvbench::detail::make_space_iterator(2,
                                                iteration_length,
                                                adv_func,
                                                diag_under);
  }

  std::size_t do_size(const axes_info &info) const
  {
    return (info[0].size * (info[1].size + 1)) / 2;
  }

  std::size_t do_valid_count(const axes_info &info) const
  {
    return (info[0].size * (info[1].size + 1)) / 2;
  }

  std::unique_ptr<nvbench::axis_space_base> do_clone() const
  {
    return std::make_unique<under_diag>(*this);
  }
};

void user_copy_sweep_grid_shape(nvbench::state &state)
{
  copy_sweep_grid_shape(state);
}
NVBENCH_BENCH(user_copy_sweep_grid_shape)
  // Every power of two from 64->1024:
  .add_int64_power_of_two_axis("BlockSize", nvbench::range(6, 10))
  .add_int64_power_of_two_axis("NumBlocks", nvbench::range(6, 10))
  .user_iteration_axes({"NumBlocks", "BlockSize"},
                       [](auto... args)
                         -> std::unique_ptr<nvbench::axis_space_base> {
                         return std::make_unique<under_diag>(args...);
                       });

//==============================================================================
// gauss:
// Custom iteration space that uses a Gaussian (normal) distribution to
// sample points near the middle of the index space.
//
struct gauss final : nvbench::user_axis_space
{
  gauss(std::vector<std::size_t> input_indices,
        std::vector<std::size_t> output_indices)
      : nvbench::user_axis_space(std::move(input_indices),
                                 std::move(output_indices))
  {}

  nvbench::detail::axis_space_iterator do_iter(axes_info info) const
  {
    const double mid_point = static_cast<double>(info[0].size / 2);

    std::random_device rd{};
    std::mt19937 gen{rd()};
    std::normal_distribution<> d{mid_point, 2};

    const std::size_t iteration_length = info[0].size;
    std::vector<std::size_t> gauss_indices(iteration_length);
    for (auto &g : gauss_indices)
    {
      // Clamp samples to the valid index range [0, size - 1]:
      auto v = std::min(static_cast<double>(info[0].size - 1), d(gen));
      v      = std::max(0.0, v);
      g      = static_cast<std::size_t>(v);
    }

    // Our update function:
    std::vector<std::size_t> locs = m_output_indices;
    auto gauss_func = [=](std::size_t index,
                          std::vector<nvbench::detail::axis_index> &indices) {
      nvbench::detail::axis_index temp = info[0];
      temp.index                       = gauss_indices[index];
      indices[locs[0]]                 = temp;
    };

    return nvbench::detail::make_space_iterator(1,
                                                iteration_length,
                                                gauss_func);
  }

  std::size_t do_size(const axes_info &info) const { return info[0].size; }

  std::size_t do_valid_count(const axes_info &info) const
  {
    return info[0].size;
  }

  std::unique_ptr<nvbench::axis_space_base> do_clone() const
  {
    return std::make_unique<gauss>(*this);
  }
};

//==============================================================================
// Dual parameter sweep:
void dual_float64_axis(nvbench::state &state)
{
  const auto duration_A = state.get_float64("Duration_A");
  const auto duration_B = state.get_float64("Duration_B");

  state.exec([duration_A, duration_B](nvbench::launch &launch) {
    nvbench::sleep_kernel<<<1, 1, 0, launch.get_stream()>>>(duration_A +
                                                            duration_B);
  });
}
NVBENCH_BENCH(dual_float64_axis)
  .add_float64_axis("Duration_A", nvbench::range(0., 1e-4, 1e-5))
  .add_float64_axis("Duration_B", nvbench::range(0., 1e-4, 1e-5))
  .user_iteration_axes({"Duration_A"},
                       [](auto... args)
                         -> std::unique_ptr<nvbench::axis_space_base> {
                         return std::make_unique<gauss>(args...);
                       })
  .user_iteration_axes({"Duration_B"},
                       [](auto... args)
                         -> std::unique_ptr<nvbench::axis_space_base> {
                         return std::make_unique<gauss>(args...);
                       });
```
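As a sanity check on the `under_diag` traversal above, a standalone sketch (plain C++, no NVBench dependency; the function name is mine) enumerates the same lower-triangle visit order — row `y` visits `x = y .. n-1` — and confirms the `n * (n + 1) / 2` iteration length used by `do_size` and `do_valid_count`:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Enumerate the (x, y) index pairs visited by an under_diag-style traversal
// of an n x n space: row y visits x = y .. n-1, giving n*(n+1)/2 pairs total.
std::vector<std::pair<std::size_t, std::size_t>> under_diag_pairs(std::size_t n)
{
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t y = 0; y < n; ++y)
  {
    for (std::size_t x = y; x < n; ++x)
    {
      pairs.emplace_back(x, y);
    }
  }
  return pairs;
}
```

For a 5 x 5 space this yields 15 pairs, starting at (0, 0) and ending at (4, 4), matching the `X` pattern in the comment block.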

nvbench/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 set(srcs
   axes_metadata.cxx
   axis_base.cxx
   axis_iteration_space.cxx
   benchmark_base.cxx
   benchmark_manager.cxx
   blocking_kernel.cu

nvbench/axes_metadata.cuh

Lines changed: 28 additions & 1 deletion
@@ -18,12 +18,14 @@
 
 #pragma once
 
+#include <nvbench/axis_iteration_space.cuh>
 #include <nvbench/float64_axis.cuh>
 #include <nvbench/int64_axis.cuh>
 #include <nvbench/string_axis.cuh>
 #include <nvbench/type_axis.cuh>
 #include <nvbench/types.cuh>
 
+#include <functional>
 #include <memory>
 #include <stdexcept>
 #include <string>
@@ -37,6 +39,8 @@ namespace nvbench
 struct axes_metadata
 {
   using axes_type = std::vector<std::unique_ptr<nvbench::axis_base>>;
+  using axes_iteration_space =
+    std::vector<std::unique_ptr<nvbench::axis_space_base>>;
 
   template <typename... TypeAxes>
   explicit axes_metadata(nvbench::type_list<TypeAxes...>);
@@ -58,6 +62,21 @@ struct axes_metadata
 
   void add_string_axis(std::string name, std::vector<std::string> data);
 
+  void tie_axes(std::vector<std::string> names);
+
+  void
+  user_iteration_axes(std::vector<std::string> names,
+                      std::function<nvbench::make_user_space_signature> make);
+
+  [[nodiscard]] const axes_iteration_space &get_type_iteration_space() const
+  {
+    return m_type_space;
+  }
+  [[nodiscard]] const axes_iteration_space &get_value_iteration_space() const
+  {
+    return m_value_space;
+  }
+
   [[nodiscard]] const nvbench::int64_axis &
   get_int64_axis(std::string_view name) const;
   [[nodiscard]] nvbench::int64_axis &get_int64_axis(std::string_view name);
@@ -93,6 +112,9 @@ struct axes_metadata
 
 private:
   axes_type m_axes;
+  std::size_t m_type_axe_count = 0;
+  axes_iteration_space m_type_space;
+  axes_iteration_space m_value_space;
 };
 
 template <typename... TypeAxes>
@@ -105,11 +127,15 @@ axes_metadata::axes_metadata(nvbench::type_list<TypeAxes...>)
 
   auto names_iter = names.begin(); // contents will be moved from
   nvbench::tl::foreach<type_axes_list>(
-    [&axes = m_axes, &names_iter]([[maybe_unused]] auto wrapped_type) {
+    [&axes = m_axes, &spaces = m_type_space, &names_iter](
+      [[maybe_unused]] auto wrapped_type) {
      // This is always called before other axes are added, so the length of the
      // axes vector will be the type axis index:
      const std::size_t type_axis_index = axes.size();
 
+      spaces.push_back(
+        std::make_unique<linear_axis_space>(type_axis_index, type_axis_index));
+
      // Note:
      // The word "type" appears 6 times in the next line.
      // Every. Single. Token.
@@ -119,6 +145,7 @@ axes_metadata::axes_metadata(nvbench::type_list<TypeAxes...>)
       axis->template set_inputs<type_list>();
       axes.push_back(std::move(axis));
     });
+  m_type_axe_count = m_axes.size();
 }
 
 } // namespace nvbench
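The bookkeeping this header implies — one iteration space per axis, with `tie_axes` collapsing several single-axis (linear) spaces into one shared space — can be sketched with a hypothetical, NVBench-free model. `model_metadata` and its members are invented for illustration and do not mirror NVBench's actual internals:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: each iteration space covers one or more axes by name.
// Tying merges the single-axis spaces of the named axes into one space.
struct model_metadata
{
  std::vector<std::vector<std::string>> spaces; // axes grouped per space

  void add_axis(std::string name) { spaces.push_back({std::move(name)}); }

  void tie_axes(const std::vector<std::string> &names)
  {
    // Remove each named axis's own single-axis space...
    spaces.erase(std::remove_if(spaces.begin(),
                                spaces.end(),
                                [&](const std::vector<std::string> &s) {
                                  return s.size() == 1 &&
                                         std::find(names.begin(),
                                                   names.end(),
                                                   s.front()) != names.end();
                                }),
                 spaces.end());
    // ...and add a single space covering all of them:
    spaces.push_back(names);
  }
};
```

In this model, three axes start with three spaces; tying two of them leaves two spaces, which is the shape change that shrinks the iterated configuration count.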
