Allow users to control iteration via the concept of iteration spaces. #80
Draft
robertmaynard wants to merge 44 commits into NVIDIA:main from robertmaynard:fea/axes_iteration_space
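In short: by default a benchmark iterates the full cartesian product of its axes; this PR adds iteration spaces so that axes can instead be zipped (advanced in lock-step) or driven by a fully custom space via add_user_iteration_axes. Below is a minimal sketch condensed from the examples/custom_iteration_spaces.cu diff further down; only calls that appear in that file are used, and nothing beyond it is assumed.

// Condensed from examples/custom_iteration_spaces.cu below (a sketch, not
// new API documentation).

// Default iteration: full cartesian product of the axes, 4 x 4 = 16 runs.
NVBENCH_BENCH(copy_sweep_grid_shape)
  .set_name("naive_copy_sweep_grid_shape")
  .add_int64_axis("BlockSize", {32, 64, 128, 256})
  .add_int64_axis("NumBlocks", {1024, 512, 256, 128});

// Zipped: both axes advance in lock-step, 4 runs total.
NVBENCH_BENCH(copy_sweep_grid_shape)
  .set_name("tied_copy_sweep_grid_shape")
  .add_zip_axes(nvbench::int64_axis{"BlockSize", {32, 64, 128, 256}},
                nvbench::int64_axis{"NumBlocks", {1024, 512, 256, 128}});

The under_diag and gauss structs in the diff show the fully custom route.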
Commits (44)
344878e (robertmaynard) Allow users to control iteration via the concept of iteration spaces.
a25f578 (robertmaynard) Rename tie_axes to zip_axes
c3c86e1 (robertmaynard) implement easier API to add axis and zip/user iteration at the same time
91c8f43 (robertmaynard) Show zip versus linear iteration
f4570d4 (robertmaynard) Update docs/benchmarks.md
f791475 (robertmaynard) Update docs/benchmarks.md
f50a6dd (robertmaynard) Update nvbench/axis_iteration_space.cxx
796f7f7 (robertmaynard) Update nvbench/axis_iteration_space.cuh
edba477 (robertmaynard) Update nvbench/axis_iteration_space.cxx
9337ba9 (robertmaynard) Update examples/custom_iteration_spaces.cu
a02d648 (robertmaynard) Update examples/custom_iteration_spaces.cu
e80392e (robertmaynard) Update examples/custom_iteration_spaces.cu
4c964d2 (robertmaynard) Update examples/custom_iteration_spaces.cu
26467f3 (robertmaynard) More cleanup
5b000e8 (robertmaynard) More cleanup
ba8356f (robertmaynard) Refactor names
40a6711 (robertmaynard) Document benchmark iteration space methods
e7b4800 (robertmaynard) Refactor axis spaces into separate TUs
9aa2feb (robertmaynard) Add iteration_space_base docs
06a4c8f (robertmaynard) rename files holding iteration_space_base to match name of class
454d1bf (robertmaynard) Add more docs
8af9453 (robertmaynard) Add more docs
6fd0883 (robertmaynard) drop usage of std::tie in nvbench/axes_metadata.cxx
99395df (robertmaynard) Update to cross reference docs
5ebe7fe (robertmaynard) Update docs around iteration_space_base constructor
dc7e2b7 (robertmaynard) Drop ability to zip axii after construction
5708e6c (robertmaynard) remove need for make_space_iterator
3ad3d65 (robertmaynard) update axis_space_iterator to use same method names as state_iterator
c2bfc99 (robertmaynard) remove need for output_indices
910b5cc (robertmaynard) Simplified user iterators
50e764a (alliepiper) Merge remote-tracking branch 'origin/main' into fea/axes_iteration_space
5016bec (alliepiper) Merge remote-tracking branch 'origin/main' into fea/axes_iteration_space
62cbdc5 (alliepiper) Reduce runtime of gaussian iteration example.
edefcd0 (alliepiper) Update fmt usage for new version.
250d755 (alliepiper) Update new test to support device-init changes.
4defa02 (alliepiper) Precommit reformatting.
4bd5690 (alliepiper) Fix warnings on clang.
cca9308 (alliepiper) Update docs, whitespace.
c9d92eb (alliepiper) Merge remote-tracking branch 'origin/main' into fea/axes_iteration_space
498c73d (alliepiper) Updated some docs.
0fae005 (alliepiper) Rename header to match class name
d7989dd (alliepiper) Docs update.
a2bf266 (alliepiper) Rename some space -> spaces for clarity.
c8909c7 (alliepiper) Refactoring / renaming.
examples/custom_iteration_spaces.cu (new file; diff shown reflects changes from 26 of the 44 commits)
@@ -0,0 +1,243 @@
/*
 * Copyright 2021 NVIDIA Corporation
 *
 * Licensed under the Apache License, Version 2.0 with the LLVM exception
 * (the "License"); you may not use this file except in compliance with
 * the License.
 *
 * You may obtain a copy of the License at
 *
 *     http://llvm.org/foundation/relicensing/LICENSE.txt
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <nvbench/nvbench.cuh>

// Grab some testing kernels from NVBench:
#include <nvbench/test_kernels.cuh>

// Thrust vectors simplify memory management:
#include <thrust/device_vector.h>

#include <random>

//==============================================================================
// Multiple parameters:
// Varies block_size and num_blocks while invoking a naive copy of 256 MiB worth
// of int32_t.
void copy_sweep_grid_shape(nvbench::state &state)
{
  // Get current parameters:
  const int block_size = static_cast<int>(state.get_int64("BlockSize"));
  const int num_blocks = static_cast<int>(state.get_int64("NumBlocks"));

  // Number of int32s in 256 MiB:
  const std::size_t num_values = 256 * 1024 * 1024 / sizeof(nvbench::int32_t);

  // Report throughput stats:
  state.add_element_count(num_values);
  state.add_global_memory_reads<nvbench::int32_t>(num_values);
  state.add_global_memory_writes<nvbench::int32_t>(num_values);

  // Allocate device memory:
  thrust::device_vector<nvbench::int32_t> in(num_values, 0);
  thrust::device_vector<nvbench::int32_t> out(num_values, 0);

  state.exec(
    [block_size,
     num_blocks,
     num_values,
     in_ptr  = thrust::raw_pointer_cast(in.data()),
     out_ptr = thrust::raw_pointer_cast(out.data())](nvbench::launch &launch) {
      nvbench::copy_kernel<<<num_blocks, block_size, 0, launch.get_stream()>>>(
        in_ptr,
        out_ptr,
        num_values);
    });
}

//==============================================================================
// Naive iteration of both the BlockSize and NumBlocks axes.
// Will generate the full cartesian product of the two axes for a total of
// 16 invocations of copy_sweep_grid_shape.
NVBENCH_BENCH(copy_sweep_grid_shape)
  .set_name("naive_copy_sweep_grid_shape")
  .add_int64_axis("BlockSize", {32, 64, 128, 256})
  .add_int64_axis("NumBlocks", {1024, 512, 256, 128});

//==============================================================================
// Zipped iteration of BlockSize and NumBlocks axes.
// Will generate only 4 invocations of copy_sweep_grid_shape.
NVBENCH_BENCH(copy_sweep_grid_shape)
  .set_name("tied_copy_sweep_grid_shape")
  .add_zip_axes(nvbench::int64_axis{"BlockSize", {32, 64, 128, 256}},
                nvbench::int64_axis{"NumBlocks", {1024, 512, 256, 128}});

//==============================================================================
// under_diag:
// Custom iterator that only searches the `X` locations of two axes:
// [- - - - X]
// [- - - X X]
// [- - X X X]
// [- X X X X]
// [X X X X X]
//
struct under_diag final : nvbench::user_axis_space
{
  under_diag(std::vector<std::size_t> input_indices,
             std::vector<std::size_t> output_indices)
      : nvbench::user_axis_space(std::move(input_indices),
                                 std::move(output_indices))
  {}

  mutable std::size_t x_pos = 0;
Collaborator
We should find a way to make this work without mutable.
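One possible direction, sketched here as a hypothetical alternative rather than a change that exists in this PR: hold the cursor in a small state object shared by the advance and update functors (via a shared_ptr capture), so the axis space itself needs no mutable members. This assumes the functors passed to make_space_iterator are the only readers and writers of that state, which matches how under_diag uses them below; it also requires <memory>.

// Hypothetical rewrite of under_diag::do_get_iterator without mutable members.
// The (x, y) cursor lives in a heap-allocated struct captured by both functors.
struct diag_cursor
{
  std::size_t x_pos{0};
  std::size_t y_pos{0};
  std::size_t x_start{0};
};

nvbench::detail::axis_space_iterator do_get_iterator(axes_info info) const
{
  auto cursor = std::make_shared<diag_cursor>();

  // Increment function: walk x across the current row; when it hits the end
  // of the axis, start the next row of the triangle.
  auto adv_func = [cursor, info](std::size_t &inc_index,
                                 std::size_t /*len*/) -> bool {
    inc_index++;
    cursor->x_pos++;
    if (cursor->x_pos == info[0].size)
    {
      cursor->x_pos = ++cursor->x_start;
      cursor->y_pos = cursor->x_start;
      return true;
    }
    return false;
  };

  // Update function: publish the current (x, y) cursor into the axis indices.
  std::vector<std::size_t> locs = m_output_indices;
  auto diag_under = [cursor, locs, info](
                      std::size_t,
                      std::vector<nvbench::detail::axis_index> &indices) {
    nvbench::detail::axis_index temp = info[0];
    temp.index = cursor->x_pos;
    indices[locs[0]] = temp;

    temp = info[1];
    temp.index = cursor->y_pos;
    indices[locs[1]] = temp;
  };

  const std::size_t iteration_length = (info[0].size * (info[1].size + 1)) / 2;
  return nvbench::detail::make_space_iterator(2,
                                              iteration_length,
                                              adv_func,
                                              diag_under);
}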
  mutable std::size_t y_pos = 0;
  mutable std::size_t x_start = 0;

  nvbench::detail::axis_space_iterator do_get_iterator(axes_info info) const
  {
    // generate our increment function
    auto adv_func = [&, info](std::size_t &inc_index,
                              std::size_t /*len*/) -> bool {
      inc_index++;
      x_pos++;
      if (x_pos == info[0].size)
      {
        x_pos = ++x_start;
        y_pos = x_start;
        return true;
      }
      return false;
    };

    // our update function
    std::vector<std::size_t> locs = m_output_indices;
    auto diag_under =
      [&, locs, info](std::size_t,
                      std::vector<nvbench::detail::axis_index> &indices) {
        nvbench::detail::axis_index temp = info[0];
        temp.index = x_pos;
        indices[locs[0]] = temp;

        temp = info[1];
        temp.index = y_pos;
        indices[locs[1]] = temp;
      };

    const size_t iteration_length = ((info[0].size * (info[1].size + 1)) / 2);
    return nvbench::detail::make_space_iterator(2,
                                                iteration_length,
                                                adv_func,
                                                diag_under);
  }

  std::size_t do_get_size(const axes_info &info) const
  {
    return ((info[0].size * (info[1].size + 1)) / 2);
  }

  std::size_t do_get_active_count(const axes_info &info) const
  {
    return ((info[0].size * (info[1].size + 1)) / 2);
  }

  std::unique_ptr<nvbench::iteration_space_base> do_clone() const
  {
    return std::make_unique<under_diag>(*this);
  }
};

NVBENCH_BENCH(copy_sweep_grid_shape)
  .set_name("user_copy_sweep_grid_shape")
  .add_user_iteration_axes(
    [](auto... args) -> std::unique_ptr<nvbench::iteration_space_base> {
      return std::make_unique<under_diag>(args...);
    },
    nvbench::int64_axis("BlockSize", {64, 128, 256, 512, 1024}),
| nvbench::int64_axis("NumBlocks", {1024, 521, 256, 128, 64})); | ||

//==============================================================================
// gauss:
// Custom iteration space that uses a gauss distribution to
// sample the points near the middle of the index space
//
struct gauss final : nvbench::user_axis_space
{
  gauss(std::vector<std::size_t> input_indices,
        std::vector<std::size_t> output_indices)
      : nvbench::user_axis_space(std::move(input_indices),
                                 std::move(output_indices))
  {}

  nvbench::detail::axis_space_iterator do_get_iterator(axes_info info) const
  {
    const double mid_point = static_cast<double>((info[0].size / 2));

    std::random_device rd{};
    std::mt19937 gen{rd()};
    std::normal_distribution<> d{mid_point, 2};

    const size_t iteration_length = info[0].size;
    std::vector<std::size_t> gauss_indices(iteration_length);
    for (auto &g : gauss_indices)
    {
      // Clamp the sample into the valid index range [0, size - 1]:
      auto v = std::min(static_cast<double>(info[0].size - 1), d(gen));
      v = std::max(0.0, v);
      g = static_cast<std::size_t>(v);
    }

    // our update function
    std::vector<std::size_t> locs = m_output_indices;
    auto gauss_func = [=](std::size_t index,
                          std::vector<nvbench::detail::axis_index> &indices) {
      nvbench::detail::axis_index temp = info[0];
      temp.index = gauss_indices[index];
      indices[locs[0]] = temp;
    };

    return nvbench::detail::make_space_iterator(1,
                                                iteration_length,
                                                gauss_func);
  }

  std::size_t do_get_size(const axes_info &info) const { return info[0].size; }

  std::size_t do_get_active_count(const axes_info &info) const
  {
    return info[0].size;
  }

  std::unique_ptr<iteration_space_base> do_clone() const
  {
    return std::make_unique<gauss>(*this);
  }
};

//==============================================================================
// Dual parameter sweep:
void dual_float64_axis(nvbench::state &state)
{
  const auto duration_A = state.get_float64("Duration_A");
  const auto duration_B = state.get_float64("Duration_B");

  state.exec([duration_A, duration_B](nvbench::launch &launch) {
    nvbench::sleep_kernel<<<1, 1, 0, launch.get_stream()>>>(duration_A +
                                                            duration_B);
  });
}

NVBENCH_BENCH(dual_float64_axis)
  .add_user_iteration_axes(
    [](auto... args) -> std::unique_ptr<nvbench::iteration_space_base> {
      return std::make_unique<gauss>(args...);
    },
    nvbench::float64_axis("Duration_A", nvbench::range(0., 1e-4, 1e-5)))
  .add_user_iteration_axes(
    [](auto... args) -> std::unique_ptr<nvbench::iteration_space_base> {
      return std::make_unique<gauss>(args...);
    },
    nvbench::float64_axis("Duration_B", nvbench::range(0., 1e-4, 1e-5)));
For example implementations, it'd be helpful to have more comments describing what each component of this struct is doing (e.g. how do size and valid_count differ?)
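A hedged reading, since the iteration_space_base header is not part of the diff shown here: judging by how under_diag and gauss use them, do_get_size looks like the total extent of the space and do_get_active_count like the number of entries the iterator actually visits; both example spaces visit everything they define, so the two return the same value. Below is an annotated excerpt of the gauss overrides from the file above; the comments are inferences from this file, not NVBench documentation.

// Excerpt of the gauss axis space above, with inferred comments added.
// These descriptions are guesses from usage in this file, not NVBench docs.

// Total extent of the space: one entry per value on the wrapped axis.
std::size_t do_get_size(const axes_info &info) const { return info[0].size; }

// How many entries the iterator actually produces. gauss draws exactly
// info[0].size samples, so this matches do_get_size; a space that skipped
// values would presumably report a smaller count here.
std::size_t do_get_active_count(const axes_info &info) const
{
  return info[0].size;
}

// Polymorphic copy, used when a benchmark's axes are cloned.
std::unique_ptr<iteration_space_base> do_clone() const
{
  return std::make_unique<gauss>(*this);
}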