Merged
9 changes: 6 additions & 3 deletions benchmarks/hash_table/dynamic_map_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,11 +14,14 @@
* limitations under the License.
*/

#include <benchmark/benchmark.h>
#include <synchronization.hpp>

#include <cuco/dynamic_map.cuh>

#include <benchmark/benchmark.h>

#include <iostream>
#include <random>
Comment on lines +17 to 24
Collaborator

This seems to be a permutation of the same includes. I think clang-format already sorts these in lexicographic order for us when they are defined in a contiguous block, which is nice and consistent. However, lexicographic order might not always be correct or desired. Maybe we can customize the .clang-format file so that it always produces an ordering that satisfies our needs, and then just let CI take care of it.

Member Author

@PointKernel PointKernel May 18, 2022

The idea here is to separate the different header groups and order them from "near" to "far". For example, synchronization.hpp is a bench-local header, so it is placed before the library header cuco/dynamic_map.cuh. The gbench header is farther away still, but it is considered closer than the std headers.

This grouping looks a bit awkward with only one file in each group, but its advantage will show once more headers are involved.

Collaborator

Ah ok, this makes sense. I wonder, however, whether we should add a CI script for that: extract all includes, check where in the include tree each header originates, then group and reorder them.
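The "near to far" grouping discussed in this thread could be checked mechanically along the following lines. This is only a sketch under stated assumptions: the function names `include_priority` and `group_includes` are hypothetical, and the prefix-to-priority mapping is inferred from the include blocks in this PR rather than taken from any project configuration. clang-format's `IncludeBlocks: Regroup` together with `IncludeCategories` can express a similar priority scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical priority function: a lower value means a "nearer" header group.
// The mapping is an assumption inferred from the includes in this PR.
int include_priority(std::string const& header)
{
  assert(!header.empty());
  auto starts_with = [&header](char const* prefix) { return header.rfind(prefix, 0) == 0; };
  bool const has_dir = header.find('/') != std::string::npos;
  bool const has_ext = header.find('.') != std::string::npos;

  // Local helper headers such as synchronization.hpp or key_generator.hpp
  if (!has_dir && header.size() > 4 && header.compare(header.size() - 4, 4, ".hpp") == 0) return 0;
  if (starts_with("cuco/")) return 1;                                // the library itself
  if (starts_with("benchmark/") || starts_with("nvbench/")) return 2;  // benchmark frameworks
  if (starts_with("thrust/")) return 3;
  if (starts_with("cub/")) return 4;
  if (starts_with("cuda/")) return 5;                                // e.g. cuda/std/atomic
  if (has_ext) return 6;                                             // e.g. cuda_runtime_api.h
  return 7;                                                          // extension-less std headers
}

// Sorts headers by (priority, name) and splits them into one group per priority.
std::vector<std::vector<std::string>> group_includes(std::vector<std::string> headers)
{
  std::sort(headers.begin(), headers.end(), [](std::string const& a, std::string const& b) {
    int const pa = include_priority(a);
    int const pb = include_priority(b);
    return pa != pb ? pa < pb : a < b;
  });

  std::vector<std::vector<std::string>> groups;
  int last_priority = -1;
  for (std::string const& h : headers) {
    int const p = include_priority(h);
    if (groups.empty() || p != last_priority) { groups.emplace_back(); }
    groups.back().push_back(h);
    last_priority = p;
  }
  return groups;
}
```

A CI check built on this idea would print each group separated by a blank line and compare the result with the include block in the file.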

#include <synchronization.hpp>

enum class dist_type { UNIQUE, UNIFORM, GAUSSIAN };

10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/count_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value `count` performance:
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/insert_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value `insert` performance:
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,12 @@
* limitations under the License.
*/

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>

#include <random>
#include <thrust/device_vector.h>

#include "cuco/static_multimap.cuh"

/**
* @brief Generates input keys by a given number of repetitions per key.
*
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/pair_retrieve_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,15 +14,15 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>

#include <thrust/device_vector.h>
#include <thrust/iterator/discard_iterator.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>

namespace {
// Custom pair equal
template <typename Key, typename Value>
10 changes: 5 additions & 5 deletions benchmarks/hash_table/static_multimap/query_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <random>
#include <key_generator.hpp>

#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>
#include <thrust/device_vector.h>

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value query (`count` + `retrieve`) performance:
7 changes: 3 additions & 4 deletions benchmarks/hash_table/static_multimap/retrieve_bench.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,14 +14,13 @@
* limitations under the License.
*/

#include <cuco/static_multimap.cuh>
#include <key_generator.hpp>

#include <thrust/device_vector.h>
#include <cuco/static_multimap.cuh>

#include <nvbench/nvbench.cuh>

#include <random>
#include <thrust/device_vector.h>

/**
* @brief A benchmark evaluating multi-value `retrieve` performance:
5 changes: 3 additions & 2 deletions benchmarks/synchronization.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -18,6 +18,7 @@

// Google Benchmark library
#include <benchmark/benchmark.h>

#include <cuda_runtime_api.h>

#define BENCH_CUDA_TRY(call) \
@@ -129,4 +130,4 @@ class cuda_event_timer {
cudaEvent_t stop_;
cudaStream_t stream_;
benchmark::State* p_state;
};
};
6 changes: 3 additions & 3 deletions examples/static_map/custom_type_example.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,12 +14,12 @@
* limitations under the License.
*/

#include <cuco/static_map.cuh>

#include <thrust/device_vector.h>
#include <thrust/logical.h>
#include <thrust/transform.h>

#include <cuco/static_map.cuh>

// User-defined key type
#ifdef CUCO_NO_INDEPENDENT_THREADS
struct custom_key_type {
6 changes: 3 additions & 3 deletions examples/static_map/static_map_example.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <limits>
#include <cuco/static_map.cuh>

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

#include <cuco/static_map.cuh>
#include <limits>

int main(void)
{
6 changes: 3 additions & 3 deletions examples/static_multimap/static_multimap_example.cu
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,13 +14,13 @@
* limitations under the License.
*/

#include <limits>
#include <cuco/static_multimap.cuh>

#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

#include <cuco/static_multimap.cuh>
#include <limits>

int main(void)
{
10 changes: 8 additions & 2 deletions include/cuco/detail/dynamic_map_kernels.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,6 +14,12 @@
* limitations under the License.
*/

#include <cub/block/block_reduce.cuh>

#include <cuda/std/atomic>

#include <cooperative_groups.h>

namespace cuco {
namespace detail {
namespace cg = cooperative_groups;
@@ -457,4 +463,4 @@ __global__ void contains(InputIt first,
}
}
} // namespace detail
} // namespace cuco
} // namespace cuco
1 change: 1 addition & 0 deletions include/cuco/detail/error.hpp
@@ -17,6 +17,7 @@
#pragma once

#include <cuda_runtime_api.h>

#include <stdexcept>
#include <string>

4 changes: 3 additions & 1 deletion include/cuco/detail/pair.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2020-2021, NVIDIA CORPORATION.
* Copyright (c) 2020-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -19,6 +19,8 @@
#include <thrust/device_reference.h>
#include <thrust/pair.h>
#include <thrust/tuple.h>

#include <algorithm>
#include <tuple>
#include <type_traits>

6 changes: 3 additions & 3 deletions include/cuco/detail/probe_sequence_impl.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -16,11 +16,11 @@

#pragma once

#include <cuda/std/atomic>

#include <cuco/detail/hash_functions.cuh>
#include <cuco/detail/pair.cuh>

#include <cuda/std/atomic>

namespace cuco {
namespace detail {

6 changes: 6 additions & 0 deletions include/cuco/detail/static_map_kernels.cuh
@@ -14,6 +14,12 @@
* limitations under the License.
*/

#include <cub/block/block_reduce.cuh>

#include <cuda/std/atomic>

#include <cooperative_groups.h>

namespace cuco {
namespace detail {
namespace cg = cooperative_groups;
11 changes: 8 additions & 3 deletions include/cuco/detail/static_multimap/kernels.cuh
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION.
* Copyright (c) 2021-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,10 +14,15 @@
* limitations under the License.
*/

#include <cooperative_groups/memcpy_async.h>
#include <cuco/detail/pair.cuh>

#include <thrust/type_traits/is_contiguous_iterator.h>

#include <cuco/detail/pair.cuh>
#include <cub/block/block_reduce.cuh>

#include <cuda/std/atomic>
Collaborator

We access the atomic types through a typedef atomicT, so we don't need to include <cuda/std/atomic> here... At least I think that's the case.

Member Author

@PointKernel PointKernel May 18, 2022

It's still needed due to atomic operations like this:

num_matches->fetch_add(block_num_matches, cuda::std::memory_order_relaxed);

Removing it will cause build failure:

/home/yunsongw/Work/cuCollections/include/cuco/detail/static_multimap/kernels.cuh(256): error: name followed by "::" must be a class or namespace name
          detected during instantiation of "std::size_t cuco::static_multimap<Key, Value, Scope, Allocator, ProbeSequence>::count_outer(InputIt, InputIt, cudaStream_t, KeyEqual) const [with Key=int, Value=int, Scope=cuda::std::__4::__detail::thread_scope_device, Allocator=cuco::cuda_allocator<char>, ProbeSequence=cuco::double_hashing<8U, cuco::detail::MurmurHash3_32<int>, cuco::detail::MurmurHash3_32<int>>, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<int>>, KeyEqual=thrust::equal_to<int>]" 

Collaborator

Yes, but at L242 we call view.count(), which is also not declared at that point (it is only resolved after the template is instantiated).
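One likely explanation for the difference raised in this thread is C++ two-phase name lookup: a call on a template-parameter-typed object is a dependent name and is looked up at instantiation, whereas a qualified name such as cuda::std::memory_order_relaxed does not depend on any template parameter and must already be declared where the template is defined. The host-only sketch below illustrates this under stated assumptions: `std::atomic` stands in for `cuda::std::atomic`, and `count_matches` and `toy_view` are hypothetical names, not cuco code.

```cpp
#include <atomic>  // provides std::memory_order_relaxed; the template below needs it at definition time
#include <cassert>

// Stand-in for the kernel body. Both `View` and `AtomicT` are template parameters,
// mirroring how the real kernel reaches the atomic type only through a typedef.
template <typename View, typename AtomicT>
int count_matches(View const& view, AtomicT* num_matches)
{
  // Dependent name: `count` is looked up only when the template is instantiated,
  // so `View` does not have to be declared before this point.
  int const block_num_matches = view.count();

  // Non-dependent qualified name: `std::memory_order_relaxed` is looked up here,
  // at the point of definition, so its header must already have been included.
  num_matches->fetch_add(block_num_matches, std::memory_order_relaxed);
  return num_matches->load(std::memory_order_relaxed);
}

// Hypothetical view type, deliberately declared after the template that uses it.
struct toy_view {
  int count() const { return 3; }
};
```

Instantiating `count_matches` with a `toy_view` and a `std::atomic<int>` compiles even though `toy_view` is declared after the template, while dropping the `<atomic>` include would leave the memory-order name undeclared at the definition.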


#include <cooperative_groups/memcpy_async.h>

namespace cuco {
namespace detail {
4 changes: 3 additions & 1 deletion include/cuco/detail/static_multimap/static_multimap.inl
@@ -14,11 +14,13 @@
* limitations under the License.
*/

#include <cuco/detail/utils.hpp>

#include <thrust/count.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/tuple.h>

#include <cuco/detail/utils.hpp>
#include <iterator>

namespace {
/**