Optimized implementation of MIS #5294
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
```cpp
rmm::device_uvector<vertex_t> temporary_ranks(local_vtx_partition_size, handle.get_stream());
thrust::copy(handle.get_thrust_policy(), ranks.begin(), ranks.end(), temporary_ranks.begin());
vertex_t isolated_v_start = multi_gpu ? segment_offsets->data()[4] : segment_offsets->data()[3];
```
`hypersparse_degree_threshold > 1` is not necessarily true with multi_gpu, so this code is inaccurate.

```cpp
vertex_t isolated_v_start = *(segment_offsets->rbegin() + 1);
```

The last segment is always a 0-degree segment, so this code should work.
And there is no guarantee that `segment_offsets.has_value()` is true: if renumber is set to false, `segment_offsets.has_value()` is false.
In this case, you need to compute the degrees and scan the vertex list based on the degrees.
`isolated_v_start` here means `num_local_nzd_vertices` (nzd = non-zero-degree). I guess `num_local_nzd_vertices` better describes the intention.
> `hypersparse_degree_threshold > 1` is not necessarily true with multi_gpu.

Thanks for catching this. After analyzing the datasets used in the paper, I realized none of them have hypersparse regions, so I wasn't pre-filling the priority values of the isolated vertices (because I was always starting from `local_vertex_partition_range_last()`). After applying the suggested change, I get a small speedup of ~2%.
> `isolated_v_start` here means `num_local_nzd_vertices` (nzd = non-zero-degree). I guess `num_local_nzd_vertices` better describes the intention.

Right, but the former indicates a range and the latter a number. Maybe `local_nzd_vertices_begin`?
```cpp
rmm::device_uvector<vertex_t> remaining_vertices(isolated_v_start, handle.get_stream());

// Select a random set of candidate vertices
thrust::for_each(
```
This is `thrust::transform`. Better use a more specific algorithm than a generic `for_each` if possible.
And what you are doing is basically setting ranks to the vertex ID if the degree is non-zero and `std::numeric_limits<vertex_t>::max()` if isolated,
and copying `remaining_vertices`.

```cpp
if (segment_offsets) {
  auto num_nzd_vertices = *(segment_offsets->rbegin() + 1);
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 num_nzd_vertices,
               remaining_vertices.begin());
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 num_nzd_vertices,
               ranks.begin());
  thrust::fill(handle.get_thrust_policy(),
               ranks.begin() + num_nzd_vertices,
               ranks.end(),
               std::numeric_limits<vertex_t>::max());
} else {
  // compute degrees and set ranks and remaining_vertices based on the degrees
}
```
```cpp
  num_buckets);

size_t loop_counter = 0;
vertex_t nr_remaining_vertices_to_check = remaining_vertices.size();
```
This holds an invalid value if # GPUs > 1.
```cpp
  }
});
while (true) {
  loop_counter++;
```
This is a minor point, but it is more natural to increment `loop_counter` at the end of the loop. We are starting loop 0, but `loop_counter` is already 1. This is conceptually a bit misleading.
```cpp
// FIXME: Since we know that the properties being updated are either
// std::numeric_limits<vertex_t>::max() or std::numeric_limits<vertex_t>::min(),
// explore 'fill_edge_dst_property' which is faster
```
Yes, in the case of update, you need to communicate both vertices and values. In the case of fill, you need to communicate only vertices (and if # vertices is large compared to the local vertex partition size, we use bitmaps to cut communication volume).
So the real trade-off here is two cheaper calls vs. one more expensive call.
```cpp
// Only update the property of endpoints that had their ranks modified
rmm::device_uvector<vertex_t> processed_ranks(num_processed_vertices, handle.get_stream());

auto pair_idx_processed_vertex_first = thrust::make_zip_iterator(
  thrust::make_counting_iterator<size_t>(0),
  remaining_vertices.begin() + nr_remaining_local_vertices_to_check);

thrust::for_each(
  handle.get_thrust_policy(),
  pair_idx_processed_vertex_first,
  pair_idx_processed_vertex_first + num_processed_vertices,
  [processed_ranks = raft::device_span<vertex_t>(processed_ranks.data(), processed_ranks.size()),
   ranks           = raft::device_span<vertex_t>(ranks.data(), ranks.size()),
   v_first = graph_view.local_vertex_partition_range_first()] __device__(auto pair_idx_v) {
    auto idx      = thrust::get<0>(pair_idx_v);
    auto v        = thrust::get<1>(pair_idx_v);
    auto v_offset = v - v_first;

    processed_ranks[idx] = ranks[v_offset];
  });
```
Isn't this `thrust::gather`?