
Conversation

@jnke2016 (Contributor) commented Oct 2, 2025

No description provided.

copy-pr-bot (bot) commented Oct 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.


rmm::device_uvector<vertex_t> temporary_ranks(local_vtx_partition_size, handle.get_stream());
thrust::copy(handle.get_thrust_policy(), ranks.begin(), ranks.end(), temporary_ranks.begin());
vertex_t isolated_v_start = multi_gpu ? segment_offsets->data()[4] : segment_offsets->data()[3];
Contributor:

See https://github.com/rapidsai/cugraph/blob/branch-25.12/cpp/src/structure/renumber_edgelist_impl.cuh#L768

hypersparse_degree_threshold > 1 is not necessarily true with multi_gpu.

So, this code is inaccurate.

vertex_t isolated_v_start = *(segment_offsets->rbegin() + 1);

The last segment is always a 0-degree segment, so this code should work.

Contributor:

And there is no guarantee that segment_offsets.has_value() is true. If renumber is set to false, segment_offsets.has_value() is false.

In this case, you need to compute the degrees and scan the vertex list based on the degrees.
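A minimal sketch of that fallback, assuming graph_view.compute_out_degrees(handle) returns one local out-degree per vertex in the local vertex partition (the names below are illustrative, not code from this PR):

auto degrees = graph_view.compute_out_degrees(handle);
// Count the non-zero-degree local vertices; with renumber == false the vertices
// are not degree-sorted, so this count cannot be read off segment_offsets.
auto num_local_nzd_vertices = static_cast<vertex_t>(
  thrust::count_if(handle.get_thrust_policy(),
                   degrees.begin(),
                   degrees.end(),
                   [] __device__(auto d) { return d > edge_t{0}; }));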

Contributor:

isolated_v_start here means num_local_nzd_vertices (nzd = non-zero-degree). I guess num_local_nzd_vertices better describes the intention.

Contributor Author (@jnke2016):

hypersparse_degree_threshold > 1 is not necessarily true with multi_gpu.

Thanks for catching this. After analyzing the datasets used in the paper, I realized none have hypersparse regions, so I wasn't pre-filling the priority values of the isolated vertices (because I was always starting from local_vertex_partition_range_last()). After applying the suggested change, I get a small speedup of ~2%.

Contributor Author (@jnke2016):

isolated_v_start here means num_local_nzd_vertices (nzd = non-zero-degree). I guess num_local_nzd_vertices better describes the intention.

Right, but the former indicates a position (the start of a range) and the latter a count. Maybe local_nzd_vertices_begin?

rmm::device_uvector<vertex_t> remaining_vertices(isolated_v_start, handle.get_stream());

// Select a random set of candidate vertices
thrust::for_each(
Contributor:

This is thrust::transform. Better to use a more specific algorithm than a generic for_each if possible.

And what you are doing is basically setting ranks to the vertex ID if the degree is non-zero and to std::numeric_limits<vertex_t>::max() if isolated.

And copying remaining_vertices.

if (segment_offsets) {
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 *(segment_offsets->rbegin() + 1),
               remaining_vertices.begin());
  thrust::copy(handle.get_thrust_policy(),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()),
               thrust::make_counting_iterator(graph_view.local_vertex_partition_range_first()) +
                 *(segment_offsets->rbegin() + 1),
               ranks.begin());
  thrust::fill(handle.get_thrust_policy(),
               ranks.begin() + *(segment_offsets->rbegin() + 1),
               ranks.end(),
               std::numeric_limits<vertex_t>::max());
} else {
  // compute degrees and set ranks, remaining_vertices based on degrees
  // (see the sketch below)
}
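A rough sketch of what the else branch ("compute degrees and set ranks, remaining_vertices based on degrees") could look like; it assumes graph_view.compute_out_degrees(handle) returns per-vertex local out-degrees and that, with renumber == false, the vertices are not degree-sorted, so remaining_vertices must be filtered rather than taken as a prefix. This is illustrative, not code from this PR:

auto degrees = graph_view.compute_out_degrees(handle);
auto v_first = graph_view.local_vertex_partition_range_first();

// rank = vertex ID if the vertex has at least one neighbor, otherwise "isolated".
thrust::transform(handle.get_thrust_policy(),
                  thrust::make_counting_iterator(v_first),
                  thrust::make_counting_iterator(v_first) + degrees.size(),
                  degrees.begin(),
                  ranks.begin(),
                  [] __device__(auto v, auto d) {
                    return d > edge_t{0} ? v : std::numeric_limits<vertex_t>::max();
                  });

// remaining_vertices = only the non-zero-degree local vertices.
remaining_vertices.resize(degrees.size(), handle.get_stream());
auto last = thrust::copy_if(handle.get_thrust_policy(),
                            thrust::make_counting_iterator(v_first),
                            thrust::make_counting_iterator(v_first) + degrees.size(),
                            degrees.begin(),
                            remaining_vertices.begin(),
                            [] __device__(auto d) { return d > edge_t{0}; });
remaining_vertices.resize(thrust::distance(remaining_vertices.begin(), last),
                          handle.get_stream());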

num_buckets);

size_t loop_counter = 0;
vertex_t nr_remaining_vertices_to_check = remaining_vertices.size();
Contributor:

This holds an invalid value if # GPUs > 1.

}
});
while (true) {
loop_counter++;
Contributor:

This is a minor point, but it is more natural to increase loop_counter at the end of the loop. We are starting loop 0, yet loop_counter is already 1; this is conceptually a bit misleading.
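A tiny sketch of the suggested restructuring (only the counter placement changes):

size_t loop_counter = 0;
while (true) {
  // ... loop body; loop_counter is the index of the current iteration ...
  loop_counter++;
}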


// FIXME: Since we know that the properties being updated are either
// std::numeric_limits<vertex_t>::max() or std::numeric_limits<vertex_t>::min(),
// explore 'fill_edge_dst_property' which is faster
Contributor:

Yes, in the case of update, you need to communicate both vertices and values. In the case of fill, you need to communicate only vertices (and if the # of vertices is large compared to the local vertex partition size, we use bitmaps to cut the communication volume).

So, really, the trade-off here is two cheaper calls vs. one more expensive call.

Comment on lines +123 to +147
// Only update the property of endpoints that had their ranks modified
rmm::device_uvector<vertex_t> processed_ranks(
num_processed_vertices, handle.get_stream());

auto pair_idx_processed_vertex_first = thrust::make_zip_iterator(
thrust::make_counting_iterator<size_t>(0),
remaining_vertices.begin() + nr_remaining_local_vertices_to_check
);

thrust::for_each(
handle.get_thrust_policy(),
pair_idx_processed_vertex_first,
pair_idx_processed_vertex_first + num_processed_vertices,
[processed_ranks =
raft::device_span<vertex_t>(processed_ranks.data(), processed_ranks.size()),
ranks =
raft::device_span<vertex_t>(ranks.data(), ranks.size()),
v_first = graph_view.local_vertex_partition_range_first()] __device__(auto pair_idx_v) {

auto idx = thrust::get<0>(pair_idx_v);
auto v = thrust::get<1>(pair_idx_v);
auto v_offset = v - v_first;

processed_ranks[idx] = ranks[v_offset];
});
Contributor:

Isn't this thrust::gather?
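For reference, a sketch of the same copy expressed as a gather; the transform iterator (and the use of cuda::proclaim_return_type from <cuda/functional> so the iterator's value type is known for the extended device lambda) is an illustrative assumption, not code from this PR:

auto map_first = thrust::make_transform_iterator(
  remaining_vertices.begin() + nr_remaining_local_vertices_to_check,
  cuda::proclaim_return_type<vertex_t>(
    [v_first = graph_view.local_vertex_partition_range_first()] __device__(auto v) {
      return v - v_first;  // vertex ID -> offset into ranks
    }));
thrust::gather(handle.get_thrust_policy(),
               map_first,
               map_first + num_processed_vertices,
               ranks.begin(),             // gather source
               processed_ranks.begin());  // processed_ranks[i] = ranks[map[i]]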
