Update Betweenness Centrality normalization #4974
```diff
@@ -548,27 +548,32 @@ rmm::device_uvector<weight_t> betweenness_centrality(
   std::optional<weight_t> scale_factor{std::nullopt};
 
   if (normalized) {
-    weight_t n = static_cast<weight_t>(graph_view.number_of_vertices());
-    if (!include_endpoints) { n -= weight_t{1}; }
-    scale_factor = n * (n - 1);
-  } else if (graph_view.is_symmetric())
+    if (include_endpoints) {
+      if (graph_view.number_of_vertices() >= 2) {
+        scale_factor = static_cast<weight_t>(
+          std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices()) *
+          (graph_view.number_of_vertices() - 1));
+      }
+    } else if (graph_view.number_of_vertices() > 2) {
+      scale_factor = static_cast<weight_t>(
+        std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices() - 1) *
+        (graph_view.number_of_vertices() - 2));
+    }
+  } else if (num_sources < static_cast<size_t>(graph_view.number_of_vertices())) {
+    scale_factor = (graph_view.is_symmetric() ? weight_t{2} : weight_t{1}) *
+                   static_cast<weight_t>(num_sources) /
+                   (include_endpoints ? static_cast<weight_t>(graph_view.number_of_vertices())
+                                      : static_cast<weight_t>(graph_view.number_of_vertices() - 1));
+  } else if (graph_view.is_symmetric()) {
     scale_factor = weight_t{2};
+  }
 
   if (scale_factor) {
-    if (graph_view.number_of_vertices() > 2) {
-      if (static_cast<vertex_t>(num_sources) < graph_view.number_of_vertices()) {
-        (*scale_factor) *= static_cast<weight_t>(num_sources) /
-                           static_cast<weight_t>(graph_view.number_of_vertices());
-      }
-      thrust::transform(
-        handle.get_thrust_policy(),
-        centralities.begin(),
-        centralities.end(),
-        centralities.begin(),
-        [sf = *scale_factor] __device__(auto centrality) { return centrality / sf; });
-    }
+    thrust::transform(handle.get_thrust_policy(),
+                      centralities.begin(),
+                      centralities.end(),
+                      centralities.begin(),
+                      [sf = *scale_factor] __device__(auto centrality) { return centrality / sf; });
   }
 
   return centralities;
```
```diff
@@ -683,8 +688,9 @@ edge_betweenness_centrality(
   if (normalized) {
     weight_t n = static_cast<weight_t>(graph_view.number_of_vertices());
     scale_factor = n * (n - 1);
-  } else if (graph_view.is_symmetric())
+  } else if (graph_view.is_symmetric()) {
     scale_factor = weight_t{2};
+  }
 
   if (scale_factor) {
     if (graph_view.number_of_vertices() > 1) {
```
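The branch structure of the updated vertex betweenness scale-factor selection can be sketched as plain host-side C++. Everything here is illustrative: `bc_scale_factor` is a made-up helper name, and `double`/`std::size_t` stand in for cuGraph's `weight_t`/`vertex_t` template parameters; this is not the library's API, just a minimal model of the control flow in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical sketch of the scale-factor selection for vertex betweenness
// centrality, mirroring the patched control flow (not cuGraph code).
std::optional<double> bc_scale_factor(std::size_t n_vertices,
                                      std::size_t num_sources,
                                      bool normalized,
                                      bool symmetric,
                                      bool include_endpoints) {
  std::optional<double> scale_factor;
  if (normalized) {
    if (include_endpoints) {
      if (n_vertices >= 2) {
        // With endpoints: at most min(num_sources, n) sources, n - 1 targets.
        scale_factor = static_cast<double>(
          std::min(num_sources, n_vertices) * (n_vertices - 1));
      }
    } else if (n_vertices > 2) {
      // Without endpoints: at most min(num_sources, n - 1) sources can route
      // a path through a vertex, each contributing at most n - 2 paths.
      scale_factor = static_cast<double>(
        std::min(num_sources, n_vertices - 1) * (n_vertices - 2));
    }
  } else if (num_sources < n_vertices) {
    // Unnormalized, approximate BC: rescale by the sampled fraction; a factor
    // of 2 accounts for each path being counted twice on symmetric graphs.
    scale_factor = (symmetric ? 2.0 : 1.0) *
                   static_cast<double>(num_sources) /
                   static_cast<double>(include_endpoints ? n_vertices
                                                         : n_vertices - 1);
  } else if (symmetric) {
    scale_factor = 2.0;
  }
  return scale_factor;
}
```

A `std::nullopt` result means the centralities are left unscaled, matching the `if (scale_factor)` guard around the `thrust::transform` in the diff.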
Review comment:

No need to subtract 1 from `num_sources`? (i.e. `static_cast<vertex_t>(num_sources - 1)`?) I assume `num_sources == graph_view.number_of_vertices()` for full BC. It looks a bit weird to subtract 1 just from `graph_view.number_of_vertices()`.
Reply:

We had some complex gyrations around the formulas. There are a couple of things being accounted for in the scaling factor.

In the normalization path, we're trying to divide by the maximum number of times a vertex could appear in the shortest paths. For the full graph, since we're not including endpoints, this is `(n - 1) * (n - 2)`, where `n` is the number of vertices in the graph. This would occur for a vertex `v` that has an input edge from every vertex in the graph. The `n - 1` factor counts every vertex other than `v` (when we start at `v` we won't travel back to `v`, and we're not counting the endpoint), and the `n - 2` factor is the maximum number of paths that could travel through `v`.

For approximate betweenness, we're only traveling through `num_sources` samples, so the maximum value would be `num_sources * (n - 2)`. This would occur in any variant of the graph described above where the randomly selected sources did not include the vertex `v`.

I agree it looks odd.
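The two maxima described above can be written out as a small worked example. The helper names below (`max_appearances_full`, `max_appearances_sampled`) are made up for illustration and are not part of cuGraph:

```cpp
#include <cassert>
#include <cstddef>

// Maximum times a vertex v can sit on a shortest path (endpoints excluded)
// when every vertex is used as a source: n - 1 sources other than v, each
// reaching at most n - 2 targets through v.
std::size_t max_appearances_full(std::size_t n) {
  return (n - 1) * (n - 2);
}

// Approximate BC with num_sources sampled sources: each sample contributes
// at most n - 2 paths through v, assuming v itself was not sampled.
std::size_t max_appearances_sampled(std::size_t n, std::size_t num_sources) {
  return num_sources * (n - 2);
}
```

Note that when `num_sources == n - 1` (every vertex except `v` is a source), the sampled bound reduces to the full bound, which is why the patch clamps with `std::min(num_sources, n - 1)` rather than subtracting 1 from `num_sources`.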