Update Betweenness Centrality normalization #4974
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
python/cugraph/cugraph/tests/centrality/test_betweenness_centrality.py
…ality.py Co-authored-by: Erik Welch <[email protected]>
Looks good to me but I have some questions about the logic to set scale_factor.
}
} else if (graph_view.number_of_vertices() > 2) {
  scale_factor = static_cast<weight_t>(
    std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices() - 1) *
No need to subtract 1 from `num_sources`? (i.e. `static_cast<vertex_t>(num_sources - 1)`?)

I assume `num_sources == graph_view.number_of_vertices()` for full BC. It looks a bit weird to subtract 1 just from `graph_view.number_of_vertices()`.
We had some complex gyrations around the formulas.

There are a couple of things being accounted for in the scaling factor. In the normalization path, we're trying to divide by the maximum number of times a vertex could appear in the shortest paths. For the full graph, since we're not including endpoints, this is `(n-1) * (n-2)`, where `n` is the number of vertices in the graph. This maximum would occur for a vertex `v` that has an input edge from every other vertex in the graph. The `n-1` factor counts every vertex other than `v` (when we start at `v` we won't travel back to `v`, and we're not counting the endpoint), and the `n-2` factor is the maximum number of paths that could travel through `v`.

For approximate betweenness, we're only traveling from `num_sources` sampled sources. So the maximum value would be `num_sources * (n-2)`. This would occur in any variant of the graph described above where the randomly selected sources did not include the vertex `v`.

I agree it looks odd.
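The counting argument above can be sketched in a few lines of Python. This is an illustrative reimplementation for the no-endpoints case only, not the cuGraph code; the function name `max_appearances` is made up for this sketch:

```python
def max_appearances(n: int, num_sources: int) -> int:
    """Maximum number of times a vertex v can lie on shortest paths
    when endpoints are excluded (the (n-1)*(n-2) discussion above)."""
    # At most min(num_sources, n - 1) sources other than v contribute,
    # and each contributes at most n - 2 paths passing through v.
    return min(num_sources, n - 1) * (n - 2)

# Full BC on a 5-vertex graph (num_sources == n): (5-1) * (5-2) = 12
print(max_appearances(5, 5))
# Approximate BC with 3 sampled sources: 3 * (5-2) = 9
print(max_appearances(5, 3))
```

The `min(num_sources, n - 1)` term is why the C++ snippet subtracts 1 only from `number_of_vertices()`: when all `n` vertices are sources, at most `n - 1` of them (everyone but `v`) can contribute paths through `v`.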
scale_factor = (graph_view.is_symmetric() ? weight_t{2} : weight_t{1}) *
               static_cast<weight_t>(num_sources) /
               (include_endpoints ? static_cast<weight_t>(graph_view.number_of_vertices())
                                  : static_cast<weight_t>(graph_view.number_of_vertices() - 1));
We don't check `vertices.size()` (or the sum of `vertices.size()` over GPUs in multi-GPU) `> 0`. So it is technically possible to pass empty seed vertices, and in that case `num_sources = 0 && graph_view.number_of_vertices() = 1` is possible; then `scale_factor` can become 0, leading to a divide by 0.
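A minimal Python sketch of the edge case being flagged (illustrative only, not the cuGraph code; `scale_factor` here mirrors the C++ expression in the diff above):

```python
def scale_factor(n: int, num_sources: int,
                 include_endpoints: bool = False,
                 symmetric: bool = False) -> float:
    """Illustrative version of the scale-factor expression above."""
    denom = float(n) if include_endpoints else float(n - 1)
    return (2.0 if symmetric else 1.0) * num_sources / denom

# Empty seed list => num_sources == 0 => scale_factor == 0.0, and the
# later division of centrality values by it would divide by zero.
sf = scale_factor(2, 0)
print(sf)   # 0.0
```

A guard of the kind the review suggests (only rescale when the factor is positive) avoids the division by zero.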
Just pushed an update
/merge
Betweenness Centrality normalization is not quite right if you specify not including endpoints and use approximate betweenness.
This PR temporarily disables some of the python tests that compare results with networkx, since the networkx PR that updates the normalization computation is not yet merged. Once networkx/networkx#7908 is merged we should be able to create another PR to re-enable those tests. Each of the disabled tests is skipped with a link to that PR as the reason.
Closes #4941