
Update Betweenness Centrality normalization #4974

Merged

Conversation

ChuckHastings (Collaborator) commented Mar 14, 2025

Betweenness Centrality normalization is incorrect when endpoints are excluded and approximate betweenness (sampled sources) is used.

This PR temporarily disables some of the python tests that compare results with networkx, since the networkx PR that updates the normalization scores is not yet merged. Once networkx/networkx#7908 is merged we should be able to create another PR to re-enable those tests. Each disabled test is skipped with a link to that PR as the reason.

Closes #4941

copy-pr-bot commented Mar 14, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

ChuckHastings self-assigned this Mar 14, 2025
ChuckHastings added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Mar 14, 2025
ChuckHastings marked this pull request as ready for review Mar 14, 2025 22:35
ChuckHastings requested a review from a team as a code owner Mar 14, 2025 22:35
ChuckHastings requested a review from a team as a code owner Mar 18, 2025 20:00
seunghwak (Contributor) left a comment

Looks good to me, but I have some questions about the logic that sets scale_factor.

}
} else if (graph_view.number_of_vertices() > 2) {
scale_factor = static_cast<weight_t>(
std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices() - 1) *
seunghwak (Contributor):

No need to subtract 1 from num_sources as well (i.e., static_cast<vertex_t>(num_sources - 1))?

I assume num_sources == graph_view.number_of_vertices() for full BC. It looks a bit odd to subtract 1 only from graph_view.number_of_vertices().

ChuckHastings (Collaborator, Author):

We went through some complex gyrations to arrive at these formulas.

There are a couple of things being accounted for in the scaling factor. In the normalization path, we divide by the maximum number of times a vertex could appear on a shortest path. For the full graph, since we're not including endpoints, this is (n - 1) * (n - 2), where n is the number of vertices in the graph. The maximum occurs for a vertex v that has an incoming edge from every other vertex in the graph. The (n - 1) factor counts every source vertex other than v (when we start at v we won't travel back to v, and we're not counting the endpoint), and the (n - 2) factor is the maximum number of paths that could travel through v.

For approximate betweenness, we only traverse from num_sources sampled sources, so the maximum value is num_sources * (n - 2). This occurs in any variant of the graph described above where the randomly selected sources do not include the vertex v.

I agree it looks odd.

scale_factor = (graph_view.is_symmetric() ? weight_t{2} : weight_t{1}) *
static_cast<weight_t>(num_sources) /
(include_endpoints ? static_cast<weight_t>(graph_view.number_of_vertices())
: static_cast<weight_t>(graph_view.number_of_vertices() - 1));
seunghwak (Contributor):

We don't check vertices.size() (or the sum of vertices.size() across GPUs in multi-GPU) > 0. So it is technically possible to pass empty seed vertices, in which case num_sources == 0 and graph_view.number_of_vertices() == 1 is possible; scale_factor can then become 0, leading to a divide by zero.

ChuckHastings (Collaborator, Author):

Good catch. That check exists in the earlier if branches but not in this one. I will add it.

ChuckHastings (Collaborator, Author):

Just pushed an update.

ChuckHastings (Collaborator, Author):

/merge

rapids-bot merged commit 6ef7d0b into rapidsai:branch-25.04 Mar 20, 2025
82 checks passed
Labels
bug Something isn't working cuGraph non-breaking Non-breaking change python
Development

Successfully merging this pull request may close these issues.

[BUG]: normalization issue in betweenness_centrality
4 participants