Update Betweenness Centrality normalization#4974
Update Betweenness Centrality normalization#4974rapids-bot[bot] merged 7 commits intorapidsai:branch-25.04from
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
python/cugraph/cugraph/tests/centrality/test_betweenness_centrality.py
Outdated
Show resolved
Hide resolved
…ality.py Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
seunghwak
left a comment
There was a problem hiding this comment.
Looks good to me but I have some questions about the logic to set scale_factor.
| } | ||
| } else if (graph_view.number_of_vertices() > 2) { | ||
| scale_factor = static_cast<weight_t>( | ||
| std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices() - 1) * |
There was a problem hiding this comment.
No need to subtract 1 from num_sources? (i.e. static_cast<vertex_t>(num_sources - 1)?)
I assume num_sources == graph_view.number_of_vertices() for full BC. It looks a bit weird to subtract 1 just from graph_view.number_of_vertices().
There was a problem hiding this comment.
We had some complex gyrations around the formulas.
There are a couple of things being accounted for in the scaling factor. In the normalization path, we're trying to divide by the maximum number of times a vertex could appear in the shortest paths. For the full graph, since we're not including endpoints, this is (n-1) * (n-2) where n is the number of vertices in the graph. This would occur for a vertex v that has an input edge from every vertex in the graph. The n-1 factor counts every vertex other than v (when we start at v we won't travel back to v and we're not counting the endpoint). and the n-2 factor is the maximum number of paths that could travel through v.
For approximate betweenness, we're only traveling through num_sources samples. So the maximum value would be num_sources * n-2. This would occur in any combination of the above described graph where the randomly selected sources did not include the vertex v.
I agree it looks odd.
| scale_factor = (graph_view.is_symmetric() ? weight_t{2} : weight_t{1}) * | ||
| static_cast<weight_t>(num_sources) / | ||
| (include_endpoints ? static_cast<weight_t>(graph_view.number_of_vertices()) | ||
| : static_cast<weight_t>(graph_view.number_of_vertices() - 1)); |
There was a problem hiding this comment.
We don't check vertices.size() (or sum of vertices.size() in multi-GPU) > 0. So, it is technically possible to pass empty seed vertices, and in this case, num_sources = 0 && graph_view.number_of_verties() = 1 is possible; then, scale_factor can become 0 leading to divide by 0.
There was a problem hiding this comment.
Good catch. That check exists in the above if statements and not this one. I will add that.
There was a problem hiding this comment.
Just pushed an update
|
/merge |
networkx/networkx#7949 implements an update to normalization that was originally described in #4941. We pushed an update in #4974 that addressed the specific examples that the user identified in the original cugraph issue. However, while @eriknw was exploring updates to networkx to address this, he identified a few more edge conditions that needed to be satisfied. This PR addresses those remaining edge conditions. Note that the python tests comparing results to networkx are still disabled. These can't be re-enabled until networkx/networkx#7949 is included in a networkx release. Closes #5006 Closes #5107 Authors: - Chuck Hastings (https://github.com/ChuckHastings) - Rick Ratzel (https://github.com/rlratzel) Approvers: - Seunghwa Kang (https://github.com/seunghwak) - Joseph Nke (https://github.com/jnke2016) - Rick Ratzel (https://github.com/rlratzel) URL: #5105
Betweenness Centrality normalization is not quite right if you specify not including endpoints and use approximate betweenness.
This PR temporarily disables some of the python tests that compare results with networkx, since the networkx to update the normalization scores is not yet merged. Once networkx/networkx#7908 is merged we should be able to create another PR to enable those tests. Each of the disabled tests is skipped with a link to that PR as the reason.
Closes #4941