Temporal sampling implementation, still debugging #4994

Draft
wants to merge 6 commits into base: branch-25.04

ChuckHastings
Collaborator

Temporal sampling implementation. Sampling considers the timestamps of edges: if we arrive at a vertex v with timestamp t1, then when we depart from that vertex to continue sampling we only consider edges that occur after time t1.
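
As a rough illustration of the rule described above (not the code in this PR), the eligibility check could be sketched as follows. `TemporalEdge` and `eligible_out_edges` are hypothetical names introduced only for this example, and "after time t1" is assumed to mean strictly greater than t1.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical edge record for illustration only; not a cuGraph type.
struct TemporalEdge {
  std::int32_t src;
  std::int32_t dst;
  std::int64_t time;
};

// Return the out-edges of vertex `v` that may be followed when `v` was
// reached at time `t1`. Assumption: "after t1" means strictly greater.
std::vector<TemporalEdge> eligible_out_edges(std::vector<TemporalEdge> const& edges,
                                             std::int32_t v,
                                             std::int64_t t1)
{
  std::vector<TemporalEdge> result;
  for (auto const& e : edges) {
    if (e.src == v && e.time > t1) { result.push_back(e); }
  }
  return result;
}
```

With edges (1→2, t=3), (1→3, t=7), (1→4, t=5) and an arrival at vertex 1 at t1 = 5, only the edge to vertex 3 remains eligible under this assumption.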

PR includes C++ implementation and tests.

At the moment the tests are incomplete and I will continue testing, but the PR is big enough that I wanted to get eyes on it sooner.

copy-pr-bot bot commented Mar 20, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ChuckHastings ChuckHastings self-assigned this Mar 20, 2025
@ChuckHastings ChuckHastings added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels and removed the cuGraph and CMake labels Mar 20, 2025
Contributor

@seunghwak seunghwak left a comment

I thought about an efficient implementation of temporal sampling, especially considering that some seed vertices can be reached from multiple different paths, so we need to apply multiple different temporal windows to the same seed vertex.

This can lead to many vertex partitions, especially for power-law graphs.

And creating & applying a graph-wise temporal mask can be pretty expensive if we need to do this many times.

We can apply a graph-wise temporal mask to set a temporal window covering the lower and upper bounds of the start/end times for the entire set of seeds in multiple batches.

For a seed-specific time window, I think adjusting bias values will lead to a more efficient implementation.
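
A minimal host-side sketch of that coarse, graph-wide step might look like the following. `TimeWindow`, `enclosing_window`, and `graph_wide_mask` are hypothetical names for illustration, not cuGraph APIs, and the window bounds are assumed to be inclusive.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-seed time window; not a cuGraph type.
struct TimeWindow {
  std::int64_t start;
  std::int64_t end;
};

// Smallest window that covers every seed's window. For an empty input the
// result has start > end, so no edge passes the mask below.
TimeWindow enclosing_window(std::vector<TimeWindow> const& seed_windows)
{
  TimeWindow w{std::numeric_limits<std::int64_t>::max(),
               std::numeric_limits<std::int64_t>::min()};
  for (auto const& s : seed_windows) {
    w.start = std::min(w.start, s.start);
    w.end   = std::max(w.end, s.end);
  }
  return w;
}

// One mask for the whole graph: true where the edge time lies inside the
// enclosing window (inclusive bounds are an assumption of this sketch).
std::vector<bool> graph_wide_mask(std::vector<std::int64_t> const& edge_times, TimeWindow w)
{
  std::vector<bool> mask(edge_times.size());
  for (std::size_t i = 0; i < edge_times.size(); ++i) {
    mask[i] = edge_times[i] >= w.start && edge_times[i] <= w.end;
  }
  return mask;
}
```

The mask only discards edges that no seed could use; the per-seed window still has to be enforced separately, which is where the bias adjustment comes in.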

We can tag a seed vertex with a time-stamp (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1092C28-L1092C72).

And when we set the bias value (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1096), we can set the bias value to 0 if the edge is outside the seed-specific time window.

I think this can lead to a more efficient implementation than the current approach.

What do you think about this?

@seunghwak
Contributor

And for uniform sampling, we may use a uniform sampling primitive for seeds that appear no more than once and use a biased sampling primitive for seeds that appear two or more times.

@ChuckHastings
Collaborator Author

So something akin to what node2vec does... return a bias of 0 if the edge time is invalid, return a bias of 1 if the edge time is valid. Because we're operating on the tagged vertex, each vertex would have its own timestamp... therefore its own computed bias.

If my interpretation is correct, I think that would be a much simpler implementation and would probably result in significantly better performance in cases where a high-degree vertex appears multiple times in the frontier.
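
To make the 0/1 bias idea concrete, here is a small host-side sketch. It does not use the cuGraph primitive; `TaggedSeed`, `temporal_bias`, and `sample_out_edge` are hypothetical names, and an edge is assumed valid when its time is strictly after the seed's tagged arrival time. Because the biases are only 0 or 1, biased selection reduces to a uniform pick among the edges with nonzero bias.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Hypothetical tagged frontier entry: the same vertex can appear several
// times, each with its own arrival time.
struct TaggedSeed {
  std::int32_t vertex;
  std::int64_t arrival_time;
};

// Bias of 1 for a temporally valid edge, 0 otherwise.
double temporal_bias(TaggedSeed const& seed, std::int64_t edge_time)
{
  return edge_time > seed.arrival_time ? 1.0 : 0.0;
}

// Pick the index of one out-edge of the seed, or nothing if every edge has
// bias 0. `out_edge_times[i]` is the timestamp of the seed's i-th out-edge.
std::optional<std::size_t> sample_out_edge(TaggedSeed const& seed,
                                           std::vector<std::int64_t> const& out_edge_times,
                                           std::mt19937& rng)
{
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < out_edge_times.size(); ++i) {
    if (temporal_bias(seed, out_edge_times[i]) > 0.0) { candidates.push_back(i); }
  }
  if (candidates.empty()) { return std::nullopt; }
  std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
  return candidates[pick(rng)];
}
```

Two frontier entries for the same vertex with different arrival times see different bias vectors over the same out-edges, so no per-window partitioning of the graph is needed.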

@seunghwak
Contributor

Yes, your interpretation is correct. I agree that this will be simpler & faster. For uniform sampling, to avoid the overhead of evaluating a bias for every edge, we can just use the default uniform sampling for seeds that appear only once, and use bias values & tagging for seeds that appear more than once.
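
The split suggested here could be sketched on the host as below. This is only an illustration of the partitioning step under assumed names (`TaggedSeed`, `FrontierSplit`, `split_by_multiplicity`); how each group is then handed to the uniform or biased sampling primitive is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical tagged frontier entry (vertex plus its arrival time).
struct TaggedSeed {
  std::int32_t vertex;
  std::int64_t arrival_time;
};

// Frontier entries grouped by how often their vertex occurs.
struct FrontierSplit {
  std::vector<TaggedSeed> unique_seeds;    // vertex appears exactly once
  std::vector<TaggedSeed> repeated_seeds;  // vertex appears two or more times
};

// Count occurrences per vertex, then route each entry to the matching
// group while preserving the original frontier order within each group.
FrontierSplit split_by_multiplicity(std::vector<TaggedSeed> const& frontier)
{
  std::unordered_map<std::int32_t, std::size_t> counts;
  for (auto const& s : frontier) { ++counts[s.vertex]; }

  FrontierSplit split;
  for (auto const& s : frontier) {
    if (counts[s.vertex] == 1) {
      split.unique_seeds.push_back(s);
    } else {
      split.repeated_seeds.push_back(s);
    }
  }
  return split;
}
```

Only the entries in `repeated_seeds` would need tagging and per-edge bias evaluation; the rest could go through the cheaper uniform path.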

Labels
CMake cuGraph improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
2 participants