Temporal sampling implementation, still debugging #4994
base: branch-25.04
I thought about an efficient implementation of temporal sampling, especially considering that some seed vertices can be reached via multiple different paths, so we need to apply multiple different temporal windows to the same seed vertex.
This can lead to many vertex partitions, especially for power-law graphs.
And creating and applying a graph-wise temporal mask can be pretty expensive if we need to do it many times.
We can apply a graph-wise temporal mask once to set a coarse temporal window, bounded by the lowest start time and the highest end time over the entire set of seeds across multiple batches.
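A rough host-side sketch of that coarse mask, to make the idea concrete. `TimeWindow`, `enclosing_window`, and `coarse_edge_mask` are hypothetical names used only for illustration (they are not cuGraph APIs), and the inclusive bounds are an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-seed window with inclusive bounds (an assumption).
struct TimeWindow {
  std::int64_t start;
  std::int64_t end;
};

// Smallest window that contains every per-seed window.
// Precondition: `windows` is non-empty.
TimeWindow enclosing_window(std::vector<TimeWindow> const& windows)
{
  TimeWindow bounds = windows.front();
  for (auto const& w : windows) {
    bounds.start = std::min(bounds.start, w.start);
    bounds.end   = std::max(bounds.end, w.end);
  }
  return bounds;
}

// Graph-wise mask: true keeps the edge, false drops it for every seed.
std::vector<bool> coarse_edge_mask(std::vector<std::int64_t> const& edge_times,
                                   TimeWindow const& bounds)
{
  std::vector<bool> mask(edge_times.size());
  for (std::size_t i = 0; i < edge_times.size(); ++i) {
    mask[i] = edge_times[i] >= bounds.start && edge_times[i] <= bounds.end;
  }
  return mask;
}
```

The mask only removes edges that no seed could use; an edge that survives it may still be outside the window of a particular seed, so a per-seed check is still needed afterwards.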
For a seed-specific time window, I think adjusting bias values will lead to a more efficient implementation.
We can tag a seed vertex with a timestamp (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1092C28-L1092C72).
And when we set the bias value (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1096), we can set it to 0 if the edge is outside the seed-specific time window.
I think this can be more efficient than the current approach.
What do you think about this?
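A minimal host-side sketch of the tag-plus-bias idea, not the actual primitive. `Edge`, `TaggedSeed`, `temporal_bias`, and `sample_neighbors` are hypothetical names; the sketch assumes the window is "strictly after the seed's arrival time" and samples with replacement.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical types for illustration; not cuGraph's API.
struct Edge {
  std::int32_t dst;
  std::int64_t time;
};

struct TaggedSeed {
  std::int32_t vertex;
  std::int64_t arrival_time;  // tag: when this seed was reached
};

// Bias is 1 for edges strictly after the seed's arrival time, 0 otherwise.
inline std::uint64_t temporal_bias(TaggedSeed const& seed, Edge const& edge)
{
  return edge.time > seed.arrival_time ? 1 : 0;
}

// Draws k neighbors (with replacement) in proportion to the bias.
// Returns an empty vector if no outgoing edge is valid for this seed.
std::vector<std::int32_t> sample_neighbors(TaggedSeed const& seed,
                                           std::vector<Edge> const& out_edges,
                                           std::size_t k,
                                           std::mt19937& rng)
{
  // Inclusive prefix sum of the biases.
  std::vector<std::uint64_t> prefix(out_edges.size());
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < out_edges.size(); ++i) {
    total += temporal_bias(seed, out_edges[i]);
    prefix[i] = total;
  }

  std::vector<std::int32_t> sampled;
  if (total == 0) { return sampled; }

  std::uniform_int_distribution<std::uint64_t> pick(0, total - 1);
  for (std::size_t i = 0; i < k; ++i) {
    // First edge whose prefix sum exceeds the draw; zero-bias edges
    // never satisfy this, so they are never selected.
    auto it = std::upper_bound(prefix.begin(), prefix.end(), pick(rng));
    sampled.push_back(out_edges[static_cast<std::size_t>(it - prefix.begin())].dst);
  }
  return sampled;
}
```

Because the bias is computed from the tag, the same vertex can sit in the frontier several times with different arrival times and each copy sees its own set of valid edges, with no per-window mask.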
And for uniform sampling, we could use a uniform sampling primitive for seeds that appear at most once and a biased sampling primitive for seeds that appear two or more times.
So something akin to what node2vec does: return a bias of 0 if the edge time is invalid and a bias of 1 if the edge time is valid. Because we're operating on the tagged vertex, each vertex would have its own timestamp and therefore its own computed bias. If my interpretation is correct, I think that would be a much simpler implementation and would probably perform significantly better in the cases where a high-degree vertex appears multiple times in the frontier.
Yes, your interpretation is correct. I agree that this will be simpler and faster. For uniform sampling, to avoid the overhead of evaluating a bias for every edge, we can use the default uniform sampling for seeds that appear only once, and use bias values and tagging for seeds that appear more than once.
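The split described above could look roughly like the following host-side sketch. `FrontierEntry`, `FrontierSplit`, and `split_frontier` are hypothetical names for illustration only; how each group is then sampled is left to the primitives discussed in the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical frontier entry: a vertex plus the time it was reached.
struct FrontierEntry {
  std::int32_t vertex;
  std::int64_t arrival_time;
};

struct FrontierSplit {
  std::vector<FrontierEntry> unique_seeds;    // vertex appears exactly once
  std::vector<FrontierEntry> repeated_seeds;  // vertex appears two or more times
};

// Routes each entry by how often its vertex occurs in the frontier.
// Relative order within each group follows the input order.
FrontierSplit split_frontier(std::vector<FrontierEntry> const& frontier)
{
  std::unordered_map<std::int32_t, std::size_t> counts;
  for (auto const& entry : frontier) { ++counts[entry.vertex]; }

  FrontierSplit split;
  for (auto const& entry : frontier) {
    if (counts.at(entry.vertex) == 1) {
      split.unique_seeds.push_back(entry);
    } else {
      split.repeated_seeds.push_back(entry);
    }
  }
  return split;
}
```

Only the entries in `repeated_seeds` would pay for tagging and per-edge bias evaluation under this scheme.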
Temporal sampling implementation. Sampling considers the timestamp of edges: if we arrive at a vertex v with timestamp t1, then when we depart from that vertex to continue sampling, we only consider edges that occur after time t1.
The PR includes a C++ implementation and tests.
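To illustrate the timestamp rule in the description, here is a small host-side sketch that is not taken from the PR. `TemporalEdge`, `Arrival`, `AdjacencyList`, and `expand_one_hop` are hypothetical names, and for clarity the sketch keeps every valid edge instead of sampling a subset.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration only.
struct TemporalEdge {
  std::int32_t dst;
  std::int64_t time;
};

struct Arrival {
  std::int32_t vertex;
  std::int64_t time;  // timestamp of the edge used to reach this vertex
};

using AdjacencyList = std::unordered_map<std::int32_t, std::vector<TemporalEdge>>;

// Expands the frontier by one hop. An edge is followed only if it occurs
// strictly after the time at which its source was reached, and the new
// arrival is tagged with that edge's time.
std::vector<Arrival> expand_one_hop(AdjacencyList const& graph,
                                    std::vector<Arrival> const& frontier)
{
  std::vector<Arrival> next;
  for (auto const& arrival : frontier) {
    auto it = graph.find(arrival.vertex);
    if (it == graph.end()) { continue; }
    for (auto const& edge : it->second) {
      if (edge.time > arrival.time) { next.push_back({edge.dst, edge.time}); }
    }
  }
  return next;
}
```

Note that the same vertex can show up in the next frontier more than once with different timestamps, which is the situation the review discussion is concerned with.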
At the moment the tests are incomplete, and I will continue testing. But the PR is big enough that I wanted to get eyes on it sooner.