Temporal sampling implementation, still debugging #4994

ChuckHastings · 2025-03-20T04:44:05Z

Temporal sampling implementation. Sampling takes the timestamps of edges into account: if we arrive at a vertex v at time t1, then when we depart from v to continue sampling we only consider edges that occur after t1.
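To make the rule concrete, here is a minimal host-side sketch of the eligibility check. It is not the code in this PR: `TemporalEdge` and `eligible_out_edges` are hypothetical names, and the strict `>` comparison is my reading of "after time t1".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical edge record for illustration only; not a cuGraph type.
struct TemporalEdge {
  int32_t src;
  int32_t dst;
  int64_t time;
};

// Returns the out-edges of `v` that may still be followed after arriving at `v`
// at `arrival_time`. Only edges strictly later than the arrival time qualify.
std::vector<TemporalEdge> eligible_out_edges(std::vector<TemporalEdge> const& edges,
                                             int32_t v,
                                             int64_t arrival_time)
{
  assert(v >= 0);  // assumption: vertex IDs are non-negative
  std::vector<TemporalEdge> result;
  for (auto const& e : edges) {
    if (e.src == v && e.time > arrival_time) { result.push_back(e); }
  }
  return result;
}
```

An edge whose timestamp equals the arrival time is excluded under this reading; switching to `>=` would include it.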

PR includes C++ implementation and tests.

The tests are incomplete at the moment and I will continue testing, but the PR is big enough that I wanted to get eyes on it sooner.

copy-pr-bot · 2025-03-20T04:44:08Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

seunghwak

I thought about an efficient implementation of temporal sampling, especially considering that some seed vertices can be reached via multiple different paths, so we need to apply multiple different temporal windows to the same seed vertex.

This can lead to many vertex partitions, especially for power-law graphs.

And creating and applying a graph-wise temporal mask can be pretty expensive if we need to do it many times.

We can apply a graph-wise temporal mask once to set a temporal window that spans the lower and upper bounds of the start/end times for the entire set of seeds in multiple batches.
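A rough host-side sketch of that coarse mask, assuming each seed carries an inclusive `[start, end]` window. `TimeWindow`, `covering_window`, and `build_edge_mask` are hypothetical names used for illustration, not existing cuGraph APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical inclusive time window attached to a seed.
struct TimeWindow {
  int64_t start;
  int64_t end;
};

// Smallest single window that contains every per-seed window.
TimeWindow covering_window(std::vector<TimeWindow> const& seed_windows)
{
  assert(!seed_windows.empty());
  TimeWindow w = seed_windows.front();
  for (auto const& s : seed_windows) {
    w.start = std::min(w.start, s.start);
    w.end   = std::max(w.end, s.end);
  }
  return w;
}

// One mask for the whole graph: true if the edge time falls inside the covering window.
std::vector<bool> build_edge_mask(std::vector<int64_t> const& edge_times, TimeWindow w)
{
  std::vector<bool> mask(edge_times.size());
  for (std::size_t i = 0; i < edge_times.size(); ++i) {
    mask[i] = edge_times[i] >= w.start && edge_times[i] <= w.end;
  }
  return mask;
}
```

The mask is built once for all seeds, so it only removes edges that no seed could use; the per-seed filtering still has to happen elsewhere.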

For a seed-specific time window, I think adjusting bias values will lead to a more efficient implementation.

We can tag a seed vertex with a time-stamp (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1092C28-L1092C72).

And when we set the bias value (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/src/prims/per_v_random_select_transform_outgoing_e.cuh#L1096), we can set the bias value to 0 if the edge is outside the seed-specific time window.
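A minimal sketch of the bias adjustment, written as plain C++ rather than as the device functor the primitive would actually take. `TaggedSeed` and `temporal_bias` are hypothetical names, and I am assuming the seed-specific window is "strictly after the tagged timestamp".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged seed: the vertex plus the timestamp at which it was reached.
struct TaggedSeed {
  int32_t vertex;
  int64_t arrival_time;
};

// Keeps the edge's original bias when the edge lies inside the seed's window
// and returns 0 otherwise, so the edge can never be selected for this seed.
double temporal_bias(TaggedSeed const& seed, int64_t edge_time, double edge_bias)
{
  assert(edge_bias >= 0.0);  // biases are assumed non-negative
  return edge_time > seed.arrival_time ? edge_bias : 0.0;
}
```

Because the timestamp travels with the tag, the same vertex reached at two different times gets two different bias values for the same edge.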

I think this can lead to a more efficient implementation than the current approach.

What do you think about this?

seunghwak · 2025-03-21T18:57:45Z

And for uniform sampling, we may use a uniform sampling primitive for seeds that appear only once and a biased sampling primitive for seeds that appear two or more times.

ChuckHastings · 2025-03-21T19:13:21Z

So something akin to what node2vec does... return a bias of 0 if the edge time is invalid, return a bias of 1 if the edge time is valid. Because we're operating on the tagged vertex, each vertex would have its own timestamp... therefore its own computed bias.

If my interpretation is correct, I think that would be a much simpler implementation and would probably result in significantly better performance in the cases where we end up with a high degree vertex that appears multiple times in the frontier.
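Here is how I picture the 0/1 bias driving the actual selection, as a small host-side sketch. `OutEdge`, `edge_biases`, and `sample_neighbor` are hypothetical names; the real primitive runs on the GPU and does not use `std::discrete_distribution`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical out-edge of a single vertex.
struct OutEdge {
  int32_t dst;
  int64_t time;
};

// Bias of 1 for edges after the tag's timestamp, 0 for all others.
std::vector<double> edge_biases(std::vector<OutEdge> const& out_edges, int64_t tag_time)
{
  std::vector<double> biases;
  biases.reserve(out_edges.size());
  for (auto const& e : out_edges) { biases.push_back(e.time > tag_time ? 1.0 : 0.0); }
  return biases;
}

// Picks one valid out-edge for a (vertex, tag_time) frontier entry.
// Returns the index into `out_edges`, or -1 if no edge is valid for this tag.
int sample_neighbor(std::vector<OutEdge> const& out_edges, int64_t tag_time, std::mt19937& rng)
{
  std::vector<double> biases = edge_biases(out_edges, tag_time);
  double total = std::accumulate(biases.begin(), biases.end(), 0.0);
  if (total == 0.0) { return -1; }  // discrete_distribution requires a positive weight sum
  std::discrete_distribution<int> pick(biases.begin(), biases.end());
  return pick(rng);
}
```

If a high-degree vertex shows up in the frontier with tags 5 and 8, each entry calls `sample_neighbor` on the same edge list with its own timestamp, so no per-tag partitioning or masking of the graph is needed.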

seunghwak · 2025-03-21T19:38:29Z


Yes, your interpretation is correct. I agree that this will be simpler & faster. For uniform sampling, to avoid the overhead of evaluating a bias for every edge, we can just use default uniform sampling for seeds that appear only once, and use bias values & tagging for seeds that appear more than once.
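The routing could look roughly like the sketch below, which splits a tagged frontier by how often each vertex appears. `TaggedSeed`, `FrontierSplit`, and `split_by_multiplicity` are hypothetical names, and this is a host-side illustration rather than how the frontier is actually stored.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical tagged frontier entry: vertex plus the time it was reached.
struct TaggedSeed {
  int32_t vertex;
  int64_t time;
};

struct FrontierSplit {
  std::vector<TaggedSeed> unique_seeds;    // vertex appears exactly once: uniform path
  std::vector<TaggedSeed> repeated_seeds;  // vertex appears two or more times: biased path
};

// Counts occurrences per vertex, then routes each entry to one of the two paths.
FrontierSplit split_by_multiplicity(std::vector<TaggedSeed> const& frontier)
{
  std::unordered_map<int32_t, int> counts;
  for (auto const& s : frontier) { ++counts[s.vertex]; }

  FrontierSplit split;
  for (auto const& s : frontier) {
    (counts[s.vertex] == 1 ? split.unique_seeds : split.repeated_seeds).push_back(s);
  }
  return split;
}
```

Only the entries in `repeated_seeds` pay the cost of a per-edge bias evaluation; how the time window is enforced on the uniform path is left open here.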


Temporal sampling implementation, still debugging

dc3b296

ChuckHastings self-assigned this Mar 20, 2025

github-actions bot added cuGraph CMake labels Mar 20, 2025

ChuckHastings added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed cuGraph CMake labels Mar 20, 2025

ChuckHastings added 2 commits March 21, 2025 07:43

merge latest, resolve conflicts

125c993

Merge branch 'branch-25.04' into temporal_sampling_impl

a8c358d

github-actions bot added cuGraph CMake labels Mar 21, 2025

seunghwak reviewed Mar 21, 2025


ChuckHastings added 3 commits March 21, 2025 13:30

update validation routine to take span

2fd8a3a

after discussion, add back the halving of unnormalized results on undirected graph

f0bb2d6

Merge branch 'branch-25.04' into temporal_sampling_impl

c324219
