Implementation of frontier primitive from SYGraph #3289
…move obsolete test header
…eck and clear operations
…consistency; update related functionality; fixed bug for SIGSEGV
…name printing and frontier checks
- Removed the existing frontier_dpc.hpp file to streamline the codebase.
- Introduced new test files for advance operation, BFS, and basic frontier operations.
- Implemented comprehensive tests to validate the functionality of the frontier data structure.
- Enhanced the frontier class with additional methods for better performance and usability.
- Ensured compatibility with SYCL and improved device memory management.
…ize global size calculation
…al and add explanatory comment)
…headers, sources and tests
void compare_frontiers(T& device_frontier, std::vector<uint32_t>& host_frontier, size_t num_nodes) {
    for (size_t i = 0; i < num_nodes; ++i) {
        bool tmpd = device_frontier.check(i);
        bool tmph = std::find(host_frontier.begin(), host_frontier.end(), i) != host_frontier.end();
host_frontier should be sorted by construction, so instead of std::find we can keep a second pointer to the current host node. This reduces the complexity from O(n^2) to O(n), which saves testing time when we want to check large frontiers.
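A minimal sketch of the two-pointer idea, assuming host_frontier is sorted in ascending order and has no duplicates. The names mock_frontier and frontiers_match_sorted are hypothetical; the mock models only the check(i) call visible in the snippet above and is not the PR's frontier class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the device frontier: only check() is modeled.
struct mock_frontier {
    std::vector<bool> bits;
    bool check(std::size_t i) const {
        assert(i < bits.size());
        return bits[i];
    }
};

// Returns true when both frontiers hold exactly the same vertices.
// Assumes host_frontier is sorted ascending without duplicates.
template <typename Frontier>
bool frontiers_match_sorted(const Frontier& device_frontier,
                            const std::vector<std::uint32_t>& host_frontier,
                            std::size_t num_nodes) {
    std::size_t cursor = 0; // position of the next expected active vertex
    for (std::size_t i = 0; i < num_nodes; ++i) {
        const bool on_host = cursor < host_frontier.size() && host_frontier[cursor] == i;
        if (on_host) {
            ++cursor;
        }
        if (device_frontier.check(i) != on_host) {
            return false;
        }
    }
    // Leftover host entries were never matched (out of range or out of order).
    return cursor == host_frontier.size();
}
```

Each vertex and each host entry is visited once, so the comparison is linear in num_nodes.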
I used a boolean map to record whether each node is inside the frontier; this way the complexity of compare_frontiers drops to O(n).
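A sketch of what the boolean-map variant might look like; it is not the code from the PR. The names bitvec_frontier and frontiers_match_map are hypothetical, and the mock again models only check(i). Unlike the two-pointer approach, this version does not require host_frontier to be sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the device frontier: only check() is modeled.
struct bitvec_frontier {
    std::vector<bool> bits;
    bool check(std::size_t i) const {
        assert(i < bits.size());
        return bits[i];
    }
};

// Builds a membership map from the host list, then compares it with the
// device frontier in a single pass over the vertices.
template <typename Frontier>
bool frontiers_match_map(const Frontier& device_frontier,
                         const std::vector<std::uint32_t>& host_frontier,
                         std::size_t num_nodes) {
    std::vector<bool> in_host(num_nodes, false);
    for (std::uint32_t v : host_frontier) {
        if (v >= num_nodes) {
            return false; // host lists a vertex outside the graph
        }
        in_host[v] = true;
    }
    for (std::size_t i = 0; i < num_nodes; ++i) {
        if (device_frontier.check(i) != in_host[i]) {
            return false;
        }
    }
    return true;
}
```

The cost is one pass over the host list plus one pass over the vertices, at the price of num_nodes extra bits of host memory.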
…ate and introduce hierarchical reductions
…ble declarations in BitmapKernel and frontier_dpc implementations
…sentation and update test case names for clarity
…t32_t> for improved performance and memory efficiency
…ne helper signatures
/intelci: run
Pull Request Overview
This PR introduces support for SYgraph, a heterogeneous graph analytics framework, by implementing frontier-based graph processing primitives in oneDAL. The implementation provides GPU-parallel graph operators for processing active vertex sets using composable operations.
Key changes include:
- Implementation of Two-Layer Bitmap Frontier for tracking active vertices in graph algorithms
- Core primitives: advance (edge traversal), compute (vertex operations), and filter operations
- Load-balancing mechanism for efficient GPU compute unit utilization
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| cpp/oneapi/dal/backend/primitives/frontier/bitset.hpp | Templated bitset implementation using integer word arrays for vertex state encoding |
| cpp/oneapi/dal/backend/primitives/frontier/frontier.hpp | Two-level bitmap frontier interface with core operations (insert, test, empty, offsets) |
| cpp/oneapi/dal/backend/primitives/frontier/frontier_dpc.cpp | Device-specific DPC++ implementation of frontier operations |
| cpp/oneapi/dal/backend/primitives/frontier/advance.hpp | Templated advance primitive with workload balancing for edge traversal |
| cpp/oneapi/dal/backend/primitives/frontier/graph.hpp | Non-owning CSR graph view interface for generic graph operations |
| cpp/oneapi/dal/backend/primitives/frontier/test/*.cpp | Unit tests for frontier operations, advance workflows, and BFS implementation |
| cpp/oneapi/dal/backend/primitives/frontier/test/utils.hpp | Test utilities for random graph generation and device information |
| cpp/oneapi/dal/backend/common.hpp | Added device_max_sg_count function for subgroup query support |
| Build and module configuration files | Integration of frontier module into build system |
auto offsets_pointer = _offsets.get_mutable_data() + 1;

const uint32_t element_bitsize = bitmap.get_element_bitsize();
const size_t local_range = 256; // propose_wg_size(this->_queue);
Copilot AI, Oct 10, 2025
The hard-coded value 256 should be replaced with the commented function call propose_wg_size(this->_queue) or made configurable to avoid magic numbers.
Suggested change:
- const size_t local_range = 256; // propose_wg_size(this->_queue);
+ const size_t local_range = propose_wg_size(this->_queue);
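Whichever way local_range is chosen, a SYCL nd_range requires the global range to be a multiple of the local range. A minimal host-side sketch of that rounding follows; round_up_global_range is a hypothetical helper name, not a oneDAL function, and the PR may compute its global size differently.

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of local_range that is greater than or equal to num_items.
inline std::size_t round_up_global_range(std::size_t num_items, std::size_t local_range) {
    assert(local_range > 0);
    return ((num_items + local_range - 1) / local_range) * local_range;
}
```

Work-items whose global id is at or beyond num_items would then need an explicit bounds check inside the kernel.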
cpp/oneapi/dal/backend/primitives/frontier/test/advance_dpc.cpp
private:
    sycl::queue& _queue;
    size_t _num_items;
The common convention is to use the std::int64_t type for data sizes in oneDAL.
Is it still fine if I use std::uint64_t for size_t types instead?
…y frontier_context API
Description
This PR introduces support for integrating SYgraph, a heterogeneous graph analytics framework, into oneDAL.
In graph processing, a frontier represents the active subset of vertices currently being processed during an iteration of a graph algorithm (e.g. Breadth First Search, Single Source Shortest Path).
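To make the notion of a frontier concrete, here is a host-only sketch of a level-synchronous BFS in which the frontier is a plain list of vertex ids. It assumes an adjacency-list graph and uses the hypothetical name bfs_levels; it shows the concept only, not the GPU bitmap representation this PR implements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the hop distance of every vertex from `source`, or -1 if unreachable.
std::vector<int> bfs_levels(const std::vector<std::vector<std::uint32_t>>& adj,
                            std::uint32_t source) {
    assert(source < adj.size());
    std::vector<int> level(adj.size(), -1);
    std::vector<std::uint32_t> frontier{ source }; // vertices active in this iteration
    level[source] = 0;
    for (int depth = 1; !frontier.empty(); ++depth) {
        std::vector<std::uint32_t> next; // vertices discovered in this iteration
        for (std::uint32_t u : frontier) {
            for (std::uint32_t v : adj[u]) {
                if (level[v] == -1) {
                    level[v] = depth;
                    next.push_back(v);
                }
            }
        }
        frontier.swap(next); // the discovered set becomes the next frontier
    }
    return level;
}
```

The algorithm ends when an iteration discovers nothing, that is, when the frontier becomes empty.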
To operate over the frontier efficiently, SYgraph introduces a set of composable, GPU-parallel graph operators, each designed to process the active vertex set in a scalable, data-driven manner. These operators are invoked over frontiers and graphs using user-defined lambda functions and include:
This PR implements the Two-Layer Bitmap Frontier (see the SYgraph paper for more details), the primitive operators (advance, filter, compute), and a load-balancing mechanism that distributes computation evenly across all GPU compute units.
Detailed Overview
Changes are contained in the cpp/oneapi/dal/backend/primitives/frontier folder.
bitset.hpp: Templated bitset implemented as an array of integer words where each bit encodes a vertex state (1 = active, 0 = inactive). Serves as the low-level building block for the frontier.
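The word-array encoding described above can be sketched on the host as follows. This is an illustration under assumptions: host_bitset and all of its members are hypothetical names, and the real bitset.hpp targets device memory and may expose a different interface.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-only illustration: one bit per vertex, packed into words of type Word.
template <typename Word = std::uint32_t>
class host_bitset {
public:
    static constexpr std::size_t word_bits = sizeof(Word) * CHAR_BIT;

    explicit host_bitset(std::size_t num_bits)
            : words_((num_bits + word_bits - 1) / word_bits, Word{ 0 }),
              num_bits_(num_bits) {}

    // Marks vertex i as active (bit = 1).
    void set(std::size_t i) {
        assert(i < num_bits_);
        words_[i / word_bits] |= mask(i);
    }

    // Marks vertex i as inactive (bit = 0).
    void reset(std::size_t i) {
        assert(i < num_bits_);
        words_[i / word_bits] &= ~mask(i);
    }

    bool test(std::size_t i) const {
        assert(i < num_bits_);
        return (words_[i / word_bits] & mask(i)) != 0;
    }

    std::size_t word_count() const {
        return words_.size();
    }

private:
    static Word mask(std::size_t i) {
        return static_cast<Word>(Word{ 1 } << (i % word_bits));
    }

    std::vector<Word> words_;
    std::size_t num_bits_;
};
```

Vertex i lives in word i / word_bits at bit position i % word_bits, so 70 vertices with 32-bit words occupy three words.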
frontier.hpp / frontier_dpc.cpp: Two-level bitmap frontier representing vertex activity for graph algorithms. Exposes core operations:
Implementation details and device-specific DPC++ code live in frontier_dpc.cpp to keep headers lightweight.
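One way to picture a two-level bitmap, as a host-only sketch: the first layer holds one bit per vertex, and the second layer holds one bit per first-layer word, set when that word is nonzero. This layout, the class name two_layer_frontier, and the member names are assumptions for illustration; the frontier in this PR runs on the device and its layout and API may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class two_layer_frontier {
public:
    explicit two_layer_frontier(std::size_t num_vertices)
            : num_vertices_(num_vertices),
              layer1_((num_vertices + bits - 1) / bits, 0u),
              layer2_((layer1_.size() + bits - 1) / bits, 0u) {}

    void insert(std::size_t v) {
        assert(v < num_vertices_);
        const std::size_t word = v / bits;
        layer1_[word] |= std::uint32_t{ 1 } << (v % bits);
        layer2_[word / bits] |= std::uint32_t{ 1 } << (word % bits);
    }

    bool check(std::size_t v) const {
        assert(v < num_vertices_);
        return ((layer1_[v / bits] >> (v % bits)) & 1u) != 0;
    }

    // Only the small second layer is scanned to detect an empty frontier.
    bool empty() const {
        for (std::uint32_t w : layer2_) {
            if (w != 0) return false;
        }
        return true;
    }

    // Lists active vertices in ascending order, skipping every first-layer
    // word whose second-layer bit is clear.
    std::vector<std::uint32_t> active_vertices() const {
        std::vector<std::uint32_t> out;
        for (std::size_t word = 0; word < layer1_.size(); ++word) {
            if (((layer2_[word / bits] >> (word % bits)) & 1u) == 0) continue;
            for (std::size_t b = 0; b < bits; ++b) {
                if ((layer1_[word] >> b) & 1u) {
                    out.push_back(static_cast<std::uint32_t>(word * bits + b));
                }
            }
        }
        return out;
    }

    void clear() {
        layer1_.assign(layer1_.size(), 0u);
        layer2_.assign(layer2_.size(), 0u);
    }

private:
    static constexpr std::size_t bits = 32;
    std::size_t num_vertices_;
    std::vector<std::uint32_t> layer1_; // one bit per vertex
    std::vector<std::uint32_t> layer2_; // one bit per layer1_ word
};
```

The benefit of the second layer is that sparse frontiers can be scanned without touching every first-layer word.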
advance.hpp: Templated advance primitive that accepts a callable (e.g., a lambda) executed for every newly discovered vertex during an advance step. Includes a workload-balancing strategy to improve resource utilization during the advance.
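The shape of such a primitive can be sketched sequentially on the host. Everything here is an assumption for illustration: the name advance_host, the CSR arrays passed as vectors, and the functor signature bool(src, dst), where returning true adds dst to the output frontier. The sketch does not model the GPU kernels or the workload balancing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Visits every outgoing edge of every vertex in in_frontier and lets the
// user-provided callable decide whether the destination joins out_frontier.
template <typename Functor>
void advance_host(const std::vector<std::uint32_t>& row_offsets,
                  const std::vector<std::uint32_t>& col_indices,
                  const std::vector<bool>& in_frontier,
                  std::vector<bool>& out_frontier,
                  Functor&& on_edge) {
    const std::size_t n = in_frontier.size();
    assert(row_offsets.size() == n + 1);
    assert(out_frontier.size() == n);
    for (std::size_t src = 0; src < n; ++src) {
        if (!in_frontier[src]) continue;
        for (std::uint32_t e = row_offsets[src]; e < row_offsets[src + 1]; ++e) {
            const std::uint32_t dst = col_indices[e];
            if (on_edge(static_cast<std::uint32_t>(src), dst)) {
                out_frontier[dst] = true;
            }
        }
    }
}
```

Because the callable is a template parameter, each call site instantiates its own version of the function, which is why a primitive of this kind has to live in a header.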
graph.hpp: A small graph interface used by SYgraph: a non-owning CSR view that enables generic graph operations without taking ownership of the underlying memory.
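A non-owning CSR view can be as small as two pointers and a vertex count. The sketch below is hypothetical (csr_view, make_view, and the member names are not taken from graph.hpp) and is host-only; it illustrates that the view never allocates or frees the arrays it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores raw pointers only: the caller must keep the underlying arrays alive
// for as long as the view is in use.
struct csr_view {
    const std::uint32_t* row_offsets; // num_vertices + 1 entries
    const std::uint32_t* col_indices; // row_offsets[num_vertices] entries
    std::size_t num_vertices;

    std::size_t degree(std::size_t v) const {
        assert(v < num_vertices);
        return row_offsets[v + 1] - row_offsets[v];
    }

    const std::uint32_t* neighbors_begin(std::size_t v) const {
        assert(v < num_vertices);
        return col_indices + row_offsets[v];
    }

    const std::uint32_t* neighbors_end(std::size_t v) const {
        assert(v < num_vertices);
        return col_indices + row_offsets[v + 1];
    }
};

// Wraps existing vectors without copying them.
inline csr_view make_view(const std::vector<std::uint32_t>& rows,
                          const std::vector<std::uint32_t>& cols) {
    assert(!rows.empty());
    return csr_view{ rows.data(), cols.data(), rows.size() - 1 };
}
```

Copying the view is cheap and never duplicates graph data, which suits passing it by value into kernels or helper functions.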
Testing
test/ for unit tests covering basic operations and advance workflows.
Notes for reviewers
advance.hpp must remain a header-only implementation and cannot be split into a header + precompiled implementation file. It accepts user-provided callables (typically lambdas) that must be instantiated at each call site at compile time; these callables (and the device kernels that use them) cannot be precompiled into a separate translation unit. Keeping advance.hpp header-only ensures the lambda is correctly compiled/instantiated for both host and device code.
PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
The failures in the internal CI are unrelated to these changes.
The status of the added tests:
Performance