@antonio-decaro antonio-decaro commented Jul 9, 2025

Description

This PR introduces support for integrating SYgraph, a heterogeneous graph analytics framework, into oneDAL.

In graph processing, a frontier represents the active subset of vertices currently being processed during an iteration of a graph algorithm (e.g. Breadth First Search, Single Source Shortest Path).
To operate over the frontier efficiently, SYgraph introduces a set of composable, GPU-parallel graph operators, each designed to process the active vertex set in a scalable, data-driven manner. These operators are invoked over frontiers and graphs using user-defined lambda functions and include:

  • Advance (edge operation): Traverses edges from the current frontier to generate a new set of active vertices (the next frontier). For each vertex in the input frontier, it inspects neighbors and uses a lambda to decide which should be added to the output.
  • Compute (vertex operation): Applies a computation to each vertex in the frontier — typically to update vertex properties.
  • Filter: Refines a frontier by selecting only the elements that satisfy a given condition.
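
The three operators can be illustrated with a minimal host-only sketch. It is not the SYgraph or oneDAL API: `csr_view`, `frontier_t`, `advance_sketch`, `compute_sketch`, `filter_sketch`, and `bfs_levels` are hypothetical names, the frontier is a plain `std::vector` rather than a bitmap, and everything runs sequentially on the CPU instead of on a GPU.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning CSR view: row_ptr holds num_vertices + 1 entries.
struct csr_view {
    const std::uint32_t* row_ptr;
    const std::uint32_t* col_idx;
    std::size_t num_vertices;
};

// Host stand-in for a frontier: just the list of active vertex ids.
using frontier_t = std::vector<std::uint32_t>;

// Advance: visit every edge (u, v) leaving the input frontier; the lambda
// decides whether v is added to the output frontier.
template <typename F>
frontier_t advance_sketch(const csr_view& g, const frontier_t& in, F&& f) {
    frontier_t out;
    for (std::uint32_t u : in) {
        for (std::uint32_t e = g.row_ptr[u]; e < g.row_ptr[u + 1]; ++e) {
            const std::uint32_t v = g.col_idx[e];
            if (f(u, v)) out.push_back(v);
        }
    }
    return out;
}

// Compute: apply the lambda to every vertex in the frontier.
template <typename F>
void compute_sketch(const frontier_t& in, F&& f) {
    for (std::uint32_t v : in) f(v);
}

// Filter: keep only the vertices that satisfy the predicate.
template <typename F>
frontier_t filter_sketch(const frontier_t& in, F&& pred) {
    frontier_t out;
    for (std::uint32_t v : in) {
        if (pred(v)) out.push_back(v);
    }
    return out;
}

// BFS levels built from advance + compute; unreachable vertices stay at -1.
std::vector<int> bfs_levels(const csr_view& g, std::uint32_t source) {
    std::vector<int> level(g.num_vertices, -1);
    std::vector<bool> visited(g.num_vertices, false);
    visited[source] = true;
    level[source] = 0;
    frontier_t current{ source };
    int depth = 0;
    while (!current.empty()) {
        ++depth;
        frontier_t next = advance_sketch(g, current, [&](std::uint32_t, std::uint32_t v) {
            if (visited[v]) return false;
            visited[v] = true; // mark on discovery so v is emitted only once
            return true;
        });
        compute_sketch(next, [&](std::uint32_t v) { level[v] = depth; });
        current = next;
    }
    return level;
}
```

In a parallel implementation the visited check inside the advance lambda would need an atomic or bitmap-based update; the sequential version above sidesteps that on purpose.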

This PR aims to implement the concept of Two-Layer Bitmap Frontier (see the SYgraph paper for more details), primitive operators (advance, filter, compute), and a load-balancing mechanism to evenly distribute computation across all GPU compute units.
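
To give a feel for what load balancing means here, the sketch below shows one generic strategy: an exclusive prefix sum over the out-degrees of the frontier vertices, followed by a binary search that maps a flat work-item index to a specific edge. This is a common technique and is not necessarily the mechanism this PR implements; `degree_offsets`, `edge_slot`, and `locate` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix sum of the out-degrees of the frontier vertices.
// offsets.back() is the total number of edges to process.
std::vector<std::size_t> degree_offsets(const std::vector<std::uint32_t>& row_ptr,
                                        const std::vector<std::uint32_t>& frontier) {
    std::vector<std::size_t> offsets(frontier.size() + 1, 0);
    for (std::size_t i = 0; i < frontier.size(); ++i) {
        const std::uint32_t u = frontier[i];
        offsets[i + 1] = offsets[i] + (row_ptr[u + 1] - row_ptr[u]);
    }
    return offsets;
}

// Position of the owning vertex in the frontier and the rank of the edge
// within that vertex's adjacency list.
struct edge_slot {
    std::size_t frontier_pos;
    std::size_t edge_rank;
};

// Map a flat work-item index (must be < offsets.back()) to an edge, so that
// every work item handles exactly one edge regardless of vertex degree.
edge_slot locate(const std::vector<std::size_t>& offsets, std::size_t work_id) {
    // The entry just before the first offset greater than work_id owns it.
    const auto it = std::upper_bound(offsets.begin(), offsets.end(), work_id);
    const std::size_t pos = static_cast<std::size_t>(it - offsets.begin()) - 1;
    return { pos, work_id - offsets[pos] };
}
```

With this mapping a high-degree vertex no longer serializes its whole adjacency list in a single work item, and zero-degree vertices consume no work items at all.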

Detailed Overview

Changes are contained in the cpp/oneapi/dal/backend/primitives/frontier folder.

  • bitset.hpp
    Templated bitset implemented as an array of integer words where each bit encodes a vertex state (1 = active, 0 = inactive). Serves as the low-level building block for the frontier.
  • frontier.hpp/frontier_dpc.cpp
    Two-level bitmap frontier representing vertex activity for graph algorithms. Exposes core operations:
    • insert a vertex into the frontier
    • test/contains a vertex
    • check whether the frontier is empty
    • precompute an offsets buffer to improve workload distribution
      Implementation details and device-specific DPC++ code live in frontier_dpc.cpp to keep headers lightweight.
  • advance.hpp
    Templated advance primitive that accepts a callable (e.g., a lambda) executed for every newly discovered vertex during an advance step. Includes a workload-balancing strategy to improve resource utilization during the advance.
  • graph.hpp
    A small graph interface used by SYgraph: a non-owning CSR view that enables generic graph operations without taking ownership of the underlying memory.
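
The frontier operations listed above can be sketched on the host as follows. This is an illustration under assumptions, not the code in frontier.hpp: the class name `two_layer_frontier_sketch`, the 32-bit word size, and the `active_words` helper are hypothetical, and the real implementation works on device memory through a `sycl::queue`. The second layer keeps one bit per first-layer word, which is what makes the emptiness check and the search for active words cheap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class two_layer_frontier_sketch {
    using word_t = std::uint32_t;
    static constexpr std::size_t bits = 32; // bits per word (assumed)

public:
    explicit two_layer_frontier_sketch(std::size_t num_vertices)
            : first_((num_vertices + bits - 1) / bits, 0),
              second_((first_.size() + bits - 1) / bits, 0) {}

    // Set the vertex bit in the first layer and flag its word in the second.
    void insert(std::uint32_t v) {
        const std::size_t word = v / bits;
        first_[word] |= word_t{ 1 } << (v % bits);
        second_[word / bits] |= word_t{ 1 } << (word % bits);
    }

    // Test whether a vertex is active.
    bool check(std::uint32_t v) const {
        return ((first_[v / bits] >> (v % bits)) & word_t{ 1 }) != 0;
    }

    // Only the (much smaller) second layer needs to be scanned.
    bool empty() const {
        for (word_t w : second_) {
            if (w != 0) return false;
        }
        return true;
    }

    // Indices of the non-zero first-layer words, found via the second layer.
    // A buffer like this can be used to hand out work per active word.
    std::vector<std::size_t> active_words() const {
        std::vector<std::size_t> out;
        for (std::size_t s = 0; s < second_.size(); ++s) {
            if (second_[s] == 0) continue; // skips 32 first-layer words at once
            for (std::size_t b = 0; b < bits; ++b) {
                if ((second_[s] >> b) & word_t{ 1 }) out.push_back(s * bits + b);
            }
        }
        return out;
    }

private:
    std::vector<word_t> first_;  // one bit per vertex
    std::vector<word_t> second_; // one bit per first-layer word
};
```

Whether the offsets buffer in the PR stores exactly these word indices is an assumption; the sketch only shows why a second layer helps skip inactive regions of the bitmap.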

Testing

  • See test/ for unit tests covering basic operations and advance workflows.
  • Tests focus on correctness of bit manipulations, frontier semantics (insert/test/empty), offsets computation, and the advance callable invocation for newly discovered vertices.

Notes for reviewers

  • The design is template-first to remain flexible across integer/index types and host/device execution;
  • advance.hpp must remain header-only and cannot be split into a header plus a precompiled implementation file. It accepts user-provided callables (typically lambdas) that are instantiated at each call site at compile time, so neither the callables nor the device kernels that use them can be precompiled into a separate translation unit. Keeping advance.hpp header-only ensures the lambda is compiled and instantiated correctly for both host and device code.
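
The header-only constraint follows from how C++ templates and lambdas interact, which the host-only sketch below illustrates. `for_each_discovered`, `sum_ids`, and `max_id` are hypothetical names, not functions from advance.hpp. Every lambda expression has its own unnamed closure type, so each call site instantiates a different specialization of the template, and the compiler can only do that if it sees the template definition at that point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a header-only primitive. The definition must be
// visible wherever it is called, because Functor is only known at the call site.
template <typename Functor>
std::size_t for_each_discovered(const std::vector<std::uint32_t>& discovered,
                                Functor&& on_vertex) {
    for (std::uint32_t v : discovered) on_vertex(v);
    return discovered.size();
}

// Call site 1: instantiates for_each_discovered with this lambda's closure type.
std::uint64_t sum_ids(const std::vector<std::uint32_t>& discovered) {
    std::uint64_t sum = 0;
    for_each_discovered(discovered, [&sum](std::uint32_t v) { sum += v; });
    return sum;
}

// Call site 2: a different closure type, hence a second, separate instantiation.
std::uint32_t max_id(const std::vector<std::uint32_t>& discovered) {
    std::uint32_t best = 0;
    for_each_discovered(discovered, [&best](std::uint32_t v) {
        if (v > best) best = v;
    });
    return best;
}
```

If the template body lived only in a separately compiled .cpp file, neither instantiation could be generated there, since that translation unit never sees the two lambdas.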

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
    The failures in the internal CI are unrelated to these changes.
    The status of the added tests:
    [image: status of the added tests]
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

…consistency; update related functionality; fixed bug for SIGSEGV
- Removed the existing frontier_dpc.hpp file to streamline the codebase.
- Introduced new test files for advance operation, BFS, and basic frontier operations.
- Implemented comprehensive tests to validate the functionality of the frontier data structure.
- Enhanced the frontier class with additional methods for better performance and usability.
- Ensured compatibility with SYCL and improved device memory management.
void compare_frontiers(T& device_frontier, std::vector<uint32_t>& host_frontier, size_t num_nodes) {
    for (size_t i = 0; i < num_nodes; ++i) {
        bool tmpd = device_frontier.check(i);
        bool tmph = std::find(host_frontier.begin(), host_frontier.end(), i) != host_frontier.end();
Contributor

@avolkov-intel avolkov-intel Sep 18, 2025

host_frontier should be sorted by construction, so instead of std::find we can use a second pointer that tracks the current node. This reduces the complexity from O(n^2) to O(n), which saves testing time if we want to check frontiers of large sizes.

Author

I now use a boolean map to record whether each node is in the frontier, which brings the complexity of the compare_frontiers method down to O(n).
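
A minimal sketch of the boolean-map approach described in this reply is shown below. It is not the test code from the PR: `fake_frontier` is a hypothetical host stand-in for the device frontier, and `frontiers_match` returns a `bool` instead of using the test framework's assertions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the device frontier; only check() is assumed.
struct fake_frontier {
    std::vector<bool> bits;
    bool check(std::size_t v) const {
        return bits[v];
    }
};

// O(n) comparison: build a membership map once, then do one pass over all nodes.
template <typename Frontier>
bool frontiers_match(const Frontier& device_frontier,
                     const std::vector<std::uint32_t>& host_frontier,
                     std::size_t num_nodes) {
    std::vector<bool> in_host(num_nodes, false);
    for (std::uint32_t v : host_frontier) {
        in_host[v] = true; // O(|frontier|), no ordering requirement
    }
    for (std::size_t i = 0; i < num_nodes; ++i) {
        const bool expected = in_host[i];
        if (device_frontier.check(i) != expected) return false;
    }
    return true;
}
```

Unlike the two-pointer variant suggested in the review, the map does not rely on host_frontier being sorted, at the cost of one extra buffer of num_nodes bits.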

@avolkov-intel added the "new algorithm" and "graph" labels Sep 18, 2025
@avolkov-intel changed the title from "SYgraph" to "Implementation of frontier primitive from SYGraph" Sep 18, 2025
Contributor

Vika-F commented Oct 2, 2025

/intelci: run

@Vika-F Vika-F requested a review from Copilot October 10, 2025 11:35
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces support for SYgraph, a heterogeneous graph analytics framework, by implementing frontier-based graph processing primitives in oneDAL. The implementation provides GPU-parallel graph operators for processing active vertex sets using composable operations.

Key changes include:

  • Implementation of Two-Layer Bitmap Frontier for tracking active vertices in graph algorithms
  • Core primitives: advance (edge traversal), compute (vertex operations), and filter operations
  • Load-balancing mechanism for efficient GPU compute unit utilization

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cpp/oneapi/dal/backend/primitives/frontier/bitset.hpp | Templated bitset implementation using integer word arrays for vertex state encoding |
| cpp/oneapi/dal/backend/primitives/frontier/frontier.hpp | Two-level bitmap frontier interface with core operations (insert, test, empty, offsets) |
| cpp/oneapi/dal/backend/primitives/frontier/frontier_dpc.cpp | Device-specific DPC++ implementation of frontier operations |
| cpp/oneapi/dal/backend/primitives/frontier/advance.hpp | Templated advance primitive with workload balancing for edge traversal |
| cpp/oneapi/dal/backend/primitives/frontier/graph.hpp | Non-owning CSR graph view interface for generic graph operations |
| cpp/oneapi/dal/backend/primitives/frontier/test/*.cpp | Unit tests for frontier operations, advance workflows, and BFS implementation |
| cpp/oneapi/dal/backend/primitives/frontier/test/utils.hpp | Test utilities for random graph generation and device information |
| cpp/oneapi/dal/backend/common.hpp | Added device_max_sg_count function for subgroup query support |
| Build and module configuration files | Integration of frontier module into build system |

auto offsets_pointer = _offsets.get_mutable_data() + 1;

const uint32_t element_bitsize = bitmap.get_element_bitsize();
const size_t local_range = 256; // propose_wg_size(this->_queue);
Copilot AI Oct 10, 2025

The hard-coded value 256 should be replaced with the commented function call propose_wg_size(this->_queue) or made configurable to avoid magic numbers.

Suggested change
- const size_t local_range = 256; // propose_wg_size(this->_queue);
+ const size_t local_range = propose_wg_size(this->_queue);

private:
sycl::queue& _queue;
size_t _num_items;
Contributor

The common convention in oneDAL is to use the std::int64_t type for data sizes.

Author

Is it still fine if I use std::uint64_t for size_t types instead?

Labels

  • dpc++: Issue/PR related to DPC++ functionality
  • graph
  • new algorithm: New algorithm or method in oneDAL

3 participants