
refactor: unify CuPy/buffer-protocol path for array and sparse Csr #104

Open
rho-novatron wants to merge 1 commit into
Helmholtz-AI-Energy:main from
rho-novatron:refactor/array-helper

Conversation

@rho-novatron
Contributor

PR-A: refactor: unify CuPy/buffer-protocol path for array and sparse Csr

Branch (fork): rho-novatron:refactor/array-helper
Base (upstream): Helmholtz-AI-Energy:main
Commits: 1 (17b586d)


Summary

Pulls the __cuda_array_interface__ / Python-buffer-protocol input
handling out of array.cpp and matrix.cpp and into a single helper
in utils.hpp:

template <typename T>
gko::array<T> gko_array_from_pyobject(
    std::shared_ptr<const gko::Executor> exec,
    py::object obj);

Both the gko::array(Executor, py::object) constructor and the
sparse Csr(Executor, scipy/cupyx csr_matrix) constructor now go
through it.
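The dispatch order the helper performs can be sketched at the Python level. This is an illustrative stand-in, not the actual Ginkgo/pybind11 API: `array_from_pyobject` and `FakeCupyArray` are hypothetical names, and the real helper returns a `gko::array<T>` rather than a tuple.

```python
# Hypothetical sketch of the two input paths gko_array_from_pyobject
# unifies: the CUDA array interface first, then the buffer protocol.
from array import array


def array_from_pyobject(obj):
    """Return (source, payload), mimicking the helper's dispatch."""
    # Path 1: __cuda_array_interface__ -- zero-copy device pointer.
    cai = getattr(obj, "__cuda_array_interface__", None)
    if cai is not None:
        ptr, read_only = cai["data"]
        return ("cuda", {"ptr": ptr, "shape": cai["shape"],
                         "typestr": cai["typestr"]})
    # Path 2: anything exposing the Python buffer protocol
    # (NumPy arrays, array.array, bytes, ...).
    view = memoryview(obj)
    return ("buffer", list(view))


class FakeCupyArray:
    # Minimal stand-in for a CuPy array's interface dict.
    __cuda_array_interface__ = {
        "data": (0xDEADBEEF, False), "shape": (3,),
        "typestr": "<f8", "version": 3,
    }


print(array_from_pyobject(FakeCupyArray())[0])      # -> cuda
print(array_from_pyobject(array("d", [1.0, 2.0])))  # -> ('buffer', [1.0, 2.0])
```

In the C++ helper the second path would construct the `gko::array` directly on the requested executor, which is what removes the `ReferenceExecutor` detour described below.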

Why this isn't just a refactor

The pre-existing sparse-matrix constructor took two structurally
different paths depending on input type:

  • CuPy CSR on a CudaExecutor — zero-copy via
    __cuda_array_interface__ (added in #98, "Add CuPy zero-copy
    interoperability via __cuda_array_interface__").
  • Anything else (NumPy / scipy / buffer protocol) — a manual
    fallback that built gko::array<...>::view(...) on a
    ReferenceExecutor and handed those host arrays to
    Csr::create(exec, ...), forcing a second host→device copy when
    the executor was CUDA.

After this PR, both paths flow through gko_array_from_pyobject,
which:

  • honors the actual target exec for the buffer-protocol path
    (constructs the array directly on the requested executor instead
    of via a Reference detour),
  • applies the dtype typestr validation, previously performed only by
    the CUDA branch, to the host path as well, and
  • gives array and Csr a single source of truth, so future fixes
    (e.g. the cross-platform integer-dtype-char fix in the
    follow-up MPI PR) automatically apply to both.
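The typestr check mentioned above can be sketched as follows. The accepted forms and the function name `validate_typestr` are illustrative; the real validation lives in the C++ helper and follows the `__array_interface__` typestr convention (byte order, kind character, item size in bytes).

```python
# Sketch of the dtype-typestr validation now applied on both the CUDA
# and the host/buffer-protocol paths.  Hypothetical helper, not the
# actual Ginkgo binding code.
import sys


def validate_typestr(typestr, expected_kind, expected_itemsize):
    """Raise ValueError if e.g. '<f8' does not match the template type T."""
    if len(typestr) < 3:
        raise ValueError(f"malformed typestr: {typestr!r}")
    byteorder, kind, itemsize = typestr[0], typestr[1], int(typestr[2:])
    native = "<" if sys.byteorder == "little" else ">"
    if byteorder not in (native, "|"):
        raise ValueError(f"non-native byte order: {typestr!r}")
    if kind != expected_kind or itemsize != expected_itemsize:
        raise ValueError(
            f"dtype mismatch: got {typestr!r}, expected kind "
            f"{expected_kind!r} with {expected_itemsize} bytes")


validate_typestr("<f8", "f", 8)      # ok: matches double
try:
    validate_typestr("<f4", "f", 8)  # float32 where double was expected
except ValueError as e:
    print(e)
```

Before this PR, a mismatched NumPy dtype on the host path would have slipped past this check and been caught (or silently misinterpreted) further down; running it uniformly up front is part of the "single source of truth" argument.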

Behavior change for sparse matrices on CUDA built from
NumPy/scipy input: now allocated and copied directly on the target
device executor instead of through a host ReferenceExecutor
intermediate.

Diff

 src/cpp_bindings/array.cpp  | 70 +-----
 src/cpp_bindings/matrix.cpp | 84 +---------
 src/cpp_bindings/utils.hpp  | 66 +++++++
 3 files changed, 75 insertions(+), 147 deletions(-)

Why this is its own PR

  • Stands alone as a behavior-improving cleanup in the wake of #98
    ("Add CuPy zero-copy interoperability via __cuda_array_interface__").
  • Reviewers can sign off on it independent of the much larger MPI
    feature work that follows in the stacked PR-B.
  • PR-B's Windows-portability fix is a one-line change against this
    helper — much easier to review when the helper itself has already
    been reviewed in isolation.

Testing

The existing test suite covers all the input shapes: NumPy/scipy CSR
on ReferenceExecutor, CuPy CSR zero-copy on CudaExecutor, and
the dense-array constructors. No regressions observed locally.

Stacked PR

Not stacked on anything (based on main). PR-B for the distributed
MPI bindings is stacked on top of this.

…nstructors

Move the duplicated __cuda_array_interface__ / buffer-protocol conversion
logic from array.cpp and matrix.cpp into a single gko_array_from_pyobject<T>()
template in utils.hpp.

Both the gko::array(Executor, py::object) constructor and the sparse-matrix
(Executor, py::object) constructor now delegate to this helper, eliminating
~260 lines of duplicated CUDA/host branching.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
