[enhancement] add dlpack support to `to_table` #2275

icfaust · 2025-01-27T09:50:39Z

Description

This PR introduces __dlpack__ tensor (https://github.com/dmlc/dlpack) consumption by to_table allowing for zero-copy use of data in oneDAL. This is important for enabling array_api support and is a pre-requisite for #2096 (array api dispatching). That PR is then a pre-requisite for #2100 #2106 #2189 #2206 #2207 and #2209. Sklearn provides array_api support for some algorithms. If we wish to fully support zero copy of sycl_usm inputs, we need to be able to consume array_api inputs due to underlying sklearn dependencies (validate_data, check_array, etc.). While we support Sycl usm ndarrays (dpctl, dpnp) via the __sycl_usm_array_interface__ method in the onedal folder estimators, to properly interface estimators in the sklearnex folder, we need to support the __dlpack__ method of arrays/tensors. This PR does that and greatly simplifies the necessary logic in #2096 and the follow-up PRs. This PR also provides the added benefit of working with other frameworks which support SYCL gpu data which have __dlpack__ interfaces (i.e. PyTorch).

NOTES:

This PR continues from [enhancement] Refactor onedal/datatypes in preparation for dlpack support #2195 to integrate dlpack support, and it takes code from DataManagement update #1568. Please use DataManagement update #1568 as reference, though it contained some mistakes.
This new functionality is not yet exposed publicly and therefore must be added to the documentation with ENH: Array API dispatching #2096.
This aspect is not yet benchmarkable/ has nothing to benchmark against.
Memory leak checking using the infrastructure from ENH: Data management update to support SUA ifaces for Homogen OneDAL tables #2045 (only CPU at the moment, will be modified if/when PyTorch support is brought online)
It does numeric testing via onedal's assert_all_finite testing using array_api_strict arrays, as that has a simple interface to the backend with no checking of array aspects on the python-side before to_table.
A special testing class is created to verify for SYCL device support and is used in test_data.py.
This does not have any work with returning __dlpack__ supported arrays as results. And must be done as a follow-up PR (probably also a good idea in order to ease reviewing). Therefore, this is only about array/tensor consumption.

TODO: add a onedal function which checks a dlpack tensor for C-contiguity or F-contiguity similar to the flags attribute of numpy/dpctl/dpnp. This is out of the scope of this PR, but is necessary for assert_all_finite support for the next step in array_api work.

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
I have provided justification why performance has changed or why changes are not expected.
I have provided justification why quality metrics have changed or why changes are not expected.
I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

…tion

icfaust · 2025-02-20T23:49:52Z

/intelci: run

icfaust · 2025-03-10T07:46:27Z

/intelci: run

icfaust · 2025-03-14T08:53:34Z

/intelci: run

icfaust · 2025-03-14T13:29:53Z

/intelci: run

onedal/datatypes/tests/test_data.py

sklearnex/tests/utils/base.py

icfaust · 2025-03-16T20:45:23Z

/intelci: run

icfaust · 2025-03-16T22:46:35Z

/intelci: run

david-cortes-intel · 2025-03-17T14:40:57Z

onedal/datatypes/dlpack/data_conversion.cpp

+            res = convert_to_table(copy, q_obj, true);
+        }
+        else {
+            throw std::invalid_argument("dlpack input could not be converted into onedal table.");


Very minor thing here but: why not py::type_error like in the others?

david-cortes-intel · 2025-03-17T14:44:59Z

onedal/datatypes/dlpack/dlpack_utils.cpp

+        copy = space.attr("asarray")(obj, "copy"_a = true);
+    }
+    else {
+        throw std::runtime_error("Wrong strides");


Shouldn't the error mention something about the arrai API namespace instead of strides?

This is for conformance to raised errors in dpctl and dpnp. I'm not saying its right, but its consistent.

If I'm getting the logic correctly, this error could potentially be raised for inputs that are from packages unrelated to dpctl / dpnp, like PyTensor.

onedal/datatypes/dlpack/dlpack_utils.cpp

david-cortes-intel · 2025-03-17T15:01:00Z

@icfaust I'm again thinking that it might be better to have the dlpack header as a git submodule:
https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h

It has already undergone changes since the time that you copied it here.

icfaust · 2025-03-17T21:20:04Z

@icfaust I'm again thinking that it might be better to have the dlpack header as a git submodule: https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h

It has already undergone changes since the time that you copied it here.

https://github.com/dmlc/dlpack/blob/main/include/dlpack/dlpack.h#L57-L59 None of the changes are relevant to our current implementation, as the supported types and devices usable in scikit-learn-intelex/ oneDAL are strict and defined. I believe my previous answer showing that other more popular frameworks, including dependencies of scikit-learn-intelex (i.e. numpy), follow the currently implemented strategy. If you insist we can have a maintainer step in for an opinion and if they agree I will make the change, otherwise I would like to keep it as it is.

icfaust · 2025-03-18T00:17:17Z

/intelci: run

…_data``` and ```_check_sample_weight``` (#2296) * update to #2275 * add dlpack contiguous check * clang-format * Update validation.py * Update data_conversion.cpp * Update data_conversion.cpp * Update validation.py * Update data_conversion.cpp * Update data_conversion.hpp * reduce duplicated code * fix mistake

icfaust and others added 30 commits January 27, 2025 10:48

start from uxlfoundation#2195

939141e

add apache license

ae038cc

add files from uxlfoundation#1568

df87e74

start converting over

a6d364b

attempts to fix copyright checker

eabbbed

remove table_metadata

bf459de

Merge branch 'uxlfoundation:main' into dev/dlpack_integration

941a8ad

merge

ce6de53

weird merge

4caedc7

renaming

009a19c

change location

ccc3534

will implement these elsewhere

3a6c3db

move files to follow naming

7c36c57

change headers further

79496da

Merge remote-tracking branch 'sklearnex/main' into dev/dlpack_integra…

311e4bf

…tion

Merge branch 'uxlfoundation:main' into dev/dlpack_integration

09b6c7c

interim standpoint which will fail

5f92352

interim changes

1d1076d

helper -> utils

c4461d5

move macro to a central spot

39243b9

remove whitespace

05a1c15

commit before merge

559a127

Merge branch 'main' into dev/dlpack_integration

ba98166

current status

ca88f51

interim

db1fe9d

remove and format

9b318d0

more fixes

5da5f41

add fixes

827304d

add fixes

d69fd2b

more fixes

6f63c15

icfaust and others added 3 commits February 19, 2025 23:17

Update data_conversion.cpp

01e419b

Update data_conversion.cpp

754728e

clang formatting

159fba3

icfaust mentioned this pull request Feb 19, 2025

[bug] fix numpy fortran order support in from_table, errors in to_table #2322

Closed

13 tasks

icfaust and others added 4 commits February 20, 2025 01:10

fight segfault with dpctl

173d98b

Update data_conversion.cpp

7f01aea

formatting

c102f57

Update test_data.py

164f90c

Merge branch 'uxlfoundation:main' into dev/dlpack_integration

1dffca3

icfaust requested a review from Vika-F March 10, 2025 09:05

Merge branch 'uxlfoundation:main' into dev/dlpack_integration

8c8b206

ahuber21 reviewed Mar 14, 2025

View reviewed changes

onedal/datatypes/tests/test_data.py Show resolved Hide resolved

onedal/datatypes/tests/test_data.py Show resolved Hide resolved

sklearnex/tests/utils/base.py Show resolved Hide resolved

Update test_data.py

281b2bd

icfaust requested a review from ahuber21 March 16, 2025 20:45

formatting'

1a11a3f

ahuber21 approved these changes Mar 17, 2025

View reviewed changes

david-cortes-intel reviewed Mar 17, 2025

View reviewed changes

Update dlpack_utils.cpp

298fc2b

icfaust merged commit de2e679 into uxlfoundation:main Mar 18, 2025
12 of 13 checks passed

icfaust deleted the dev/dlpack_integration branch March 18, 2025 22:34

[enhancement] add dlpack support to to_table #2275

[enhancement] add dlpack support to to_table #2275

Uh oh!

Conversation

icfaust commented Jan 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

icfaust commented Feb 20, 2025

Uh oh!

icfaust commented Mar 10, 2025

Uh oh!

icfaust commented Mar 14, 2025

Uh oh!

icfaust commented Mar 14, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

icfaust commented Mar 16, 2025

Uh oh!

icfaust commented Mar 16, 2025

Uh oh!

david-cortes-intel Mar 17, 2025

Choose a reason for hiding this comment

Uh oh!

david-cortes-intel Mar 17, 2025

Choose a reason for hiding this comment

Uh oh!

icfaust Mar 17, 2025

Choose a reason for hiding this comment

Uh oh!

david-cortes-intel Mar 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

david-cortes-intel commented Mar 17, 2025

Uh oh!

icfaust commented Mar 17, 2025

Uh oh!

icfaust commented Mar 18, 2025

Uh oh!

Uh oh!

Uh oh!

[enhancement] add dlpack support to `to_table` #2275

[enhancement] add dlpack support to `to_table` #2275

icfaust commented Jan 27, 2025 •

edited

Loading