
Add transpose operation for F-order data and fix validation for GPU #3665

Draft
avolkov-intel wants to merge 3 commits into uxlfoundation:main from avolkov-intel:dev/f-order-optimization

Conversation

@avolkov-intel

Contributor

Description


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

Review comment thread on cpp/oneapi/dal/backend/primitives/utils.hpp (outdated)
@david-cortes-intel

Contributor

@avolkov-intel What about logistic regression?

@david-cortes-intel

Contributor

@avolkov-intel @Vika-F Looks like the 'copy' function is not using omatcopy when doing transposes:

inline void copy(ndview<T1, 2, ord1>& dst, const ndview<T2, 2, ord2>& src) {

Perhaps that could be improved in a different PR. MKL's omatcopy should perform the transpose in parallel and can give good speedups on CPU when transposing matrices.
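For reference, here is a minimal sketch of what an omatcopy-style out-of-place transpose computes, assuming row-major storage: B = alpha * A^T with explicit leading dimensions. The name `transpose_blocked` and the tile size of 32 are hypothetical choices for illustration; this is plain single-threaded C++, not the MKL routine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical reference routine: out-of-place scaled transpose B = alpha * A^T
// for row-major storage. It mirrors the semantics of MKL's mkl_?omatcopy with
// ordering='R' and trans='T', but it does NOT call MKL and runs on one thread.
// A is rows x cols with leading dimension lda (>= cols).
// B is cols x rows with leading dimension ldb (>= rows).
template <typename T>
void transpose_blocked(std::size_t rows, std::size_t cols, T alpha,
                       const T* a, std::size_t lda, T* b, std::size_t ldb) {
    assert(lda >= cols && ldb >= rows);
    // Process square tiles so that both the reads from A and the strided
    // writes to B stay within a small, cache-friendly region of memory.
    constexpr std::size_t tile = 32;  // assumed tile size, not tuned
    for (std::size_t i0 = 0; i0 < rows; i0 += tile) {
        const std::size_t i1 = std::min(i0 + tile, rows);
        for (std::size_t j0 = 0; j0 < cols; j0 += tile) {
            const std::size_t j1 = std::min(j0 + tile, cols);
            for (std::size_t i = i0; i < i1; ++i) {
                for (std::size_t j = j0; j < j1; ++j) {
                    // Element (i, j) of A becomes element (j, i) of B.
                    b[j * ldb + i] = alpha * a[i * lda + j];
                }
            }
        }
    }
}
```

Each tile writes to a disjoint region of B, so the outer loop over `i0` is the natural place to add threading; that is presumably where a library routine such as omatcopy gets its CPU speedups.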

For the non-transpose case with strides, it could also use 'lacpy' from MKL instead:
https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2023-2/lacpy.html
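For the strided non-transpose case, the operation lacpy performs on a full matrix (uplo set to neither 'U' nor 'L') can be sketched in plain C++ as follows, assuming column-major storage as in LAPACK. `strided_copy_colmajor` is a hypothetical name, and the sketch does not call MKL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: full-matrix strided copy B := A for column-major
// storage, mirroring LAPACK ?lacpy when uplo is neither 'U' nor 'L'.
// A and B are both m x n; lda >= m and ldb >= m are their leading dimensions.
template <typename T>
void strided_copy_colmajor(std::size_t m, std::size_t n,
                           const T* a, std::size_t lda, T* b, std::size_t ldb) {
    assert(lda >= m && ldb >= m);
    // Each column is contiguous in memory, so it is copied in one call;
    // only the jump between consecutive columns differs between A and B.
    for (std::size_t j = 0; j < n; ++j) {
        std::copy(a + j * lda, a + j * lda + m, b + j * ldb);
    }
}
```

Because columns are contiguous, the copy reduces to n block copies, and any padding entries between the columns of B are left untouched.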
