feature: add noise_variance computation on oneDAL side #3101
/intelci: run
    // SYEVR branch
    // In this case, we compute only nComponents eigenvectors and then sort them in descending order
    // inside the 'computeEigenvectorsInplaceSyevr' function
    if (nComponents < nFeatures)
Could it also limit the components by the number of rows in the data by this point? Is that info available here?
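To make the question concrete, here is a minimal sketch of such a cap. The helper name `effectiveComponentCount` and the `nRows` parameter are hypothetical and do not come from the PR; the sketch only assumes that a covariance matrix built from `nRows` observations has rank at most `min(nRows, nFeatures)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of oneDAL): caps the requested number of
// components by both the feature count and the row count of the data.
inline std::size_t effectiveComponentCount(std::size_t nComponents, std::size_t nFeatures, std::size_t nRows)
{
    assert(nFeatures > 0 && nRows > 0);
    return std::min({ nComponents, nFeatures, nRows });
}
```

Whether the row count is actually available at this point in the kernel is exactly what the question above asks, so this is only an illustration of the intended check.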
    eigenvalues[i] = temp_eigenvalues[idx];
    for (size_t j = 0; j < nFeatures; ++j)
    {
        eigenvectors[j + i * nFeatures] = temp_eigenvectors[j + idx * nFeatures];
There's a dedicated LAPACK function to reorder columns:
https://www.netlib.org/lapack/explore-3.2.1-html/dlapmt.f.html
Plus there's C++ 'reverse' for vectors:
https://en.cppreference.com/w/cpp/algorithm/reverse.html
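A rough sketch of the `std::reverse` idea, assuming the eigenvalues arrive in ascending order (as LAPACK's `syevr` returns them) and that each eigenvector occupies a contiguous block of `nFeatures` values, matching the `j + i * nFeatures` indexing in the diff. The function name `reverseEigenpairs` and the use of `std::vector` are illustrative assumptions, not the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: turns ascending eigenpairs into descending ones in place,
// without a temporary copy of the eigenvector storage.
template <typename Float>
void reverseEigenpairs(std::vector<Float>& eigenvalues, std::vector<Float>& eigenvectors, std::size_t nFeatures)
{
    const std::size_t nComponents = eigenvalues.size();
    assert(eigenvectors.size() == nComponents * nFeatures);

    // Eigenvalues: a plain in-place reversal.
    std::reverse(eigenvalues.begin(), eigenvalues.end());

    // Eigenvectors: swap block i with block (nComponents - 1 - i).
    for (std::size_t i = 0; i < nComponents / 2; ++i)
    {
        auto front = eigenvectors.begin() + static_cast<std::ptrdiff_t>(i * nFeatures);
        auto back  = eigenvectors.begin() + static_cast<std::ptrdiff_t>((nComponents - 1 - i) * nFeatures);
        std::swap_ranges(front, front + static_cast<std::ptrdiff_t>(nFeatures), back);
    }
}
```

Compared with the copy loop in the diff, this avoids the `temp_eigenvalues` and `temp_eigenvectors` buffers, at the cost of assuming the input is already sorted in the opposite order.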
    Float max_val = row[0];
    Float abs_max = std::abs(row[0]);
    for (std::int64_t j = 1; j < column_count; j++) {
Would it perhaps be faster to use `idamax` from BLAS?
I will add a TODO to investigate this. For now, it looks like it would make the PR bigger.
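For reference, the operation being discussed (index of the element with the largest absolute value, which is what `idamax` computes) can be sketched with the standard library alone. This is a stand-in for comparison, not a BLAS call; the name `indexOfMaxAbs` is hypothetical, and the result is 0-based.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Illustrative stand-in for an idamax-style search: returns the 0-based index
// of the first element with the largest absolute value in row[0..count).
template <typename Float>
std::size_t indexOfMaxAbs(const Float* row, std::size_t count)
{
    assert(row != nullptr && count > 0);
    const Float* it = std::max_element(row, row + count,
                                       [](Float a, Float b) { return std::abs(a) < std::abs(b); });
    return static_cast<std::size_t>(it - row);
}
```

The signed value at that index then plays the role of `max_val` from the loop in the diff. Whether a BLAS call would actually be faster for short rows is the open question the TODO is meant to cover.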
    auto explained_variances_ratio_ptr = explained_variances_ratio.get_mutable_data();

    Float sum = 0;
    for (std::int64_t i = 0; i < column_count; ++i) {
Could oneDPL be used for this kind of thing?
I will add a TODO, but this may be done in the next PR.
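As a host-side illustration of the pattern in question, the hand-written sum loop can be expressed as a standard reduction followed by a transform. The sketch assumes the loop sums the per-component variances to normalize them into ratios; the function name `explainedVarianceRatio` is hypothetical, and the code uses only the C++ standard library rather than oneDPL itself.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Illustrative only: normalizes variances by their total using standard
// algorithms instead of explicit loops.
template <typename Float>
std::vector<Float> explainedVarianceRatio(const std::vector<Float>& variances)
{
    // Reduction step: total variance.
    const Float sum = std::accumulate(variances.begin(), variances.end(), Float(0));
    assert(sum > Float(0));

    // Element-wise step: divide each variance by the total.
    std::vector<Float> ratio(variances.size());
    std::transform(variances.begin(), variances.end(), ratio.begin(),
                   [sum](Float v) { return v / sum; });
    return ratio;
}
```

Writing the computation as a reduction plus a transform is what would make it a candidate for a parallel algorithm library later on.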
    auto eigvals_ptr = eigenvalues.get_data();
    auto singular_values_ptr = singular_values.get_mutable_data();

    const Float factor = row_count - 1;
Why '-1' here? Wouldn't this make the result not meet the necessary property that
    auto compute_event = queue.submit([&](sycl::handler& h) {
        h.depends_on(deps);
        h.parallel_for(sycl::range<1>(component_count), [=](sycl::id<1> i) {
            singular_values_ptr[i] = sycl::sqrt(factor * eigvals_ptr[i]);
syevr only generates results that are valid up to numerical tolerance. So a very small eigenvalue that should in theory be positive or zero could still end up as a very small negative number.
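One possible guard, sketched on the host with plain C++ rather than SYCL: clamp each eigenvalue at zero before taking the square root, so that a tiny negative value from rounding yields 0 instead of NaN. The function name `singularValuesFromEigenvalues` is hypothetical, and the `row_count - 1` factor is taken from the diff as-is.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only: singular values from covariance eigenvalues, with a clamp
// that protects sqrt from slightly negative inputs caused by rounding.
template <typename Float>
std::vector<Float> singularValuesFromEigenvalues(const std::vector<Float>& eigenvalues, std::size_t rowCount)
{
    assert(rowCount > 1);
    const Float factor = static_cast<Float>(rowCount - 1);

    std::vector<Float> singularValues(eigenvalues.size());
    for (std::size_t i = 0; i < eigenvalues.size(); ++i)
    {
        const Float clamped = std::max(eigenvalues[i], Float(0));
        singularValues[i] = std::sqrt(factor * clamped);
    }
    return singularValues;
}
```

Inside the kernel the same clamp could be applied to `eigvals_ptr[i]` before the `sycl::sqrt` call.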
@Alexandr-Solovev please rebase
/intelci: run
/intelci: run
/intelci: run
Changes Summary
Added noise_variance computation
Implemented calculation of noise variance as part of the PCA result options. This allows better estimation of the unexplained variance in the dataset.
Added syevr-based eigen decomposition function
Introduced a new function utilizing LAPACK's syevr routine to improve the performance of eigenvector and eigenvalue computations for symmetric matrices.
PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance
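The Changes Summary does not spell out the formula for the noise variance. A common definition, used for example by scikit-learn's PCA following the probabilistic PCA model, is the mean of the eigenvalues that are not retained as components. The sketch below assumes that definition and assumes the eigenvalues are sorted in descending order; the function name `noiseVariance` is hypothetical and this is not the PR's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative only: noise variance as the average of the discarded
// (smallest) eigenvalues, given all eigenvalues in descending order.
template <typename Float>
Float noiseVariance(const std::vector<Float>& eigenvaluesDescending, std::size_t nComponents)
{
    const std::size_t total = eigenvaluesDescending.size();
    assert(nComponents <= total);
    if (nComponents == total)
    {
        return Float(0); // nothing is discarded, so no unexplained variance
    }
    const auto tailBegin = eigenvaluesDescending.begin() + static_cast<std::ptrdiff_t>(nComponents);
    const Float tailSum  = std::accumulate(tailBegin, eigenvaluesDescending.end(), Float(0));
    return tailSum / static_cast<Float>(total - nComponents);
}
```

Note that this definition needs the full spectrum (or at least the sum of the discarded eigenvalues), which interacts with a `syevr` path that computes only `nComponents` eigenpairs.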