
Add proper CPUID eax checking #3026

Merged: 3 commits merged on Feb 28, 2025


@isuruf (Contributor) commented Dec 17, 2024

This makes it possible to use the sse42 and avx2 code paths on AMD processors.

Description


  • Adds proper checking of CPUID extension support, which was previously done by guarding with a check for Intel CPUs. This allows AMD processors to use the sse42 and avx2 code paths instead of the default sse2 path.
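The idea in the bullet above can be sketched as follows. This is an illustration, not the code merged in this PR: it gates code paths on CPUID feature bits instead of the CPU vendor, and it only trusts a leaf's output when the maximum standard leaf (returned in EAX by leaf 0) is high enough. `CpuidSnapshot`, `CodePath`, `has_sse42`, `has_avx2` and `select_code_path` are hypothetical names. The bit positions are the documented ones (SSE4.2: leaf 1, ECX bit 20; AVX2: leaf 7 sub-leaf 0, EBX bit 5). The check is simplified: a real AVX2 check must also confirm OS support for the AVX register state (OSXSAVE and XGETBV), and the library's sse42 path may require more than the SSE4.2 bit.

```cpp
#include <cstdint>

// Hypothetical snapshot of raw CPUID output. In real code these values
// would come from executing CPUID (for example through run_cpuid()).
struct CpuidSnapshot {
    std::uint32_t max_basic_leaf; // EAX returned by leaf 0
    std::uint32_t leaf1_ecx;      // ECX returned by leaf 1
    std::uint32_t leaf7_ebx;      // EBX returned by leaf 7, sub-leaf 0
};

enum class CodePath { sse2, sse42, avx2 };

// SSE4.2 is reported in leaf 1, ECX bit 20; leaf 1 must be supported.
constexpr bool has_sse42(CpuidSnapshot s) {
    return s.max_basic_leaf >= 1 && (s.leaf1_ecx & (1u << 20)) != 0;
}

// AVX2 is reported in leaf 7, EBX bit 5. If the CPU reports a maximum
// standard leaf below 7, the leaf 7 value is not meaningful and is ignored.
// Simplified: OS support for the AVX state is not checked here.
constexpr bool has_avx2(CpuidSnapshot s) {
    return s.max_basic_leaf >= 7 && (s.leaf7_ebx & (1u << 5)) != 0;
}

// Vendor-neutral selection: there is no "is this an Intel CPU?" guard.
constexpr CodePath select_code_path(CpuidSnapshot s) {
    return (has_avx2(s) && has_sse42(s)) ? CodePath::avx2
           : has_sse42(s)                ? CodePath::sse42
                                         : CodePath::sse2;
}
```

For example, a snapshot with the SSE4.2 bit set, the AVX2 bit set, and `max_basic_leaf = 5` selects the sse42 path, because leaf 7 output is not trusted on such a CPU.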



A PR should start as a draft, then move to the ready-for-review state after CI passes and all applicable checkboxes are checked.
This approach ensures that reviewers don't spend extra time asking for standard requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a PR with a documentation update doesn't require the performance checkboxes, while a PR with any change to actual code should keep them and justify how the change is expected to affect performance (or the justification should be self-evident).

Checklist to complete before moving the PR out of draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if a performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided the corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust (Contributor) left a comment

Initial review.

@@ -71,6 +71,19 @@ void run_cpuid(uint32_t eax, uint32_t ecx, uint32_t * abcd)
#endif
}

uint32_t __daal_internal_get_max_extension_support()
Contributor

This function is not necessary; it can be folded into the code below.

Contributor Author

This follows the pattern of __daal_internal_is_intel_cpu and daal_check_is_intel_cpu, where the former does the work and the latter caches its result in a static variable so that the former does not run multiple times.
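The worker-plus-cache pattern described above can be sketched like this, assuming the worker is comparatively expensive because it executes CPUID. `probe_max_leaf()` and `get_max_leaf_cached()` are hypothetical stand-ins for `__daal_internal_get_max_extension_support()` and a caching wrapper in the style of `daal_check_is_intel_cpu`; the probe returns a fixed value and counts its calls so that the caching is observable. The real code may store the cached value differently.

```cpp
#include <cstdint>

// Counts how many times the probe runs; for illustration only.
int g_probe_calls = 0;

// Hypothetical worker: real code would execute CPUID here and return
// the value read from EAX. This stand-in returns a fixed value.
std::uint32_t probe_max_leaf() {
    ++g_probe_calls;
    return 7u;
}

// Hypothetical caching wrapper: the function-local static is initialized
// on the first call only (thread-safe since C++11); later calls reuse it.
std::uint32_t get_max_leaf_cached() {
    static const std::uint32_t cached = probe_max_leaf();
    return cached;
}
```

Folding the worker into the wrapper, as the reviewer suggests, would give the same runtime behavior; keeping a separate worker mirrors the existing `__daal_internal_is_intel_cpu` / `daal_check_is_intel_cpu` pairing.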

@@ -71,6 +71,19 @@ void run_cpuid(uint32_t eax, uint32_t ecx, uint32_t * abcd)
#endif
}

uint32_t __daal_internal_get_max_extension_support()
{
uint32_t abcd[4];
Contributor

Clearer naming is necessary.

Contributor Author

This file uses the name abcd throughout. The variable stores the values of the eax, ebx, ecx and edx registers, just as in the other functions. Any ideas on what to rename all of them to?
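To make the abcd convention concrete: as described above, abcd[0..3] hold EAX, EBX, ECX and EDX in that order. One possible way to address the naming concern without renaming every array is a set of named indices. The enum `CpuidReg` and the functions `append_reg_bytes()` and `vendor_string()` are hypothetical. The example decodes the vendor ID from leaf 0, which CPUID returns in EBX, EDX, ECX order.

```cpp
#include <cstdint>
#include <string>

// Hypothetical named indices into the abcd[4] array filled by CPUID.
enum CpuidReg { kEax = 0, kEbx = 1, kEcx = 2, kEdx = 3 };

// Appends the four bytes of a register, least significant byte first,
// which is how CPUID packs ASCII characters into a register.
void append_reg_bytes(std::string & out, std::uint32_t reg) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((reg >> (8 * i)) & 0xFFu));
    }
}

// Decodes the 12-byte vendor ID from the output of CPUID leaf 0.
// Note the register order: EBX, EDX, ECX (not EBX, ECX, EDX).
std::string vendor_string(const std::uint32_t abcd[4]) {
    std::string vendor;
    append_reg_bytes(vendor, abcd[kEbx]);
    append_reg_bytes(vendor, abcd[kEdx]);
    append_reg_bytes(vendor, abcd[kEcx]);
    return vendor;
}
```

For example, leaf 0 output with EBX = 0x68747541, EDX = 0x69746e65 and ECX = 0x444d4163 decodes to "AuthenticAMD".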

@@ -193,7 +211,7 @@ static int check_sse42_features()

DAAL_EXPORT bool __daal_serv_cpu_extensions_available()
{
-    return daal_check_is_intel_cpu();
+    return 1;
Contributor

First, since the function returns bool, it would be best to return a bool value (true rather than 1). Second, if this is a no-op, the function should be removed.

Contributor Author

I removed the function and its usages.

@isuruf (Contributor Author) commented Dec 19, 2024

Thanks for the review @icfaust

@isuruf (Contributor Author) commented Jan 6, 2025

@icfaust what are the next steps for this PR?

@icfaust (Contributor) commented Jan 6, 2025

@isuruf I think addressing uxlfoundation/scikit-learn-intelex#1000 may take priority, though I will defer to code owners on this. I think testability is key, and this PR may cover up an underlying issue.

@isuruf (Contributor Author) commented Jan 7, 2025

> I think testability is key, and this PR may cover up an underlying issue.

Yeah, it will cover up an underlying issue in the sse2 code path, but the hardware that takes that path is really old, and the only way to make that bug surface seems to be using or emulating really old hardware, or using AMD hardware. I don't think it is fair to keep AMD hardware throttled to sse2 just to have a way to reproduce this bug.

@isuruf (Contributor Author) commented Jan 13, 2025


Ping on this

@napetrov (Contributor)

/intelci: run

@napetrov marked this pull request as ready for review on January 29, 2025 22:32
@napetrov (Contributor)

> > I think testability is key, and this PR may cover up an underlying issue.
>
> Yeah, it will cover up an underlying issue in the sse2 code path, but the hardware that takes that path is really old, and the only way to make that bug surface seems to be using or emulating really old hardware, or using AMD hardware. I don't think it is fair to keep AMD hardware throttled to sse2 just to have a way to reproduce this bug.

I think those can be separated. One is changing the detection of instruction sets, which is OK. There are several reasons why this wasn't touched from the beginning: there is no AMD-specific validation, and there are no maintainers responsible for it. This change is simple and I wouldn't expect issues with it, but there are other things that are enabled specifically on Intel just to reduce potential problems.

On the sse2 problem: I think it might not be specific to sse2, since there is no specialization there and this is what gets compiled by default, though it has not been reported for other platforms.

@napetrov (Contributor) commented Feb 6, 2025

/intelci: run

@napetrov (Contributor)

/intelci: run

@Vika-F (Contributor) commented Feb 27, 2025

@isuruf Please rebase the branch. It looks like the pre-commit tests are failing because the branch is missing some fixes.

mergify bot commented Feb 27, 2025

rebase

✅ Branch has been successfully rebased

@napetrov (Contributor)

@Mergifyio rebase

mergify bot commented Feb 27, 2025

rebase

✅ Nothing to do for rebase action

@napetrov (Contributor)

/intelci: run

@napetrov merged commit 7884821 into uxlfoundation:main on Feb 28, 2025
17 of 22 checks passed
@isuruf (Contributor Author) commented Feb 28, 2025

Thanks everyone for the review and merge.

@napetrov (Contributor)

> Thanks everyone for the review and merge.

Thanks for the contribution!

@isuruf deleted the cpuid branch on February 28, 2025 20:05

4 participants