Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math_brute_force: treat reciprocal as unary function #2281

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

svenvh
Copy link
Member

@svenvh svenvh commented Feb 18, 2025

Treat reciprocal as a unary function, instead of handling it through the binary function testing mechanism and special-casing it there.

This addresses two shortcomings of the previous implementation:

  • Testing took significantly longer as the entire input domain was tested many times (e.g. fp16 reciprocal has only 2^16 possible input values, but binary function testing iterates over 2^16 * 2^16 input values).

  • The reciprocal test kernel was identical to the divide kernel. Thus the device compiler would see a regular divide operation instead of a reciprocal operation and would be unlikely to emit a specialized reciprocal sequence.

This reverts all of the changes in binary_operator*.cpp made by bcfa1f7 ("Added corrections to re-enable reciprocal test in math_brute_force suite for relaxed math mode (#2221)", 2025-02-04).

Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this direction and this is something we should continue pursuing, but unfortunately this isn't working for me in its current form.

Treat reciprocal as a unary function, instead of handling it through
the binary function testing mechanism and special-casing it there.

This addresses two shortcomings of the previous implementation:

 - Testing took significantly longer as the entire input domain was
   tested many times (e.g. fp16 reciprocal has only 2^16 possible
   input values, but binary function testing iterates over 2^16 * 2^16
   input values).

 - The reciprocal test kernel was identical to the divide kernel.
   Thus the device compiler would see a regular divide operation
   instead of a reciprocal operation and would be unlikely to emit a
   specialized reciprocal sequence.

This reverts all of the changes in binary_operator*.cpp made by
bcfa1f7 ("Added corrections to re-enable reciprocal test in
math_brute_force suite for relaxed math mode (KhronosGroup#2221)", 2025-02-04).

Signed-off-by: Sven van Haastregt <[email protected]>
@svenvh svenvh marked this pull request as ready for review February 19, 2025 10:44
@shajder
Copy link
Contributor

shajder commented Feb 19, 2025

Works fine with my nvidia and intel devices.

Copy link
Contributor

@bashbaug bashbaug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This works for the device I've tested, and in some cases it's a significant performance improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants