-
-
Notifications
You must be signed in to change notification settings - Fork 209
MinMax #634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
MinMax #634
Conversation
|
I tested a dummy script, it seems to be a problem in the test implementation rather than in the actual code implementation, do you know what the error is? EDIT: The problem is with testblas.cpp because I haven't added the min argument yet into it. |
This reverts commit f607b81.
…nd it gets the correct result.
|
I think the issue was with local memory. It was using to much local memory and then when I fixed that, WGS1 or WGS2 was not divisible by 2. I will add a Xaminmax version of this, and then I think it will be mergeable. |
|
Should be ready for merge but the routines are incompatible with WGS1 or WGS2 not being divisible by 2. |
CNugteren
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments on the new code.
Regarding the issue with the local memory size (#634 (comment)) it seems you solved it by halving the sizes, but now the new issue that WGS needs to be a multiple of 2. I think this is a valid assumption, since https://github.com/CNugteren/CLBlast/blob/master/src/tuning/kernels/xdot.hpp#L64 only tests with those values anyway. Perhaps you can add a comment on the place where you divide WGS by 2 that refers to this line in the tuner?
Other than that I think my previous question remains unanswered: As for the tests that you did, I assume you testing all of the following:
- Run with the -full_test option?
- Run the above with a variety of values for WGS1 and WGS2?
|
I did run both full test and a variety of values for WGS1 and WGS2. |
Co-authored-by: Cedric Nugteren <[email protected]>
Co-authored-by: Cedric Nugteren <[email protected]>
Good. I'm assuming this includes all the values/combinations that the tuner searches for (apart from those that don't fit on your test device of course). I think the only other thing left is the suggestion above regarding adding a link to the code in two places:
After that I'll do a local test and then we should be ready to go. |
|
I compiled and ran locally, but on my test system I do get one error for the double-complex case (or 4 when I run the Let me know if you want more info. |
|
Not done yet, got to test on another device, and that has a lot of errors. Probably something to do with the testing system... |
This resolves #568 although the test doesn't pass. Any ideas on why it doesn't pass? Most of the kernel implementation is xamax but changed for max and min.