Contributor

@EJainDev EJainDev commented Sep 7, 2025

This resolves #568, although the test doesn't pass yet. Any ideas why? Most of the kernel implementation is taken from xamax, adapted for max and min.

Contributor Author

EJainDev commented Sep 13, 2025

I tested with a dummy script, and it seems to be a problem in the test implementation rather than in the actual code. Do you know what the error is?

EDIT: The problem is with testblas.cpp because I haven't added the min argument yet into it.
EDIT2: Fixed some more (potential) errors but it seems I'm still missing something.

@EJainDev EJainDev marked this pull request as draft September 17, 2025 01:05
Contributor Author

EJainDev commented Oct 8, 2025

I think the issue was with local memory. It was using too much local memory, and once I fixed that, the remaining problem was that WGS1 or WGS2 was not divisible by 2. I will add an Xaminmax version of this, and then I think it will be mergeable.
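
To make the local-memory point concrete, here is a rough budget calculation. It is only a sketch: the per-work-item layout (a value and an index for both the maximum and the minimum), the 4-byte index, and the function names are assumptions for illustration, not taken from the PR.

```cpp
#include <cstddef>

// Assumed layout (hypothetical): every work-item stores a (value, index) pair
// for the running maximum and another pair for the running minimum.
constexpr std::size_t kIndexBytes = 4;  // assumed 32-bit index

// Bytes of local memory one work-group would need under that layout.
constexpr std::size_t LocalMemoryBytes(std::size_t wgs, std::size_t value_bytes) {
  return wgs * 2 * (value_bytes + kIndexBytes);
}

// True when the scratch space for one work-group fits in the given limit.
constexpr bool FitsInLocalMemory(std::size_t wgs, std::size_t value_bytes,
                                 std::size_t limit_bytes) {
  return LocalMemoryBytes(wgs, value_bytes) <= limit_bytes;
}
```

Under these assumptions, 16-byte double-complex values with 1024 entries need 40960 bytes, which would not fit in a 32768-byte limit, while halving to 512 entries needs 20480 bytes. The real limit depends on the device.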

@EJainDev EJainDev marked this pull request as draft October 8, 2025 00:52
@EJainDev EJainDev marked this pull request as ready for review October 9, 2025 21:34
@EJainDev
Contributor Author

Should be ready to merge, but the routines do not work when WGS1 or WGS2 is not divisible by 2.

Owner

@CNugteren CNugteren left a comment

I left some minor comments on the new code.

Regarding the issue with the local memory size (#634 (comment)): it seems you solved it by halving the sizes, but now there is the new issue that WGS needs to be a multiple of 2. I think this is a valid assumption, since https://github.com/CNugteren/CLBlast/blob/master/src/tuning/kernels/xdot.hpp#L64 only tests with those values anyway. Perhaps you can add a comment at the place where you divide WGS by 2 that refers to this line in the tuner?
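
One way such a comment could look on the host side is sketched below. `HalfWorkGroupSize` is a hypothetical helper, not a function from the PR or from CLBlast; it only illustrates documenting the even-WGS assumption next to the division and failing loudly if it is violated.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: halves a work-group size for sizing local arrays.
// Assumes WGS is a multiple of 2. Per the review comment, the tuner in
// src/tuning/kernels/xdot.hpp only searches such values, so the assumption
// should hold for tuned parameters.
inline std::size_t HalfWorkGroupSize(std::size_t wgs) {
  if (wgs == 0 || wgs % 2 != 0) {
    throw std::invalid_argument("WGS must be a positive multiple of 2");
  }
  return wgs / 2;
}
```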

Other than that, I think my previous question remains unanswered. As for the tests that you did, I assume you tested all of the following:

  • Run with the -full_test option?
  • Run the above with a variety of values for WGS1 and WGS2?
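
A sweep over WGS1/WGS2 values could be enumerated as in the sketch below. The function name and the filtering rules (skip odd sizes, skip sizes above the device's maximum work-group size) are assumptions for illustration; the candidate values themselves should come from the tuner, and how each pair is fed to the test binary is left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lists every (WGS1, WGS2) pair worth testing, skipping
// values that are odd or that exceed the device's maximum work-group size.
std::vector<std::pair<std::size_t, std::size_t>> WgsCombinations(
    const std::vector<std::size_t>& wgs1_values,
    const std::vector<std::size_t>& wgs2_values,
    std::size_t max_work_group_size) {
  const auto usable = [max_work_group_size](std::size_t wgs) {
    return wgs > 0 && wgs % 2 == 0 && wgs <= max_work_group_size;
  };
  std::vector<std::pair<std::size_t, std::size_t>> result;
  for (const std::size_t wgs1 : wgs1_values) {
    if (!usable(wgs1)) continue;
    for (const std::size_t wgs2 : wgs2_values) {
      if (!usable(wgs2)) continue;
      result.emplace_back(wgs1, wgs2);
    }
  }
  return result;
}
```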

@EJainDev
Contributor Author

I did run both full test and a variety of values for WGS1 and WGS2.

@CNugteren
Owner

> I did run both full test and a variety of values for WGS1 and WGS2.

Good. I'm assuming this includes all the values/combinations that the tuner searches for (apart from those that don't fit on your test device of course).

I think the only other thing left is the suggestion above regarding adding a link to the code in two places:

> (...) Perhaps you can add a comment on the place where you divide WGS by 2 that refers to this line in the tuner?

After that I'll do a local test and then we should be ready to go.

Owner

CNugteren commented Oct 27, 2025

I compiled and ran locally, but on my test system I do get one error for the double-complex case (or 4 when I run the full_test). Can you check? Might just be an error in the test itself?

* Running on OpenCL device 'Intel(R) UHD Graphics 620 [0x3ea0]'.
* Starting tests for the 'iZMINMAX' routine. Legend:
   : -> Test produced correct results
   . -> Test returned the correct error code
   X -> Test produced incorrect results
   / -> Test returned an incorrect error code
   \ -> Test not executed: OpenCL-kernel compilation error
   o -> Test not executed: Unsupported precision
   - -> Test not completed: Reference CBLAS doesn't output error codes
* Testing with error margins of 0.5% (relative) and 0.001 (absolute)
* Testing 'regular behaviour' for 'default':
   Testing: n=7 offx=0 incx=1 [CLBlast] [CPU BLAS] -> :
   Testing: n=7 offx=0 incx=2 [CLBlast] [CPU BLAS] -> :
   Testing: n=7 offx=0 incx=7 [CLBlast] [CPU BLAS] -> :
   Testing: n=64 offx=0 incx=1 [CLBlast] [CPU BLAS] -> :
   Testing: n=64 offx=0 incx=2 [CLBlast] [CPU BLAS] -> 
   Buffer overflow index 1:  38.00+0.00i (reference) versus  14.00+0.00i (CLBlast)
   Combined average L2 error: 0.00e+00
   X
   Testing: n=64 offx=0 incx=7 [CLBlast] [CPU BLAS] -> :
   Error rate 0.78%: n=64 offx=0 incx=2 
   Pass rate  83.3%: 5 passed / 0 skipped / 1 failed
* Completed all test-cases for this routine. Results:
   5 test(s) passed
   0 test(s) skipped
   1 test(s) failed

Let me know if you want more info.
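
To cross-check a failing case such as n=64, incx=2 independently of the test harness, a small host-side reference can help. The sketch below is written under assumptions: the name `ReferenceMinMaxIndex` is hypothetical, it compares raw values of a real type, and it returns the first occurrence on ties. Whether the routine compares raw or absolute values, and how complex numbers are ordered, is not stated in this thread.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns {index of maximum, index of minimum} over n strided elements of x,
// starting at offset offx with stride incx. Indices are logical positions in
// the range 0..n-1 (not buffer offsets); ties keep the first occurrence.
template <typename T>
std::pair<std::size_t, std::size_t> ReferenceMinMaxIndex(
    const std::vector<T>& x, std::size_t n, std::size_t offx, std::size_t incx) {
  std::size_t imax = 0;
  std::size_t imin = 0;
  for (std::size_t i = 1; i < n; ++i) {
    const T value = x[offx + i * incx];
    if (value > x[offx + imax * incx]) imax = i;
    if (value < x[offx + imin * incx]) imin = i;
  }
  return {imax, imin};
}
```

Running this on the same input buffer as the device routine, once with incx=1 and once with incx=2, would show whether the mismatch follows the stride or the data.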

@EJainDev EJainDev marked this pull request as draft October 27, 2025 22:03
@EJainDev
Contributor Author

Not done yet. I got to test on another device, and that shows a lot of errors. Probably something to do with the testing system...

Development

Successfully merging this pull request may close these issues.

Find min/max in one function