Description
I seem to be failing the v4 and v5 tests, where the output is compared against the standard implementation. After running some tests, I figured it may be because the C++ code continues to run (and copies back the result) before the kernels have finished running.
I added enqueueBarrierWithWaitList() before the read, but this still doesn't fix the problem.
After some googling, I found that people recommend not swapping the OpenCL buffers and instead swapping the arguments passed to the next kernel call, so I've implemented this and I am now passing these tests.
My question is: doesn't enqueueBarrierWithWaitList() simply tell the OpenCL queue to wait, while the C++ code continues to execute, meaning the std::swap will be executed at the wrong time?
I had this code:
kernel.setArg(3, buffState);
kernel.setArg(4, buffBuffer);
queue.enqueueNDRangeKernel(kernel, offset, globalSize, localSize);
queue.enqueueBarrierWithWaitList();
std::swap(buffState, buffBuffer);
which is exactly what is mentioned in the instructions, but it wasn't working.
Maybe I have missed something in my code to ensure the swap happens at the right time?
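For reference, here is a minimal host-only sketch of the argument-swapping approach I switched to. It makes no OpenCL calls: `step`, `run` and `bufs` are hypothetical stand-ins (a plain function in place of the kernel, `std::vector` in place of the buffers), and the update rule inside `step` is made up purely for illustration. The point is only that the two buffers stay where they are and the source/destination roles alternate by index on each iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one kernel launch: reads `in`, writes `out`.
// Illustrative rule: each cell becomes itself plus its left neighbour (wrapping).
void step(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] + in[(i + n - 1) % n];
    }
}

// Runs `iterations` steps. The buffers are never swapped; only the
// roles (source vs. destination) alternate via the index t % 2.
std::vector<int> run(std::vector<int> state, int iterations) {
    std::vector<int> bufs[2] = {std::move(state), std::vector<int>()};
    bufs[1].resize(bufs[0].size());

    for (int t = 0; t < iterations; ++t) {
        const int src = t % 2;   // buffer read on this iteration
        const int dst = 1 - src; // buffer written on this iteration
        // In the OpenCL version this is where the arguments would be set,
        // e.g. kernel.setArg(3, bufs[src]); kernel.setArg(4, bufs[dst]);
        // followed by the enqueueNDRangeKernel call.
        step(bufs[src], bufs[dst]);
    }

    // The most recent result lives in the buffer written last.
    return bufs[iterations % 2];
}
```

With this layout the final read-back has to come from `bufs[iterations % 2]`, since that is the buffer the last iteration wrote to.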