Description
If you look at the implementation of enqueue_write_buffer() (and of the other write methods without the _async
suffix), you might think that all of those operations are 100% blocking, i.e., that they do not return until the memory pointed to by host_ptr
has been copied into the buffer. However, that's not true.
For clEnqueueWriteBuffer, the OpenCL specification only promises: "If blocking_write is CL_TRUE, the OpenCL implementation copies the data referred to by ptr and enqueues the write operation in the command-queue. The memory pointed to by ptr can be reused by the application after the clEnqueueWriteBuffer call returns."
In other words, it is blocking in the sense that it can return only after it has made sure that the memory pointed to by ptr can be reused by the application. It can simply copy that memory into a temporary host buffer internally and return immediately afterwards. That's what Intel's OpenCL platform does for CPU devices (and I don't know of any other implementation that behaves that way).
Usually that's not a problem, because a user typically has a single in-order command queue. However, if you write to and read from a buffer using different command queues, you have a race condition. In Boost.Compute this can happen quite often, because for operator[] and in iterators we construct temporary command queues based on the context of the underlying buffer.
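To make the race concrete, here is a minimal sketch in plain C++ that models it without any OpenCL at all. ToyBuffer, ToyQueue and write_then_read are hypothetical names invented for this illustration, and the model assumes that a queue only executes its commands when finish() is called, which stands in for a device that has not yet got around to them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Toy stand-in for a device buffer (hypothetical, not an OpenCL type).
struct ToyBuffer {
    std::vector<int> data;
};

// Toy stand-in for an in-order command queue. Commands run only when
// finish() is called, modelling a device that executes them later.
class ToyQueue {
public:
    // "Blocking" only in the weak sense: the host value is copied into a
    // staging variable before returning, so the caller may reuse `value`,
    // but the buffer itself is updated later.
    void enqueue_write(ToyBuffer& buf, std::size_t index, const int& value) {
        assert(index < buf.data.size());
        int staged = value;  // internal copy of the host memory
        pending_.push([&buf, index, staged] { buf.data[index] = staged; });
    }

    // In-order with respect to THIS queue only: it knows nothing about
    // commands that are still pending on other queues.
    int enqueue_read(const ToyBuffer& buf, std::size_t index) {
        finish();
        return buf.data[index];
    }

    // Execute every pending command in submission order.
    void finish() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Write through one queue, then read through another; returns the value read.
int write_then_read(bool finish_writer_first) {
    ToyBuffer buf{std::vector<int>(10, 0)};
    ToyQueue writer, reader;

    int host_value = 3;
    writer.enqueue_write(buf, 0, host_value);
    host_value = 42;  // allowed: the write already holds its own copy

    if (finish_writer_first) writer.finish();
    int seen = reader.enqueue_read(buf, 0);
    writer.finish();  // drain the queue while buf is still alive
    return seen;
}
```

In this model, write_then_read(false) reads the stale 0 because the write is still pending on the other queue, while write_then_read(true) reads 3. A real implementation is not deterministic like this, which is exactly why the problem shows up as a race.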
In commit 0f5c49d I fixed that problem for writes made through operator[] and iterators, which fixes situations like:
bc::vector<int> vector(10, context);
vector[0] = 3;
if(vector[0] == 3) // 100% sure it will be 3 at the time of read
{
...
}
, but code like this:
bc::vector<int> vector(context);
vector.push_back(3, queue);
vector.at(0); // does not have to be 3 at the time of read
still has a race condition on some OpenCL implementations (currently I only know of this happening on Intel's).
The question is whether we should make enqueue_write_buffer() and the other write methods in command_queue fully, 100% blocking, which would require waiting on the event returned by write calls like clEnqueueWriteBuffer.
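As a rough illustration of what "fully blocking" would mean, here is a small model in plain C++. It is only a sketch: ToyEvent, ToyQueue and write_fully_blocking are hypothetical names made up for this example, not Boost.Compute or OpenCL API, and the queue is assumed to run commands only when someone waits on them.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical toy types; none of these are OpenCL or Boost.Compute names.
using ToyEvent = std::size_t;  // sequence number of an enqueued command

class ToyQueue {
public:
    // Weakly blocking write: keeps its own copy of `value`, returns an
    // event, and leaves the actual buffer update pending.
    ToyEvent enqueue_write(std::vector<int>& buffer, std::size_t index, int value) {
        pending_.push_back([&buffer, index, value] { buffer.at(index) = value; });
        return ++enqueued_;
    }

    // Run pending commands in order until the one identified by `event`
    // has completed.
    void wait(ToyEvent event) {
        while (completed_ < event && !pending_.empty()) {
            pending_.front()();
            pending_.pop_front();
            ++completed_;
        }
    }

private:
    std::deque<std::function<void()>> pending_;
    std::size_t enqueued_ = 0;   // commands submitted so far
    std::size_t completed_ = 0;  // commands executed so far
};

// The stricter behaviour under discussion: do not return until the
// buffer itself holds the new value.
void write_fully_blocking(ToyQueue& queue, std::vector<int>& buffer,
                          std::size_t index, int value) {
    ToyEvent done = queue.enqueue_write(buffer, index, value);
    queue.wait(done);
}
```

After write_fully_blocking() returns, any reader, on any queue, sees the new value. With the real API the analogue would be to pass an event pointer as the last argument of clEnqueueWriteBuffer and then call clWaitForEvents on it before returning.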