Open
Description
As my teammates and I move to multiple, big GPUs with many streams doing interdependent work simultaneously, synchronizing at the GPU level is introducing significant amounts of inefficiency. We'd like to be able to use cudaStreamWaitEvent to make a stream wait on an event. There's already RecordEvent, CreateEvent, DestroyEvent and QueryEvent, so I'm hoping this is a straightforward addition.