Description
Currently a CUDA EDProducer follows a pattern where a `cms::cuda::ScopedContext*` object is used to set up the CUDA context (current device, CUDA stream, etc.). The pattern has some annoyances:
- With `ExternalWork` the user has to remember to pass the `edm::WaitingTaskWithArenaHolder` to the `ScopedContextAcquire` constructor in `acquire()`
- In order to have a chance to use the same CUDA stream that was used to produce a product, the user has to pass the input product twice, once to construct the `ScopedContext` and a second time to get the real product from the `cms::cuda::Product<T>` wrapper
  ```cpp
  cms::cuda::Product<RealProduct> const& prodWrapper = iEvent.get(prodToken_);
  cms::cuda::ScopedContextProduce ctx(prodWrapper);
  RealProduct const& prod = ctx.get(prodWrapper);
  ```
- In order to use the same CUDA stream in `acquire()` and `produce()` (e.g. to launch CUDA work in `acquire()` and put a CUDA event product into the event), the state (current device and CUDA stream) must be stored in a member variable of type `cms::cuda::ContextState`.
- The `ScopedContext*` classes have a destructor that calls `cudaStreamAddCallback()` or `cudaEventRecord()`
  - These operations are called even if the `ScopedContext*` is destructed during stack unwinding because of an exception, in which case it would make more sense not to call them
  - In principle their return codes should be checked for errors, but here they are ignored to avoid throwing an exception from a destructor
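To make the stack-unwinding point concrete, here is a minimal, GPU-free sketch. `DestructorContext`, `mockEventRecord()` and `produceLike()` are hypothetical stand-ins (not CMSSW or CUDA API) for a `ScopedContext*`, `cudaEventRecord()` and an EDProducer's `produce()`. The "record" runs whether `produceLike()` returns normally or throws, and its return code is discarded.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Log of mock CUDA calls so the example runs without a GPU.
inline std::vector<std::string>& callLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for cudaEventRecord(); 0 plays the role of cudaSuccess.
inline int mockEventRecord() {
  callLog().push_back("eventRecord");
  return 0;
}

// Mimics the destructor of the current ScopedContext* classes: the
// "record" happens in the destructor, so it also runs during stack unwinding.
class DestructorContext {
public:
  ~DestructorContext() {
    // The return code is dropped: throwing from a destructor is not an option.
    (void)mockEventRecord();
  }
};

// Hypothetical produce()-like function whose user code may fail
// after the context has been constructed.
inline void produceLike(bool fail) {
  DestructorContext ctx;
  if (fail) {
    throw std::runtime_error("user code failed");
  }
}
```

Calling `produceLike(true)` throws, yet the log still gains one `eventRecord` entry, which is exactly the behaviour the two sub-bullets above object to.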
These could be improved by adding base classes for the CUDA EDProducers (`edm::global`, `edm::stream`, and `edm::stream` with `ExternalWork` would likely suffice) whose `acquire()` and `produce()` would create the `ScopedContext*` and pass it to the deriving class' `acquire()` (instead of the `WaitingTaskWithArenaHolder`) and `produce()`. The `ScopedContext*` could then get a new member function, to be called after the deriving class' `acquire()`/`produce()` returns, that calls `cudaStreamAddCallback()` or `cudaEventRecord()`. Since that call would no longer happen in a destructor, it would be skipped when the deriving class throws, and its return code could be checked and turned into an exception.

The CUDA stream to be (possibly) re-used could be picked from the first product read from the event, unless the CUDA stream is requested from the `ScopedContext*` first, in which case a new CUDA stream would be used. The `ContextState` object would become a member variable of the (`edm::stream` with `ExternalWork`) base class and would thus be hidden from the user.
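One possible shape of the proposed base class is sketched below, again with GPU-free mocks. All names (`Stream`, `Product`, `Event`, `Context`, `CUDAProducerBase`, `doProduce()`, `finish()`) are hypothetical illustrations, not existing CMSSW classes. The sketch assumes the framework calls a non-virtual entry point that builds the context, hands it to the user's `produce()`, and calls `finish()` only after a normal return. The context adopts the stream of the first product read through `get()`, unless `stream()` was called first, in which case a fresh stream is used.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUDA stream handle (a real one would wrap cudaStream_t).
struct Stream {
  int id;
};

// Hypothetical stand-in for cms::cuda::Product<T>: the value plus the stream that produced it.
template <typename T>
struct Product {
  T value;
  Stream stream;
};

// Hypothetical event with one input product and one output slot.
struct Event {
  Product<int> input;
  std::optional<int> output;
  std::vector<std::string> log;  // what the base class did, for illustration
};

// Hypothetical context handed to user code by the base class.
class Context {
public:
  explicit Context(int freshStreamId) : freshStreamId_(freshStreamId) {}

  // Unwraps a product; the first product read donates its stream,
  // unless a stream has already been fixed.
  template <typename T>
  T const& get(Product<T> const& product) {
    if (!stream_) {
      stream_ = product.stream;
    }
    return product.value;
  }

  // Asking for the stream before reading any product fixes a fresh stream.
  Stream stream() {
    if (!stream_) {
      stream_ = Stream{freshStreamId_};
    }
    return *stream_;
  }

  // Called explicitly after user code returns. A real version would call
  // cudaEventRecord() or cudaStreamAddCallback() here and could throw on
  // error; this mock only returns the id of the stream that was used.
  int finish() {
    assert(!finished_ && "finish() must be called exactly once");
    finished_ = true;
    return stream().id;
  }

private:
  int freshStreamId_;
  std::optional<Stream> stream_;
  bool finished_ = false;
};

// Hypothetical base class for a CUDA EDProducer.
class CUDAProducerBase {
public:
  virtual ~CUDAProducerBase() = default;

  // Entry point the framework would call instead of the user's produce().
  void doProduce(Event& event) {
    Context ctx(kFreshStreamId);
    produce(event, ctx);          // user code; may throw
    int const id = ctx.finish();  // not reached if produce() threw
    event.log.push_back("finished on stream " + std::to_string(id));
  }

protected:
  static constexpr int kFreshStreamId = 100;  // arbitrary id for a "new" stream
  virtual void produce(Event& event, Context& ctx) = 0;
};

// User code: reads the input once via the context and never sets up streams.
class DoublingProducer : public CUDAProducerBase {
protected:
  void produce(Event& event, Context& ctx) override {
    int const& in = ctx.get(event.input);
    if (in < 0) {
      throw std::runtime_error("negative input");
    }
    event.output = 2 * in;
  }
};
```

Because `finish()` is an ordinary member function rather than a destructor, it is skipped when user code throws, and a real version could check the `cudaError_t` and throw. For `edm::stream` with `ExternalWork`, the same kind of base class would presumably keep the chosen device and stream (the role of `cms::cuda::ContextState`) in a member between its acquire and produce entry points.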