
Improve CUDA EDProducers #30266

@makortel

Currently a CUDA EDProducer follows a pattern where a cms::cuda::ScopedContext* object is used to set up the CUDA context (current device, CUDA stream, etc.). This pattern has some annoyances:

  • With ExternalWork the user has to remember to pass the edm::WaitingTaskWithArenaHolder to the ScopedContextAcquire constructor in acquire()
  • In order to have a chance to use the same CUDA stream that was used to produce a product, the user has to pass the input product twice: once to construct the ScopedContext and a second time to get the real product from the cms::cuda::Product<T> wrapper
      cms::cuda::Product<RealProduct> const& prodWrapper = iEvent.get(prodToken_);
      cms::cuda::ScopedContextProduce ctx(prodWrapper);
      RealProduct const& prod = ctx.get(prodWrapper);
  • In order to use the same CUDA stream in acquire() and produce() (e.g. to launch CUDA work in acquire() and put a CUDA event product into the event), the state (current device and CUDA stream) must be stored in a member variable of type cms::cuda::ContextState.
  • The ScopedContext* classes have destructors that call cudaStreamAddCallback() or cudaEventRecord()
    • These operations are performed even if the ScopedContext* is destroyed during stack unwinding because of an exception, in which case it would make more sense not to call them
    • In principle their return codes should be checked for errors, but here they are ignored to avoid throwing an exception from a destructor

These annoyances could be addressed by adding base classes for the CUDA EDProducers (edm::global, edm::stream, and edm::stream with ExternalWork would likely suffice). The acquire() and produce() of these base classes would create the ScopedContext* and pass it to the derived class's acquire() (instead of the WaitingTaskWithArenaHolder) and produce(). The ScopedContext* could then get a new member function that the base class calls after the derived class's acquire()/produce() has returned, to call cudaStreamAddCallback() or cudaEventRecord(). Because this call would happen after a normal return rather than in a destructor, it would be skipped when an exception is thrown, and its return code could be checked. The CUDA stream to possibly re-use could be taken from the first product read from the event, unless the CUDA stream is requested from the ScopedContext* first, in which case a new CUDA stream would be used. The ContextState object would become a member variable of the (edm::stream with ExternalWork) base class and would thus be hidden from the user.
