[FEA] Support batch allocation and deallocation #2190

@ttnghia

Is your feature request related to a problem? Please describe.
When processing big data, we frequently need to perform computations on a large number of dataframes. For example, we may need to gather/scatter or transform hundreds to thousands of data columns. Each of these operations requires allocating many memory buffers for both intermediate and final output columns.

Describe the solution you'd like
Every allocation and deallocation of a memory buffer has overhead, in addition to the latency of preparing thread-local data for the operation. Most of this overhead comes from acquiring a shared mutex and (possibly) creating a CUDA event. For example:

    void* do_allocate(std::size_t size, cuda_stream_view stream) override
    {
      RMM_LOG_TRACE("[A][stream %s][%zuB]", rmm::detail::format_stream(stream), size);
      if (size <= 0) { return nullptr; }
      lock_guard lock(mtx_);                  // shared mutex acquired on every call
      auto stream_event = get_event(stream);  // may create a CUDA event
      // ... (rest of the function omitted from this excerpt)
    }

Describe alternatives you've considered
Implement a batch processing mechanism for allocation and deallocation across the memory resource classes:

  • This can start with a very simple modification: instead of locking the mutex and (possibly) creating a CUDA event for every allocation/deallocation, as is done now, the batch alloc/dealloc functions would lock the mutex and (possibly) create the CUDA event only once, then allocate/deallocate a large number of buffers before releasing the mutex.
  • Incremental improvements can be added on top of that. For example, the get_block function could be reimplemented to process batch alloc/dealloc requests more efficiently.
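
To make the first bullet concrete, here is a minimal sketch of the "lock once per batch" idea. It is not RMM code: `toy_resource`, `allocate_batch`, `deallocate_batch`, and the counters are hypothetical names invented for this illustration, `std::malloc`/`std::free` stand in for the pool's block management, and stream and CUDA event handling are left out entirely. The only point is to contrast one mutex acquisition per buffer with one acquisition per batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical toy resource for illustration only; this is NOT RMM's API.
// std::malloc/std::free stand in for the pool's block management, and
// stream/CUDA-event handling is omitted entirely.
class toy_resource {
 public:
  // Per-buffer path: one mutex acquisition per call.
  void* allocate(std::size_t size)
  {
    if (size == 0) { return nullptr; }
    std::lock_guard<std::mutex> lock(mtx_);
    ++lock_acquisitions_;
    return allocate_unlocked(size);
  }

  void deallocate(void* ptr)
  {
    if (ptr == nullptr) { return; }
    std::lock_guard<std::mutex> lock(mtx_);
    ++lock_acquisitions_;
    deallocate_unlocked(ptr);
  }

  // Batch path: one mutex acquisition for the whole batch.
  // Zero-sized requests yield nullptr, mirroring the per-buffer path.
  std::vector<void*> allocate_batch(std::vector<std::size_t> const& sizes)
  {
    std::vector<void*> ptrs;
    ptrs.reserve(sizes.size());
    std::lock_guard<std::mutex> lock(mtx_);
    ++lock_acquisitions_;
    try {
      for (std::size_t size : sizes) {
        ptrs.push_back(size == 0 ? nullptr : allocate_unlocked(size));
      }
    } catch (...) {
      // Roll back the partial batch so nothing leaks, then rethrow.
      for (void* ptr : ptrs) {
        if (ptr != nullptr) { deallocate_unlocked(ptr); }
      }
      throw;
    }
    return ptrs;
  }

  void deallocate_batch(std::vector<void*> const& ptrs)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    ++lock_acquisitions_;
    for (void* ptr : ptrs) {
      if (ptr != nullptr) { deallocate_unlocked(ptr); }
    }
  }

  // Number of times an alloc/dealloc call has taken the mutex.
  std::size_t lock_acquisitions() const
  {
    std::lock_guard<std::mutex> lock(mtx_);
    return lock_acquisitions_;
  }

  // Number of buffers currently allocated and not yet freed.
  std::size_t live_buffers() const
  {
    std::lock_guard<std::mutex> lock(mtx_);
    return live_buffers_;
  }

 private:
  // Callers must already hold mtx_.
  void* allocate_unlocked(std::size_t size)
  {
    void* ptr = std::malloc(size);
    if (ptr == nullptr) { throw std::bad_alloc{}; }
    ++live_buffers_;
    return ptr;
  }

  // Callers must already hold mtx_.
  void deallocate_unlocked(void* ptr)
  {
    assert(live_buffers_ > 0);
    std::free(ptr);
    --live_buffers_;
  }

  mutable std::mutex mtx_;
  std::size_t lock_acquisitions_{0};
  std::size_t live_buffers_{0};
};
```

With this sketch, calling `allocate_batch({16, 32, 64})` takes the mutex once, while three separate `allocate` calls take it three times. A real implementation would also need to decide how streams and events are shared across the batch, which this toy version does not attempt.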

Additional context
Any better way to reduce the overhead of allocating and deallocating large numbers of buffers would be helpful.
