Description
Is your feature request related to a problem? Please describe.
When processing big data, we frequently encounter situations where we need to perform computations on a large number of dataframes. For example, we may need to gather/scatter or transform hundreds to thousands of data columns. For any of these operations, we must allocate many memory buffers for both intermediate and final output columns.
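For concreteness, here is a sketch of the pattern that hits this cost: materializing N output columns performs N independent allocations, each going through the memory resource's per-call path. The function name, column count, and sizes are invented for illustration; only `rmm::device_buffer` and `rmm::cuda_stream_view` are real rmm types.

```cpp
#include <cstddef>
#include <vector>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

// Each device_buffer constructor calls into the memory resource's
// do_allocate, paying the full per-allocation cost every time.
std::vector<rmm::device_buffer> make_columns(std::size_t num_columns,
                                             std::size_t bytes_per_column,
                                             rmm::cuda_stream_view stream)
{
  std::vector<rmm::device_buffer> columns;
  columns.reserve(num_columns);
  for (std::size_t i = 0; i < num_columns; ++i) {
    columns.emplace_back(bytes_per_column, stream);  // one allocation per column
  }
  return columns;
}
```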
Describe the solution you'd like
Allocating and deallocating each memory buffer individually always incurs overhead, not to mention the latency of preparing thread-local data for the allocation/deallocation operations. Most of this overhead comes from acquiring a shared mutex and (possibly) creating a CUDA event. For example:
rmm/cpp/include/rmm/mr/detail/stream_ordered_memory_resource.hpp, lines 194 to 202 at commit 889050d:

```cpp
void* do_allocate(std::size_t size, cuda_stream_view stream) override
{
  RMM_LOG_TRACE("[A][stream %s][%zuB]", rmm::detail::format_stream(stream), size);
  if (size <= 0) { return nullptr; }
  lock_guard lock(mtx_);
  auto stream_event = get_event(stream);
  // ...
```
Describe alternatives you've considered
Implement a batch processing mechanism for allocation and deallocation across the memory resource classes:
- This can start from a very simple modification: instead of locking the mutex and (possibly) creating the CUDA event for each allocation/deallocation, as is done now, the batch alloc/dealloc functions would lock the mutex and (maybe) create the CUDA event just once, then alloc/dealloc a large number of buffers before releasing the mutex (see the sketch after this list).
- Incremental improvements can be added on top of this. For example, the `get_block` function can be reimplemented to process batch alloc/dealloc more efficiently.
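A minimal sketch of the simple variant described above, assuming a toy resource where plain `std::malloc`/`std::free` stand in for the real block lookup. All class and function names here are illustrative, not existing rmm API; the point is only that the lock (and any per-stream setup) is paid once per batch instead of once per buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Toy stand-in for a stream-ordered memory resource (illustrative only).
class toy_batch_resource {
 public:
  // Per-buffer path: pays the lock/setup cost on every call.
  void* allocate(std::size_t size)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    return do_allocate_unlocked(size);
  }

  // Batch path: one lock acquisition amortized over the whole batch.
  std::vector<void*> allocate_batch(std::vector<std::size_t> const& sizes)
  {
    std::vector<void*> ptrs;
    ptrs.reserve(sizes.size());
    std::lock_guard<std::mutex> lock(mtx_);
    for (auto size : sizes) { ptrs.push_back(do_allocate_unlocked(size)); }
    return ptrs;
  }

  // Batch free: the same idea on the deallocation side.
  void deallocate_batch(std::vector<void*> const& ptrs)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    for (auto* ptr : ptrs) { std::free(ptr); }
  }

 private:
  // Placeholder for the real (unlocked) block lookup, e.g. a get_block-style
  // routine in stream_ordered_memory_resource.
  void* do_allocate_unlocked(std::size_t size) { return std::malloc(size); }

  std::mutex mtx_;
};
```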
Additional context
Any better way of reducing the overhead of allocating/deallocating large numbers of buffers would be helpful.