Python BufferResource lifetime management #1074

@madsbk

Problem

With #1069, the C++ BufferResource lifetime is now maintained at the C++ level. Any downstream consumer holding a deep-copied any_resource<> from br.device_mr keeps the underlying C++ BufferResource alive.

However, this is still insufficient for Python-created BufferResources. The C++ object only borrows its stream pool, device MR, pinned MR, and statistics objects. Those resources are actually owned by Python and kept alive through self._<thing> attributes on the Python wrapper.

If the Python BufferResource wrapper is garbage collected while a downstream object still holds device_mr, the C++ BufferResource survives, but its borrowed dependencies are destroyed underneath it. The next allocation can therefore trigger a use-after-free.
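
To make the failure mode concrete, here is a minimal pure-Python model of it. All class names (`DeviceMR`, `CppBufferResource`, `PyBufferResource`) are hypothetical stand-ins, not the real rapidsmpf types, and a weak reference is used to model a borrowed C++ pointer so that the dangling state can be observed safely instead of crashing.

```python
import gc
import weakref


class DeviceMR:
    """Hypothetical stand-in for a user-supplied memory resource."""


class CppBufferResource:
    """Stand-in for the C++ object, which only borrows its dependency."""

    def __init__(self, device_mr):
        # A weak reference models a borrowed (non-owning) pointer.
        self._device_mr = weakref.ref(device_mr)

    def dependency_alive(self):
        return self._device_mr() is not None


class PyBufferResource:
    """Stand-in for the Python wrapper, which owns the dependency."""

    def __init__(self):
        self._device_mr = DeviceMR()  # kept alive only by this attribute
        self.cpp = CppBufferResource(self._device_mr)


wrapper = PyBufferResource()
cpp = wrapper.cpp  # a downstream consumer keeps the C++ side alive
alive_before = cpp.dependency_alive()

del wrapper  # the Python wrapper goes away
gc.collect()

# The C++ stand-in survives, but its borrowed dependency is gone.
dangling = not cpp.dependency_alive()
```

In the real C++ object the borrowed handle would be a raw pointer or reference, so the equivalent of `dangling` being true is a use-after-free on the next allocation.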

This is the lifetime issue tracked in #641.

Proposed approach

1. Stream pool: move ownership into C++

The C++ constructor already supports owning the stream pool through a normal
shared_ptr deleter, for example the default:

std::make_shared<rmm::cuda_stream_pool>(16, ...)

from:

cpp/include/rapidsmpf/memory/buffer_resource.hpp:91

We should therefore remove the Python-side borrowed stream pool entirely.

Replace the Python constructor argument:

stream_pool: CudaStreamPool

with:

stream_pool_size: int

and expose streams through:

BufferResource.get_stream()

implemented as:

Stream._from_cudaStream_t(view.value(), owner=self)

The owner=self reference ensures the Python BufferResource wrapper stays alive
for the lifetime of any handed-out stream.
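
The keep-alive effect of `owner=self` can be sketched in pure Python. The `Stream` and `BufferResource` classes below are simplified, hypothetical stand-ins (integer handles instead of CUDA streams, and a plain constructor instead of `Stream._from_cudaStream_t`); only the ownership relationship is meant to match the proposal.

```python
import gc
import weakref


class Stream:
    """Hypothetical stand-in for a stream handle."""

    def __init__(self, handle, owner):
        self.handle = handle
        self._owner = owner  # strong reference pins the owner


class BufferResource:
    """Stand-in wrapper; the pool itself would live on the C++ side."""

    def __init__(self, stream_pool_size):
        self._handles = list(range(stream_pool_size))
        self._next = 0

    def get_stream(self):
        # Hand out handles round-robin, each one pinning this wrapper.
        handle = self._handles[self._next % len(self._handles)]
        self._next += 1
        return Stream(handle, owner=self)


br = BufferResource(stream_pool_size=4)
br_ref = weakref.ref(br)
stream = br.get_stream()

del br
gc.collect()
owner_alive_while_stream_held = br_ref() is not None

del stream
gc.collect()
owner_released_after_stream = br_ref() is None
```

Because the stream holds the wrapper and not the other way around, no reference cycle is created, and the wrapper is released as soon as the last handed-out stream is dropped.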

2. Device MR, pinned MR, and statistics: Python-aware deleter

These resources are user-supplied, so ownership must remain on the Python side.

Wrap each borrowed resource in a std::shared_ptr whose deleter holds a strong
Python reference:

  • Py_INCREF when constructing the control block
  • Py_DECREF under the GIL when destroying it

The resulting C++ control block then transitively pins the Python wrapper for as
long as any downstream object still holds the associated resource.

Apply this pattern to all three handles passed into the C++ BufferResource:

  • device MR
  • pinned MR
  • statistics
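
The incref-on-construct, decref-on-destroy idea can be modeled from Python with `ctypes.pythonapi`, which exposes `Py_IncRef` and `Py_DecRef` (the function forms of the macros) and holds the GIL during calls. This is only a sketch under assumptions: `ControlBlock` and `DeviceMR` are hypothetical names, the real deleter would be written in C++ or Cython and attached to a `std::shared_ptr`, and treating `id(obj)` as the object's address is a CPython implementation detail.

```python
import ctypes
import gc
import weakref

# Function forms of Py_INCREF / Py_DECREF; pythonapi calls keep the GIL held.
_incref = ctypes.pythonapi.Py_IncRef
_decref = ctypes.pythonapi.Py_DecRef
_incref.argtypes = [ctypes.c_void_p]
_incref.restype = None
_decref.argtypes = [ctypes.c_void_p]
_decref.restype = None


class ControlBlock:
    """Models a shared_ptr control block whose deleter owns a Python ref."""

    def __init__(self, py_obj):
        self._addr = id(py_obj)  # PyObject* address (CPython detail)
        _incref(self._addr)      # taken when the control block is built
        self._released = False

    def release(self):
        """Models the deleter running when the last shared_ptr goes away."""
        if not self._released:
            self._released = True
            _decref(self._addr)  # dropped under the GIL


class DeviceMR:
    """Hypothetical stand-in for a user-supplied Python resource."""


mr = DeviceMR()
mr_ref = weakref.ref(mr)
block = ControlBlock(mr)

del mr  # the user's own reference is gone
gc.collect()
pinned_by_control_block = mr_ref() is not None

block.release()
released_after_deleter = mr_ref() is None
```

The control block stores only a raw address, so the object survives purely because of the extra reference count, which mirrors how a C++ deleter would pin the Python wrapper without Python-level references.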

For more context, see @pentschev's comment on #1069.

Labels: improvement (Improves an existing functionality)
