Memory pool support (cuMemPool*)

I noticed cudarc doesn't have any wrappers for the CUDA memory pool
API (cuMemPoolCreate, cuMemAllocFromPoolAsync, etc.) beyond what's
already in the sys bindings.
From what I can tell, cudarc uses cuMemAllocAsync internally for
CudaSlice allocation, which goes through the device's default pool. But
there doesn't seem to be a way to create custom pools, allocate from a
specific pool, or do things like trimming unused memory.
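To make "trimming unused memory" concrete, here is a toy, CUDA-free model of the bookkeeping a pool does. It is only a sketch under assumptions: ToyPool and its methods are hypothetical, it ignores stream ordering and the driver's allocation granularity, and trim_to only approximates the documented behaviour of cuMemPoolTrimTo (release unused memory back to the OS while keeping at least minBytesToKeep reserved, and never release memory that backs live allocations).

```rust
/// Hypothetical, CUDA-free model of a memory pool's bookkeeping.
/// `reserved` is memory the pool holds from the driver; `in_use` is the
/// part currently handed out to live allocations.
#[derive(Debug, Default)]
pub struct ToyPool {
    reserved: usize,
    in_use: usize,
}

impl ToyPool {
    pub fn new() -> Self {
        Self::default()
    }

    /// Allocate `bytes`, growing the reservation only if the unused part is too small.
    pub fn alloc(&mut self, bytes: usize) {
        let free = self.reserved - self.in_use;
        if free < bytes {
            self.reserved += bytes - free;
        }
        self.in_use += bytes;
    }

    /// Free `bytes`: they return to the pool, not to the OS.
    pub fn free(&mut self, bytes: usize) {
        assert!(bytes <= self.in_use, "freeing more than is in use");
        self.in_use -= bytes;
    }

    /// Byte-granular approximation of cuMemPoolTrimTo: release unused memory
    /// down to `min_bytes_to_keep`, but never below what is still in use.
    /// If less than `min_bytes_to_keep` is reserved, this is a no-op.
    pub fn trim_to(&mut self, min_bytes_to_keep: usize) {
        let target = min_bytes_to_keep.max(self.in_use);
        if self.reserved > target {
            self.reserved = target;
        }
    }

    pub fn reserved(&self) -> usize {
        self.reserved
    }

    pub fn in_use(&self) -> usize {
        self.in_use
    }
}

fn main() {
    let mut pool = ToyPool::new();
    pool.alloc(64);
    pool.alloc(32);
    pool.free(64); // 96 bytes reserved, 32 in use
    pool.trim_to(0); // the 64 unused bytes are released
    println!("reserved={} in_use={}", pool.reserved(), pool.in_use());
}
```

Running main prints reserved=32 in_use=32: the freed 64 bytes stay reserved in the pool until trim_to hands them back.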

I came across this while looking at #514 and the downstream candle PR
(huggingface/candle#3352). Both deal with memory that needs to persist
across CUDA graph replays, and custom pools seem to be how CUDA expects
you to handle that.

I was thinking of approaching this in two steps:
1. Result-level wrappers in a new pub mod mem_pool for the core
functions: create, destroy, trim_to, get/set attribute, and the
device-level get_default_mem_pool/get_mem_pool/set_mem_pool.

2. A safe-level CudaMemPool type with Drop, plus something like
CudaStream::alloc_from_pool() and CudaContext::default_mem_pool().
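The safe-level half of this is essentially RAII around a pool handle. A minimal sketch, assuming a hypothetical PoolBackend trait that stands in for the proposed result-level mem_pool functions (none of these names exist in cudarc today). FakeBackend only records calls, so the create/trim/destroy ordering can be checked without a GPU.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the proposed result-level `mem_pool` functions.
/// A real version would forward to cuMemPoolCreate / cuMemPoolDestroy /
/// cuMemPoolTrimTo through the sys bindings.
pub trait PoolBackend {
    type Handle: Copy;
    fn create(&self) -> Result<Self::Handle, String>;
    fn destroy(&self, pool: Self::Handle) -> Result<(), String>;
    fn trim_to(&self, pool: Self::Handle, min_bytes_to_keep: usize) -> Result<(), String>;
}

/// Safe owner of a pool handle: the pool is destroyed exactly once, on drop.
pub struct MemPool<B: PoolBackend> {
    backend: B,
    handle: B::Handle,
}

impl<B: PoolBackend> MemPool<B> {
    pub fn new(backend: B) -> Result<Self, String> {
        let handle = backend.create()?;
        Ok(MemPool { backend, handle })
    }

    pub fn trim_to(&self, min_bytes_to_keep: usize) -> Result<(), String> {
        self.backend.trim_to(self.handle, min_bytes_to_keep)
    }
}

impl<B: PoolBackend> Drop for MemPool<B> {
    fn drop(&mut self) {
        // Errors cannot propagate out of Drop; a real wrapper might log them.
        let _ = self.backend.destroy(self.handle);
    }
}

/// Fake backend that records every call instead of touching the driver.
#[derive(Clone, Default)]
pub struct FakeBackend {
    log: Rc<RefCell<Vec<String>>>,
}

impl PoolBackend for FakeBackend {
    type Handle = u32;

    fn create(&self) -> Result<u32, String> {
        self.log.borrow_mut().push("create".to_string());
        Ok(1)
    }

    fn destroy(&self, pool: u32) -> Result<(), String> {
        self.log.borrow_mut().push(format!("destroy {pool}"));
        Ok(())
    }

    fn trim_to(&self, pool: u32, min_bytes_to_keep: usize) -> Result<(), String> {
        self.log.borrow_mut().push(format!("trim {pool} {min_bytes_to_keep}"));
        Ok(())
    }
}

fn main() {
    let backend = FakeBackend::default();
    {
        let pool = MemPool::new(backend.clone()).unwrap();
        pool.trim_to(0).unwrap();
    } // `pool` is dropped here, which triggers destroy
    println!("{:?}", backend.log.borrow());
}
```

main prints ["create", "trim 1 0", "destroy 1"]. The trait is only there to make the sketch testable; a real CudaMemPool would more likely store the raw CUmemoryPool handle directly and call the result-level wrappers.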

That said, I'm unsure about a few things and would appreciate guidance:
1. Should pool-allocated CudaSlices track which pool they came from?
2. The pool properties struct (CUmemPoolProps) changed between 11.x and
12.x. Should we target a specific version?
3. How should this interact with CudaGraph capture?
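For the first question, one option is for each pool-allocated slice to hold a reference-counted handle to its pool, so the Rust-side pool owner can only be dropped after the last slice from it. A CUDA-free sketch with hypothetical names (Pool, PooledSlice, alloc_from_pool); the comments mark where cuMemAllocFromPoolAsync and cuMemFreeAsync would go in a real version.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical pool owner. In a real wrapper, Drop would call cuMemPoolDestroy.
pub struct Pool {
    live_allocs: AtomicUsize,
}

impl Pool {
    pub fn new() -> Arc<Self> {
        Arc::new(Pool {
            live_allocs: AtomicUsize::new(0),
        })
    }

    pub fn live_allocs(&self) -> usize {
        self.live_allocs.load(Ordering::SeqCst)
    }
}

/// Hypothetical pool-backed slice that keeps its pool alive via an Arc.
pub struct PooledSlice {
    pool: Arc<Pool>,
    len: usize,
}

pub fn alloc_from_pool(pool: &Arc<Pool>, len: usize) -> PooledSlice {
    // A real version would call cuMemAllocFromPoolAsync on a stream here.
    pool.live_allocs.fetch_add(1, Ordering::SeqCst);
    PooledSlice {
        pool: Arc::clone(pool),
        len,
    }
}

impl PooledSlice {
    pub fn len(&self) -> usize {
        self.len
    }

    pub fn pool(&self) -> &Arc<Pool> {
        &self.pool
    }
}

impl Drop for PooledSlice {
    fn drop(&mut self) {
        // A real version would enqueue cuMemFreeAsync here.
        self.pool.live_allocs.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let pool = Pool::new();
    let a = alloc_from_pool(&pool, 16);
    let b = alloc_from_pool(&pool, 32);
    println!("live={} strong={}", pool.live_allocs(), Arc::strong_count(&pool));
    drop(a);
    drop(b);
    println!("live={} strong={}", pool.live_allocs(), Arc::strong_count(&pool));
}
```

main prints live=2 strong=3, then live=0 strong=1. The cost is one Arc per slice. The alternative is to leave CudaSlice unchanged and rely on the driver: the cuMemPoolDestroy documentation states that destroying a pool with outstanding allocations returns immediately and defers releasing its resources until those allocations are freed.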

Happy to start with the result-level wrappers if this seems like a
reasonable direction.

Memory pool support (cuMemPool*) #536
