Memory pool support (cuMemPool*) #536

@OneThing98

I noticed cudarc doesn't have any wrappers for the CUDA memory pool
API (cuMemPoolCreate, cuMemAllocFromPoolAsync, etc.) beyond what's
already in the sys bindings.
From what I can tell, cudarc uses cuMemAllocAsync internally for
CudaSlice allocation, which goes through the default pool. But there
doesn't seem to be a way to create custom pools, allocate from a
specific pool, or do things like trimming unused memory.

I came across this while looking at #514 and the downstream candle PR
(huggingface/candle#3352). Both deal with memory that needs to persist
across CUDA graph replays, and it seems like custom pools are how
CUDA expects you to handle that.

I was thinking this could be approached like this:

  1. result-level wrappers in a new pub mod mem_pool for the core
    functions: create, destroy, trim_to, get/set attribute, and the
    device-level get_default_mem_pool/get_mem_pool/set_mem_pool.

  2. A safe-level CudaMemPool type with Drop, plus something like
    CudaStream::alloc_from_pool() and CudaContext::default_mem_pool().
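
To make the two layers concrete, here is a rough sketch of how they could fit together. Everything in it is hypothetical: the `sys_*` functions are mock stand-ins for the real bindings (cuMemPoolCreate, cuMemPoolDestroy, cuMemPoolTrimTo) so the code runs without a GPU, the real cuMemPoolCreate takes a CUmemPoolProps pointer rather than a bare device index, and the `result_*` names are placeholders for what would live in `pub mod mem_pool`. None of this is existing cudarc API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for the driver types CUresult and CUmemoryPool.
pub type RawResult = u32;
pub type RawPool = usize;

const SUCCESS: RawResult = 0; // mirrors CUDA_SUCCESS
const ERROR_INVALID_VALUE: RawResult = 1; // mirrors CUDA_ERROR_INVALID_VALUE

// Bookkeeping for the mock driver so behaviour is observable without a GPU.
static NEXT_HANDLE: AtomicUsize = AtomicUsize::new(1);
static LIVE_POOLS: AtomicUsize = AtomicUsize::new(0);

// --- mock "sys" layer (assumption: simplified signatures) ---

unsafe fn sys_mem_pool_create(out: *mut RawPool, device: i32) -> RawResult {
    if out.is_null() || device < 0 {
        return ERROR_INVALID_VALUE;
    }
    let handle = NEXT_HANDLE.fetch_add(1, Ordering::SeqCst);
    unsafe { out.write(handle) };
    LIVE_POOLS.fetch_add(1, Ordering::SeqCst);
    SUCCESS
}

unsafe fn sys_mem_pool_destroy(pool: RawPool) -> RawResult {
    if pool == 0 {
        return ERROR_INVALID_VALUE;
    }
    LIVE_POOLS.fetch_sub(1, Ordering::SeqCst);
    SUCCESS
}

unsafe fn sys_mem_pool_trim_to(pool: RawPool, _min_bytes_to_keep: usize) -> RawResult {
    if pool == 0 { ERROR_INVALID_VALUE } else { SUCCESS }
}

// --- result layer: thin wrappers that turn status codes into Result ---

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DriverError(pub RawResult);

fn check(code: RawResult) -> Result<(), DriverError> {
    if code == SUCCESS { Ok(()) } else { Err(DriverError(code)) }
}

pub fn result_create(device: i32) -> Result<RawPool, DriverError> {
    let mut pool: RawPool = 0;
    check(unsafe { sys_mem_pool_create(&mut pool, device) })?;
    Ok(pool)
}

/// Safety: `pool` must come from `result_create` and not be destroyed yet.
pub unsafe fn result_destroy(pool: RawPool) -> Result<(), DriverError> {
    check(unsafe { sys_mem_pool_destroy(pool) })
}

/// Safety: `pool` must be a live handle from `result_create`.
pub unsafe fn result_trim_to(pool: RawPool, min_bytes_to_keep: usize) -> Result<(), DriverError> {
    check(unsafe { sys_mem_pool_trim_to(pool, min_bytes_to_keep) })
}

// --- safe layer: owns the handle and destroys it on Drop ---

pub struct CudaMemPool {
    raw: RawPool,
}

impl CudaMemPool {
    pub fn new(device: i32) -> Result<Self, DriverError> {
        Ok(Self { raw: result_create(device)? })
    }

    pub fn trim_to(&self, min_bytes_to_keep: usize) -> Result<(), DriverError> {
        // Sound because `self.raw` stays live for as long as `self` exists.
        unsafe { result_trim_to(self.raw, min_bytes_to_keep) }
    }
}

impl Drop for CudaMemPool {
    fn drop(&mut self) {
        // Errors cannot be propagated from Drop, so they are ignored here.
        let _ = unsafe { result_destroy(self.raw) };
    }
}

/// Number of mock pools currently alive (test helper only).
pub fn live_pools() -> usize {
    LIVE_POOLS.load(Ordering::SeqCst)
}
```

The point of the split is that the result layer stays a 1:1 mapping onto the driver calls, while the safe type only adds ownership. Allocation from a pool (the `CudaStream::alloc_from_pool()` idea) is left out of this sketch.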

I'm not sure about a few things, though, and would appreciate guidance:

  - Should pool-allocated CudaSlices track which pool they came from?
  - The pool props struct (CUmemPoolProps) changed between CUDA 11.x and 12.x. Should we target a specific version?
  - How should this interact with CudaGraph capture?
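
On the first question, one possible shape (a sketch only, not a recommendation) is for each slice to hold a reference-counted handle to its pool, so the pool wrapper cannot be dropped while slices from it are still alive. `MockPool` and `PooledSlice` below are hypothetical names, and no driver calls are made; the `destroyed` flag exists only to make the drop order observable. Whether this tracking is actually needed depends on how the driver treats a pool destroyed while allocations are outstanding.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a safe pool wrapper.
pub struct MockPool {
    outstanding_bytes: AtomicUsize,
    // Set when Drop runs; a real wrapper would call the destroy function here.
    destroyed: Arc<AtomicBool>,
}

impl MockPool {
    pub fn new(destroyed: Arc<AtomicBool>) -> Self {
        Self { outstanding_bytes: AtomicUsize::new(0), destroyed }
    }

    pub fn outstanding_bytes(&self) -> usize {
        self.outstanding_bytes.load(Ordering::SeqCst)
    }
}

impl Drop for MockPool {
    fn drop(&mut self) {
        self.destroyed.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical slice that remembers its pool and keeps it alive.
pub struct PooledSlice {
    len_bytes: usize,
    pool: Arc<MockPool>,
}

impl PooledSlice {
    pub fn alloc(pool: &Arc<MockPool>, len_bytes: usize) -> Self {
        pool.outstanding_bytes.fetch_add(len_bytes, Ordering::SeqCst);
        Self { len_bytes, pool: Arc::clone(pool) }
    }
}

impl Drop for PooledSlice {
    fn drop(&mut self) {
        // Return the bytes first; the Arc field is released after this body,
        // which destroys the pool if this was the last reference.
        self.pool.outstanding_bytes.fetch_sub(self.len_bytes, Ordering::SeqCst);
    }
}
```

The cost is an extra `Arc` per slice, which may or may not be acceptable for CudaSlice.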

Happy to start with the result-level wrappers if this seems like a
reasonable direction.
