Is your feature request related to a problem? Please describe.
It is important to ensure that all device memory allocated inside cuDF functions is allocated through RMM.
This is easy to overlook, e.g., by forgetting to pass the rmm::exec_policy
to a Thrust algorithm that allocates temporary memory.
Describe the solution you'd like
It would be fairly easy to add a check for this to our CI testing by writing an LD_PRELOAD library that overrides cudaMalloc
to throw an error if it is called more than once.
This would ensure that the only cudaMalloc
call is the one for the pool allocation.
There are some things to be aware of with this solution:
- We'd need to ensure the pool is sized such that it won't need to grow for the tests
- It would assume we're using cudaMalloc as the upstream resource for the pool (and not cudaMallocManaged)
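
A minimal sketch of what such a preload library might look like, under several assumptions: the names `record_call_and_check`, `g_cuda_malloc_calls`, and `kMaxAllowedCalls` are hypothetical, `cudaError_t` is stood in for by `int` so the file builds without CUDA headers, and the guard aborts rather than throws because the override has C linkage. Interposition only takes effect if the test binaries link the CUDA runtime dynamically.

```cpp
// Hypothetical LD_PRELOAD shim that allows at most one cudaMalloc call.
// Possible build line (assumption): g++ -shared -fPIC -o libcuda_malloc_guard.so guard.cpp -ldl
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

// Assumption: cudaError_t is an int-sized enum, so int is used as a stand-in
// to keep this sketch independent of the CUDA headers.
using cuda_error_t   = int;
using cuda_malloc_fn = cuda_error_t (*)(void**, std::size_t);

// Number of cudaMalloc calls observed so far in this process.
std::atomic<int> g_cuda_malloc_calls{0};

// Only the single pool allocation is expected to reach cudaMalloc.
constexpr int kMaxAllowedCalls = 1;

// Counts one call and returns true while the total stays within max_calls.
bool record_call_and_check(std::atomic<int>& counter, int max_calls)
{
  // fetch_add returns the previous value, so +1 is this call's ordinal.
  return counter.fetch_add(1) + 1 <= max_calls;
}

// Same name and C linkage as the runtime's cudaMalloc, so the dynamic linker
// resolves calls to this definition when the library is preloaded.
extern "C" cuda_error_t cudaMalloc(void** dev_ptr, std::size_t size)
{
  // Look up the next definition of cudaMalloc in library search order,
  // which should be the one provided by the CUDA runtime.
  static auto real_cuda_malloc =
    reinterpret_cast<cuda_malloc_fn>(dlsym(RTLD_NEXT, "cudaMalloc"));

  if (!record_call_and_check(g_cuda_malloc_calls, kMaxAllowedCalls)) {
    std::fprintf(stderr, "cudaMalloc called more than %d time(s)\n", kMaxAllowedCalls);
    std::abort();
  }
  if (real_cuda_malloc == nullptr) {
    std::fprintf(stderr, "could not resolve the real cudaMalloc\n");
    std::abort();
  }
  return real_cuda_malloc(dev_ptr, size);
}
```

A test could then be launched with something like `LD_PRELOAD=./libcuda_malloc_guard.so ./SOME_TEST`, where both names are placeholders. Any second cudaMalloc call, such as a Thrust temporary allocation that bypassed RMM or a pool that needed to grow, would abort the test.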