Conversation
template <typename PrimaryUpstream,
          typename AlternateUpstream,
          typename ExceptionType = rmm::out_of_memory>
issue: We are moving away from having templated adaptors and instead just using type-erased resource_ref objects (see #1661). To avoid introducing a new such class here, can we immediately move to the new model:
template<typename ExceptionType = rmm::out_of_memory>
class failure_alternate_resource_adaptor final : public device_memory_resource {
...
failure_alternate_resource_adaptor(device_async_resource_ref primary, device_async_resource_ref alternate) : ... {}
Or, if we can't type-erase yet, we should at least accept resource refs as an alternate constructor, and store resource_refs rather than templated type-specific upstreams.
Done in cbd7f43, which relies on implicit type conversion from device_memory_resource* to device_async_resource_ref.
Devil's advocating: couldn't we implement this using |
Not as-is. The callback function in It is possible to write a new resource that can handle both |
harrism
left a comment
Mostly doc comments. However also needs C++ tests.
 * @throws `exception_type` if the requested allocation could not be fulfilled
 * by the primary or the alternate upstream resource.
I think it's ExceptionType. But actually I don't think you can say what type of exception will be thrown by alternate_upstream if it fails to allocate. It could be a CUDA error, or it could be rmm::out_of_memory, or some other exception.
The alternate upstream should document what exceptions it may throw.
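To make the point about exception types concrete, here is a minimal sketch with no RMM dependency. The names `out_of_memory`, `alloc_fn`, `allocate_with_fallback`, and `escaping_exception` are hypothetical stand-ins, not RMM API. It assumes the adaptor only catches `ExceptionType` from the primary, so whatever the alternate throws reaches the caller unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for rmm::out_of_memory.
struct out_of_memory : std::bad_alloc {};

// Hypothetical allocation callback standing in for an upstream resource.
using alloc_fn = std::function<void*(std::size_t)>;

// Try the primary; on ExceptionType only, retry on the alternate.
// Exceptions thrown by the alternate are not caught here.
template <typename ExceptionType = out_of_memory>
void* allocate_with_fallback(alloc_fn const& primary,
                             alloc_fn const& alternate,
                             std::size_t bytes)
{
  try {
    return primary(bytes);
  } catch (ExceptionType const&) {
    return alternate(bytes);
  }
}

// Reports which kind of exception, if any, escapes to the caller.
inline std::string escaping_exception(alloc_fn const& primary, alloc_fn const& alternate)
{
  try {
    allocate_with_fallback<>(primary, alternate, 1024);
    return "none";
  } catch (out_of_memory const&) {
    return "out_of_memory";
  } catch (std::runtime_error const&) {
    return "runtime_error";
  } catch (...) {
    return "other";
  }
}
```

Under these assumptions, the exception the caller sees depends on the alternate, which is why the `@throws` text cannot promise a single type.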
Added C++ tests.
 * @tparam ExceptionType The type of exception that this adaptor should respond to.
 */
template <typename ExceptionType = rmm::out_of_memory>
class failure_alternate_resource_adaptor final : public device_memory_resource {
nit: fallback_resource_adapater feels like a more concise name.
wence-
left a comment
Approving with request to rename test files appropriately.
bdice
left a comment
We're still discussing the use cases of this feature offline. For now, I'm requesting changes so that we don't merge this with typos in the API and file names.
…lternate_resource_adaptor
Let's put this on hold until we get some more use cases.
Implement out-of-memory protection by using an RMM resource `RmmFallbackResource` based on rapidsai/rmm#1665. The idea is to use managed memory when the RMM pool raises an OOM error.
Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)
Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)
URL: #287
New resource adaptor that uses an alternate upstream resource when the primary throws a specified exception type.
The motivation here is to provide NO-OOM by using managed memory when the primary device resource runs out of memory.
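A minimal sketch of the fallback idea described above, using toy host-memory stand-ins rather than RMM types. Everything here (`out_of_memory`, `memory_resource`, `limited_resource`, `heap_resource`, `fallback_adaptor`, `fallback_demo`) is hypothetical and only illustrates the control flow. It assumes the adaptor must remember which pointers came from the alternate so that deallocation is routed back to the right upstream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <unordered_set>

// Hypothetical stand-in for rmm::out_of_memory.
struct out_of_memory : std::bad_alloc {};

// Hypothetical minimal interface (not rmm::mr::device_memory_resource).
struct memory_resource {
  virtual ~memory_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Toy primary: a fixed byte budget, throws out_of_memory when exceeded.
class limited_resource final : public memory_resource {
 public:
  explicit limited_resource(std::size_t capacity) : remaining_{capacity} {}
  void* allocate(std::size_t bytes) override
  {
    if (bytes > remaining_) { throw out_of_memory{}; }
    void* ptr = std::malloc(bytes);
    if (ptr == nullptr) { throw out_of_memory{}; }
    remaining_ -= bytes;
    return ptr;
  }
  void deallocate(void* ptr, std::size_t bytes) override
  {
    std::free(ptr);
    remaining_ += bytes;
  }

 private:
  std::size_t remaining_;
};

// Toy alternate: plain malloc, standing in for managed memory.
class heap_resource final : public memory_resource {
 public:
  void* allocate(std::size_t bytes) override
  {
    void* ptr = std::malloc(bytes);
    if (ptr == nullptr) { throw std::bad_alloc{}; }
    ++live_;
    return ptr;
  }
  void deallocate(void* ptr, std::size_t) override
  {
    std::free(ptr);
    --live_;
  }
  std::size_t live_allocations() const { return live_; }

 private:
  std::size_t live_{0};
};

// Tries the primary first; on ExceptionType, falls back to the alternate.
template <typename ExceptionType = out_of_memory>
class fallback_adaptor final : public memory_resource {
 public:
  fallback_adaptor(memory_resource& primary, memory_resource& alternate)
    : primary_{primary}, alternate_{alternate}
  {
  }
  void* allocate(std::size_t bytes) override
  {
    try {
      return primary_.allocate(bytes);
    } catch (ExceptionType const&) {
      void* ptr = alternate_.allocate(bytes);
      from_alternate_.insert(ptr);  // remember the owner for deallocate()
      return ptr;
    }
  }
  void deallocate(void* ptr, std::size_t bytes) override
  {
    if (from_alternate_.erase(ptr) > 0) {
      alternate_.deallocate(ptr, bytes);
    } else {
      primary_.deallocate(ptr, bytes);
    }
  }

 private:
  memory_resource& primary_;
  memory_resource& alternate_;
  std::unordered_set<void*> from_alternate_;
};

// Usage: the second request exceeds the primary's budget and lands on the alternate.
inline std::size_t fallback_demo()
{
  limited_resource primary{64};
  heap_resource alternate;
  fallback_adaptor<> mr{primary, alternate};
  void* a = mr.allocate(48);
  void* b = mr.allocate(48);
  std::size_t const served_by_alternate = alternate.live_allocations();
  mr.deallocate(b, 48);
  mr.deallocate(a, 48);
  return served_by_alternate;
}
```

The sketch stores plain references to the upstreams. The review above asks for type-erased resource refs in the real adaptor instead.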