Improved memory allocators

Improve our memory allocator support:

- Use a memory pool for CPU memory.
- Use our own internal memory pool for GPU memory (not necessarily CUB's caching allocator, since we want more flexibility in bucket sizing and similar tuning).
- Support pinned (page-locked) host memory.
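
To make the bucket-sizing point concrete, here is a minimal sketch of a size-bucketed caching pool. It is only an illustration, not a proposed design: `BucketedPool`, its methods, and the power-of-two rounding policy are all hypothetical names and choices, and it uses `std::malloc`/`std::free` as the backing allocator so that it runs without a GPU. A GPU or pinned-memory variant would presumably swap in `cudaMalloc`/`cudaFree` or `cudaMallocHost`/`cudaFreeHost` at the two marked places.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: none of these names come from the project.
// Not thread-safe, and there is no overflow guard for extremely large requests.
class BucketedPool {
 public:
  // min_bucket is the smallest block size handed out; it must be a power of two.
  explicit BucketedPool(std::size_t min_bucket = 256) : min_bucket_(min_bucket) {
    assert(min_bucket != 0 && (min_bucket & (min_bucket - 1)) == 0);
  }

  BucketedPool(const BucketedPool&) = delete;
  BucketedPool& operator=(const BucketedPool&) = delete;

  // Releases only the cached (free) blocks; blocks still held by callers
  // are the callers' responsibility.
  ~BucketedPool() {
    for (auto& entry : free_lists_) {
      for (void* p : entry.second) std::free(p);  // backing free (assumption: host malloc/free)
    }
  }

  // Bucket policy (assumed): round up to the next power of two >= min_bucket_.
  // This is the part we would want to be able to tune.
  std::size_t bucket_size(std::size_t bytes) const {
    std::size_t size = min_bucket_;
    while (size < bytes) size <<= 1;
    return size;
  }

  // Reuse a cached block from the matching bucket if one exists,
  // otherwise fall through to the backing allocator.
  void* allocate(std::size_t bytes) {
    const std::size_t size = bucket_size(bytes);
    std::vector<void*>& list = free_lists_[size];
    if (!list.empty()) {
      void* p = list.back();
      list.pop_back();
      ++hits_;
      return p;
    }
    ++misses_;
    return std::malloc(size);  // backing allocation (assumption: host malloc/free)
  }

  // Return a block to its bucket instead of freeing it.
  // bytes must be the size that was originally requested for p.
  void deallocate(void* p, std::size_t bytes) {
    if (p == nullptr) return;
    free_lists_[bucket_size(bytes)].push_back(p);
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  std::size_t min_bucket_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
  std::unordered_map<std::size_t, std::vector<void*>> free_lists_;
};
```

With this policy, a 100-byte request and a later 200-byte request both map to the 256-byte bucket, so the second one can reuse the block freed by the first. Owning the pool ourselves means `bucket_size` can be replaced with whatever growth factor or size classes turn out to fit our workloads.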