Revisit Per Bitmap Allocators

**Is your feature request related to a problem? Please describe.**
- I want to control allocation of roaring bitmaps in a thread-safe way, from a library that may not control every use of roaring bitmaps in the process.
- I want to be able to use an allocator that hands out memory from, e.g., a non-global slab allocator.
- As a maintainer of the `croaring-rs` Rust bindings, I want to be able to provide a safe Rust API to control allocation in roaring bitmaps:
  - Currently, [the function](https://docs.rs/croaring/latest/croaring/fn.configure_custom_alloc.html) to configure a global allocator must be `unsafe`: the caller must ensure that no objects allocated by CRoaring exist when the global allocator is configured, and that no other thread is racing to interact with CRoaring at the same time.

**Describe the solution you'd like**
I would like to have the option to specify a custom allocator for each bitmap. I would like the allocator functions to take a `user_data` parameter, which would also be stored with the bitmap, so that non-static data can be used for allocation.

I'm imagining an API like:

```c
typedef struct roaring_allocator_s {
    void* (*malloc)(size_t size, void* user_data);
    void* (*realloc)(void* ptr, size_t size, void* user_data);
    void* (*calloc)(size_t count, size_t size, void* user_data);
    void (*free)(void* ptr, void* user_data);
    void* (*aligned_malloc)(size_t alignment, size_t size, void* user_data);
    void (*aligned_free)(void* ptr, void* user_data);
} roaring_allocator_t;

roaring_bitmap_t *roaring_bitmap_create_with_allocator(const roaring_allocator_t *alloc, void *user_data);
```
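To make the role of `user_data` concrete, here is a minimal sketch of an allocator table backed by a fixed-size bump arena. The `roaring_allocator_t` layout is copied from the proposal above and is not part of CRoaring today; `demo_arena_t`, the `demo_*` callbacks, and `DEMO_ARENA_ALLOCATOR` are hypothetical names for illustration only. The arena returns `NULL` when it is exhausted, which CRoaring cannot currently handle safely (see #638 below).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Allocator table as proposed in this issue; not an existing CRoaring type. */
typedef struct roaring_allocator_s {
    void* (*malloc)(size_t size, void* user_data);
    void* (*realloc)(void* ptr, size_t size, void* user_data);
    void* (*calloc)(size_t count, size_t size, void* user_data);
    void (*free)(void* ptr, void* user_data);
    void* (*aligned_malloc)(size_t alignment, size_t size, void* user_data);
    void (*aligned_free)(void* ptr, void* user_data);
} roaring_allocator_t;

/* Hypothetical arena state, handed to every callback as user_data. */
typedef struct demo_arena_s {
    unsigned char* buf; /* backing storage owned by the caller */
    size_t cap;         /* size of buf in bytes */
    size_t used;        /* bytes handed out so far */
} demo_arena_t;

#define DEMO_DEFAULT_ALIGN 16

static void* demo_aligned_malloc(size_t alignment, size_t size, void* user_data) {
    demo_arena_t* a = (demo_arena_t*)user_data;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (size > a->cap) return NULL;
    uintptr_t start = (uintptr_t)a->buf;
    /* Reserve room for a size header, then round the payload address up. */
    uintptr_t base = start + a->used + sizeof(size_t);
    uintptr_t payload = (base + alignment - 1) & ~((uintptr_t)alignment - 1);
    size_t end = (size_t)(payload - start) + size;
    if (end > a->cap) return NULL; /* arena exhausted */
    /* Record the size just before the payload so realloc can copy it. */
    memcpy((void*)(payload - sizeof(size_t)), &size, sizeof(size_t));
    a->used = end;
    return (void*)payload;
}

static void* demo_malloc(size_t size, void* user_data) {
    return demo_aligned_malloc(DEMO_DEFAULT_ALIGN, size, user_data);
}

static void* demo_realloc(void* ptr, size_t size, void* user_data) {
    void* fresh = demo_malloc(size, user_data);
    if (ptr == NULL || fresh == NULL) return fresh;
    size_t old_size;
    memcpy(&old_size, (unsigned char*)ptr - sizeof(size_t), sizeof(size_t));
    memcpy(fresh, ptr, old_size < size ? old_size : size);
    return fresh;
}

static void* demo_calloc(size_t count, size_t size, void* user_data) {
    if (size != 0 && count > SIZE_MAX / size) return NULL;
    void* p = demo_malloc(count * size, user_data);
    if (p != NULL) memset(p, 0, count * size);
    return p;
}

/* A bump arena never frees individual blocks; the caller drops it whole. */
static void demo_free(void* ptr, void* user_data) {
    (void)ptr;
    (void)user_data;
}

static const roaring_allocator_t DEMO_ARENA_ALLOCATOR = {
    demo_malloc, demo_realloc, demo_calloc,
    demo_free,   demo_aligned_malloc, demo_free,
};
```

Under the proposed API, a caller would presumably pass `&DEMO_ARENA_ALLOCATOR` together with a pointer to its own `demo_arena_t` to `roaring_bitmap_create_with_allocator`, so two bitmaps could draw from two different arenas without any global state.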

**Describe alternatives you've considered**

If I control all uses of CRoaring in the process, I can come close to safely using a global allocator that relies on global thread-local state to refer to non-local data.
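A rough sketch of that workaround, assuming C11 `_Thread_local` is available: the callbacks keep the `user_data`-free shape of the current global hooks and recover their state from a thread-local pointer instead. All names here (`demo_stats_t`, `demo_set_current`, `demo_tls_malloc`, `demo_tls_free`) are hypothetical, and the sketch only does accounting around the system allocator rather than registering anything with CRoaring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-caller state that the global hooks should reach. */
typedef struct demo_stats_s {
    size_t live_allocs;
    size_t total_bytes;
} demo_stats_t;

/* The only channel from caller to hook: a thread-local "current" pointer. */
static _Thread_local demo_stats_t* demo_current_stats = NULL;

/* Install new state for this thread and return the previous one,
 * so the caller can restore it after the bitmap operation. */
static demo_stats_t* demo_set_current(demo_stats_t* stats) {
    demo_stats_t* prev = demo_current_stats;
    demo_current_stats = stats;
    return prev;
}

/* Same shape as a global hook: no user_data parameter. */
static void* demo_tls_malloc(size_t size) {
    void* p = malloc(size);
    if (p != NULL && demo_current_stats != NULL) {
        demo_current_stats->live_allocs++;
        demo_current_stats->total_bytes += size;
    }
    return p;
}

static void demo_tls_free(void* ptr) {
    /* Attributes the free to whatever state is current NOW, which may
     * differ from the state that was current at allocation time. */
    if (ptr != NULL && demo_current_stats != NULL) {
        assert(demo_current_stats->live_allocs > 0);
        demo_current_stats->live_allocs--;
    }
    free(ptr);
}
```

The weak point is visible in `demo_tls_free`: nothing ties a block to the state it was allocated under, so every caller in the process has to set the thread-local pointer consistently around every bitmap operation. That is why this only "comes close" to being safe.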

**Additional context**

Previous Issues, PRs, discussions:

- #271 - A per-bitmap allocator, but each container also got a custom allocator. I will argue below that I do not believe this is necessary.
- #284 - Issue discussing custom allocators after #271 was closed.
- #358 - Implemented the current global allocator hooks
- #638 - Somewhat related, the fact that allocators can't safely return null is unfortunate for custom allocators (it would be cool to make a stack-allocator with just an upper limit of memory it will give out).

Previous dismissals of per-bitmap allocators:

> CRoaring has the concept of shared container, containers that can be part of several bitmaps. Given that these users would like to have different bitmaps use different memory allocators, this means that we somehow need to trace memory allocation at the container level (bitmaps are made of small independent entities called containers) while resolving the conflict for shared containers if any.
> [link](https://github.com/RoaringBitmap/CRoaring/issues/284#issue-854530624)

> A per-bitmap dynamically chosen allocator is not something we are going to do. It is has much too great an overhead. It is not just the allocation, but also follow-up operations (deallocation) that need to be tracked and the bitmaps interact with each others, a container allocated in one bitmap, with one memory system could end up within another bitmap that uses another memory allocator. This implies that every container, and some of them can be just a few bytes, needs to track its source. It adds memory usage, computational overhead, complexity overhead, etc.
> [link](https://github.com/RoaringBitmap/CRoaring/issues/611#issuecomment-2016516201)

I argue that we do not need to track allocators per container, only per bitmap. As far as I can see, there is no way for a container to **move** from one bitmap to another with the operations we currently expose. If COW is enabled, containers can be **shared** between bitmaps, which could lead to issues if they are shared between bitmaps with different allocators. However, COW is already [documented](https://github.com/RoaringBitmap/CRoaring/blob/8aef53e3b25d83d0daa8b1fd8c398b8fc2279826/include/roaring/roaring.h#L135-L137) to be dangerous to use inconsistently. I believe we can just document that when custom allocators are used in combination with COW, it is up to the user to ensure that all bitmaps involved in an operation use the same allocator.

Other than with COW, containers are created for a specific bitmap, never leave it, and are deallocated in the context of working with that same bitmap, so they do not need to track their allocator separately.
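The ownership argument above can be illustrated with a toy model. This is not CRoaring's real layout: `toy_allocator_t`, `toy_container_t`, `toy_bitmap_t`, and the counting callbacks are all hypothetical and reduced to the minimum. The point is only that the container struct carries no allocator pointer, yet every block is freed through the same allocator that produced it, because both happen via the owning bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator reduced to two callbacks. */
typedef struct toy_allocator_s {
    void* (*malloc)(size_t size, void* user_data);
    void (*free)(void* ptr, void* user_data);
} toy_allocator_t;

/* A container stores only its payload: no allocator, no user_data. */
typedef struct toy_container_s {
    unsigned short* values;
    size_t capacity;
} toy_container_t;

#define TOY_MAX_CONTAINERS 4

/* The bitmap is the single place where the allocator is recorded. */
typedef struct toy_bitmap_s {
    const toy_allocator_t* alloc;
    void* user_data;
    toy_container_t containers[TOY_MAX_CONTAINERS];
    size_t count;
} toy_bitmap_t;

static void toy_bitmap_init(toy_bitmap_t* bm, const toy_allocator_t* alloc,
                            void* user_data) {
    bm->alloc = alloc;
    bm->user_data = user_data;
    bm->count = 0;
}

/* Containers are created on behalf of one bitmap, with its allocator. */
static int toy_bitmap_add_container(toy_bitmap_t* bm, size_t capacity) {
    if (bm->count == TOY_MAX_CONTAINERS) return 0;
    void* mem = bm->alloc->malloc(capacity * sizeof(unsigned short),
                                  bm->user_data);
    if (mem == NULL) return 0;
    bm->containers[bm->count].values = (unsigned short*)mem;
    bm->containers[bm->count].capacity = capacity;
    bm->count++;
    return 1;
}

/* ...and released through the same bitmap, hence the same allocator. */
static void toy_bitmap_clear(toy_bitmap_t* bm) {
    for (size_t i = 0; i < bm->count; i++) {
        bm->alloc->free(bm->containers[i].values, bm->user_data);
    }
    bm->count = 0;
}

/* Counting allocator: user_data points at a balance of live blocks. */
static void* toy_counting_malloc(size_t size, void* user_data) {
    void* p = malloc(size);
    if (p != NULL) ++*(long*)user_data;
    return p;
}

static void toy_counting_free(void* ptr, void* user_data) {
    if (ptr != NULL) --*(long*)user_data;
    free(ptr);
}

static const toy_allocator_t TOY_COUNTING = {toy_counting_malloc,
                                             toy_counting_free};
```

With two bitmaps that each point at their own balance counter, each counter returns to zero when its bitmap is cleared. A shared (COW) container is exactly the case this model does not cover, which is why matching allocators would have to be the user's responsibility there.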

**Performance**
We already have an indirect function call for every allocation operation after #358.

I envision something like the following (in an internal header):

```c
typedef struct roaring_allocator_context_s {
    const roaring_allocator_t* alloc;
    void* alloc_data;
} roaring_allocator_context_t;

#define ALLOCATOR_CTX_GLOBAL ((roaring_allocator_context_t){NULL, NULL})

static inline void* allocator_malloc(roaring_allocator_context_t alloc_ctx,
                                     size_t size) {
    return alloc_ctx.alloc ? alloc_ctx.alloc->malloc(size, alloc_ctx.alloc_data)
                           : roaring_malloc(size);
}

static inline roaring_allocator_context_t roaring_bitmap_get_allocator_internal(roaring_bitmap_t *r) { /* ... */ }
```

Most internal functions that can allocate (even transitively) would need to be updated to take a `roaring_allocator_context_t` parameter.
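As a rough illustration of what that threading could look like, the sketch below passes the context by value into a made-up internal helper. Assumptions: the allocator table is reduced to `malloc` and `free`; `roaring_malloc` and `roaring_free` are local stand-ins for the existing global wrappers so the example compiles on its own; `allocator_free`, `demo_array_t`, `demo_array_create`, `demo_array_free`, and the counting callbacks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Proposed table, reduced to the two callbacks this sketch needs. */
typedef struct roaring_allocator_s {
    void* (*malloc)(size_t size, void* user_data);
    void (*free)(void* ptr, void* user_data);
} roaring_allocator_t;

typedef struct roaring_allocator_context_s {
    const roaring_allocator_t* alloc; /* NULL means "use the global hooks" */
    void* alloc_data;
} roaring_allocator_context_t;

/* Stand-ins for the existing global wrappers, for a standalone build. */
static void* roaring_malloc(size_t size) { return malloc(size); }
static void roaring_free(void* ptr) { free(ptr); }

static inline void* allocator_malloc(roaring_allocator_context_t ctx,
                                     size_t size) {
    return ctx.alloc ? ctx.alloc->malloc(size, ctx.alloc_data)
                     : roaring_malloc(size);
}

/* Hypothetical mirror of allocator_malloc for deallocation. */
static inline void allocator_free(roaring_allocator_context_t ctx, void* ptr) {
    if (ctx.alloc) {
        ctx.alloc->free(ptr, ctx.alloc_data);
    } else {
        roaring_free(ptr);
    }
}

/* Made-up container type; it does not remember the context. */
typedef struct demo_array_s {
    int32_t cardinality;
    int32_t capacity;
    uint16_t* array;
} demo_array_t;

/* An internal function after the change: the context is a parameter. */
static demo_array_t* demo_array_create(roaring_allocator_context_t ctx,
                                       int32_t capacity) {
    demo_array_t* c = (demo_array_t*)allocator_malloc(ctx, sizeof(*c));
    if (c == NULL) return NULL;
    c->array = (uint16_t*)allocator_malloc(
        ctx, (size_t)capacity * sizeof(uint16_t));
    if (c->array == NULL && capacity > 0) {
        allocator_free(ctx, c);
        return NULL;
    }
    c->cardinality = 0;
    c->capacity = capacity;
    return c;
}

/* The caller supplies the same context again when freeing. */
static void demo_array_free(roaring_allocator_context_t ctx, demo_array_t* c) {
    allocator_free(ctx, c->array);
    allocator_free(ctx, c);
}

/* Counting callbacks: alloc_data points at an int of live blocks. */
static void* demo_counting_malloc(size_t size, void* user_data) {
    void* p = malloc(size);
    if (p != NULL) ++*(int*)user_data;
    return p;
}

static void demo_counting_free(void* ptr, void* user_data) {
    if (ptr != NULL) --*(int*)user_data;
    free(ptr);
}
```

The context is two pointers, so passing it by value should be cheap, and the `NULL` check keeps the default path on the existing global hooks. Whether the extra branch and parameter are measurable is exactly what the benchmark would need to show.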

I _believe_ this will have a negligible performance impact, but I would need to benchmark to verify.

**Are you willing to contribute code or documentation toward this new feature?**
Yes.


Revisit Per Bitmap Allocators #736
