Skip to content

Revisit Per Bitmap Allocators #736

@Dr-Emann

Description

@Dr-Emann

Is your feature request related to a problem? Please describe.

  • I want to be able to be able to control allocation of roaring bitmaps in a thread-safe way, in a library which may not control all uses of roaring bitmaps in the process.
  • I want to be able to e.g. use an allocator that gives out memory e.g. from a non-global slab allocator.
  • As a maintainer of the croaring-rs rust bindings, I want to be able to provide a safe rust API to control allocation in roaring bitmaps:
    • Currently the function to configure a global allocator must be unsafe because the caller must ensure there are not any objects allocated by CRoaring at the time the function to configure the global allocator is called (or races to interact with croaring in another thread)

Describe the solution you'd like
I would like to have the option to specify a custom allocator for each bitmap. I would like the allocator functions to take in a 'user_data' parameter, which will also be stored with the bitmap, to allow non-static data to be used for allocation.

I'm imagining an API like:

typedef struct roaring_allocator_s {
    void* (*malloc)(size_t size, void* user_data);
    void* (*realloc)(void* ptr, size_t size, void* user_data);
    void* (*calloc)(size_t count, size_t size, void* user_data);
    void (*free)(void* ptr, void* user_data);
    void* (*aligned_malloc)(size_t alignment, size_t size, void* user_data);
    void (*aligned_free)(void* ptr, void* user_data);
} roaring_allocator_t;

roaring_bitmap_t *roaring_bitmap_create_with_allocator(const roaring_allocator_t *alloc, void *user_data);

Describe alternatives you've considered

If I control all uses of CRoaring in the process, I can come close to safely using a global allocator which relies on global thread local state to refer to non-local data

Additional context

Previous Issues, PRs, discussions:

Previous dismisal about per-bitmap allocators:

CRoaring has the concept of shared container, containers that can be part of several bitmaps. Given that these users would like to have different bitmaps use different memory allocators, this means that we somehow need to trace memory allocation at the container level (bitmaps are made of small independent entities called containers) while resolving the conflict for shared containers if any.
link

A per-bitmap dynamically chosen allocator is not something we are going to do. It is has much too great an overhead. It is not just the allocation, but also follow-up operations (deallocation) that need to be tracked and the bitmaps interact with each others, a container allocated in one bitmap, with one memory system could end up within another bitmap that uses another memory allocator. This implies that every container, and some of them can be just a few bytes, needs to track its source. It adds memory usage, computational overhead, complexity overhead, etc.
link

I argue that we do not need to track allocators per container, only per bitmap. As far as I can see, there is no way for a container to move from one bitmap to another with the operations we currently expose. If COW is enabled, containers can be shared between bitmaps, which could lead to issues if they are shared between bitmaps with different allocators. However, COW is already documented to be dangerous to use inconsistently. I believe we can just document that if when using custom allocators in combination with COW, it's up to the user to ensure that all allocators match for all bitmaps involved in operations.

Other than COW, containers are created for a specific bitmap, and never leave it, and are deallocated in the context of working with the same bitmap, so they don't need to track their allocator separately.

Performance
We already have an indirect function call for every allocation operation after #358.

I envision something like the following (in an internal header):

typedef struct roaring_allocator_context_s {
    const roaring_allocator_t* alloc;
    void* alloc_data;
} roaring_allocator_context_t;

#define ALLOCATOR_CTX_GLOBAL ((roaring_allocator_context_t){NULL, NULL})

static inline void* allocator_malloc(roaring_allocator_context_t alloc_ctx,
                                     size_t size) {
    return alloc_ctx.alloc ? alloc_ctx.alloc->malloc(size, alloc_ctx.alloc_data)
                           : roaring_malloc(size);
}

static inline roaring_allocator_context_t roaring_bitmap_get_allocator_internal(roaring_bitmap_t *r) { /* ... */ }

Most internal functions (which can allocate, even transitively) would need to be updated to take a roaring_allocator_context_t parameter.

I believe this will have a negligible performance impact, but would need to benchmark to verify.

Are you willing to contribute code or documentation toward this new feature?
Yes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions