Remove a branch from try_alloc_layout? #234

Open
@overlookmotel

After reading fitzgen's (very interesting) blog post about the rationale for bumping downwards, I had one thought:

try_alloc_layout_fast has 2 branches:

bumpalo/src/lib.rs

Lines 1414 to 1442 in bb660a3

```rust
fn try_alloc_layout_fast(&self, layout: Layout) -> Option<NonNull<u8>> {
    // We don't need to check for ZSTs here since they will automatically
    // be handled properly: the pointer will be bumped by zero bytes,
    // modulo alignment. This keeps the fast path optimized for non-ZSTs,
    // which are much more common.
    unsafe {
        let footer = self.current_chunk_footer.get();
        let footer = footer.as_ref();
        let ptr = footer.ptr.get().as_ptr();
        let start = footer.data.as_ptr();
        debug_assert!(start <= ptr);
        debug_assert!(ptr as *const u8 <= footer as *const _ as *const u8);

        if (ptr as usize) < layout.size() {
            return None;
        }

        let ptr = ptr.wrapping_sub(layout.size());
        let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

        if aligned_ptr >= start {
            let aligned_ptr = NonNull::new_unchecked(aligned_ptr);
            footer.ptr.set(aligned_ptr);
            Some(aligned_ptr)
        } else {
            None
        }
    }
}
```

The first branch, (ptr as usize) < layout.size(), is there purely to ensure that ptr.wrapping_sub(layout.size()) cannot wrap around. Without it, a wrapped pointer would land at a very high address and could wrongly pass the 2nd branch condition aligned_ptr >= start.
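To make the failure mode concrete, here is a minimal sketch that models only the 2nd check, using plain usize addresses instead of real pointers. The function name and the addresses are made up for illustration and are not part of bumpalo.

```rust
/// Models the fast path with the first check removed: bump down by `size`,
/// round down to `align`, then compare against `start`.
/// Addresses are plain integers, so nothing is dereferenced.
fn bump_without_first_check(
    ptr: usize,
    start: usize,
    size: usize,
    align: usize,
) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    // May wrap around if `size > ptr`.
    let new_ptr = ptr.wrapping_sub(size);
    // Round down to a multiple of `align`.
    let aligned = new_ptr & !(align - 1);
    if aligned >= start {
        Some(aligned)
    } else {
        None
    }
}
```

With a chunk at 0x1000..0x2000 and the bump pointer at 0x1800, a 0x100-byte request succeeds and a 0x1000-byte request is correctly rejected by the 2nd check. A 0x2000-byte request is larger than ptr itself, so the subtraction wraps to a huge address and the 2nd check wrongly accepts it.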

Bumpalo already has a method Bump::set_allocation_limit to limit the size of the Bump. I imagine that most users could impose a limit on the size of their Bumps. It'd be an uncommon use case for a bump allocator to be allocating massive slabs of memory, as they'd probably also be long-lived.

My thinking is this:

Taking an example where the size limit is 4 GiB minus 1 byte (i.e. size <= u32::MAX):

If the total size of the bump is constrained to 4 GiB, then no single allocation can be larger than 4 GiB. So layout.size() of a successful allocation is always a valid u32.

Constrain T in fn alloc<T>(&self, val: T) to only allow types where mem::size_of::<T>() <= u32::MAX.
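Rust has no trait bound that expresses "size fits in a u32", so one way this constraint might be approximated is a compile-time assertion evaluated per instantiation. This is only a sketch: SizeFitsU32 and size_as_u32 are hypothetical names, and the error appears when the function is instantiated with an oversized type rather than at the signature.

```rust
use std::marker::PhantomData;
use std::mem;

/// Hypothetical helper: evaluating `SizeFitsU32::<T>::OK` fails to compile
/// if `T` is larger than `u32::MAX` bytes.
struct SizeFitsU32<T>(PhantomData<T>);

impl<T> SizeFitsU32<T> {
    const OK: () = assert!(
        mem::size_of::<T>() <= u32::MAX as usize,
        "type is too large for this allocator"
    );
}

/// Returns the size of `T` as a `u32`. The cast cannot truncate, because the
/// assertion above is evaluated for every `T` this function is used with.
fn size_as_u32<T>() -> u32 {
    // Referencing the constant forces the assertion for this particular `T`.
    let () = SizeFitsU32::<T>::OK;
    mem::size_of::<T>() as u32
}
```

For example, size_as_u32::<u64>() returns 8, while on a 64-bit target size_as_u32::<[u8; 1 << 33]>() would be rejected at compile time.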

When Bump allocates a chunk from global allocator, request a chunk of 4 GiB size. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!)

Check that the start pointer for that chunk satisfies start_ptr as usize > u32::MAX as usize. In the unlikely event that it doesn't:

  • Allocate another 4 GiB chunk.
  • Because allocations can't overlap, and there's no room for another 4 GiB chunk below the 1st one, the pointer to the 2nd allocation is guaranteed to be > u32::MAX.
  • Free the 1st allocation, and use the 2nd for the chunk.

Either way, we now have a guarantee that start_ptr > u32::MAX.
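The retry procedure above could look roughly like the following. To keep the sketch runnable without reserving 4 GiB, it works on raw addresses behind a hypothetical ChunkSource trait standing in for the global allocator. FakeSource exists only for demonstration, and none of these names come from bumpalo.

```rust
/// Hypothetical stand-in for the global allocator, working on raw addresses.
trait ChunkSource {
    /// Reserves `size` bytes and returns the (non-zero) start address.
    fn reserve(&mut self, size: usize) -> Option<usize>;
    /// Gives back a chunk previously returned by `reserve`.
    fn release(&mut self, start: usize, size: usize);
}

/// Returns the start of a chunk whose address is `> threshold`.
/// The guarantee relies on `size >= threshold`, so that no second chunk of
/// `size` bytes can fit below a first chunk that starts at or below `threshold`.
fn reserve_chunk_above<S: ChunkSource>(
    source: &mut S,
    size: usize,
    threshold: usize,
) -> Option<usize> {
    let first = source.reserve(size)?;
    if first > threshold {
        return Some(first);
    }
    // Keep `first` reserved while asking again, so the second chunk
    // cannot reuse the same address range.
    let second = source.reserve(size);
    source.release(first, size);
    let second = second?;
    if second > threshold {
        Some(second)
    } else {
        // Not expected when `size >= threshold`; bail out defensively.
        source.release(second, size);
        None
    }
}

/// Demonstration-only source that hands out consecutive address ranges.
struct FakeSource {
    next: usize,
    released: Vec<usize>,
}

impl ChunkSource for FakeSource {
    fn reserve(&mut self, size: usize) -> Option<usize> {
        let start = self.next;
        self.next = start.checked_add(size)?;
        Some(start)
    }

    fn release(&mut self, start: usize, _size: usize) {
        self.released.push(start);
    }
}
```

In the real case, size would be 4 GiB and threshold would be u32::MAX. The fake source lets the same logic be exercised with small numbers.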

Bump::alloc<T> can use a specialized version of alloc_layout where layout.size() is statically constrained to be <= u32::MAX.

Combining these 2 guarantees means that (ptr as usize) < layout.size() can never be true, and that branch can be removed. ptr.wrapping_sub(layout.size()) can never wrap.
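Put together, the fast path might reduce to something like this sketch. It again uses plain usize addresses, assumes a 64-bit target, and bump_down_u32 is a hypothetical name rather than anything in bumpalo. The two guarantees appear as debug assertions instead of a runtime branch.

```rust
/// Sketch of the fast path with the first check removed.
/// Assumes a 64-bit target, `start > u32::MAX`, and `start <= ptr`.
fn bump_down_u32(ptr: usize, start: usize, size: u32, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    debug_assert!(start > u32::MAX as usize);
    debug_assert!(start <= ptr);
    // ptr >= start > u32::MAX >= size, so this subtraction cannot wrap.
    let new_ptr = ptr - size as usize;
    // Round down to a multiple of `align`.
    let aligned = new_ptr & !(align - 1);
    // The only remaining branch: does the allocation fit in the chunk?
    if aligned >= start {
        Some(aligned)
    } else {
        None
    }
}
```

Even a request of u32::MAX bytes only moves the pointer down to a small positive address, which the remaining check rejects as being below start.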

NB: A size check would still be required when allocating &[T], as size is not knowable statically. But nonetheless, at least making Bump::alloc a bit faster would probably be a worthwhile gain.

NB 2: Some of the above is a little approximate (maybe I'm conflating 4 GiB and 4 GiB - 1 in some places), but hopefully the general idea is clear enough.

Do you think this would work? And if so, would it be a worthwhile optimization?
