
Remove a branch from try_alloc_layout? #234

Open
@overlookmotel

Description


After reading fitzgen's (very interesting) blog post about the rationale for bumping downwards, I had one thought:

try_alloc_layout_fast has 2 branches:

bumpalo/src/lib.rs, lines 1414 to 1442 in bb660a3:

fn try_alloc_layout_fast(&self, layout: Layout) -> Option<NonNull<u8>> {
    // We don't need to check for ZSTs here since they will automatically
    // be handled properly: the pointer will be bumped by zero bytes,
    // modulo alignment. This keeps the fast path optimized for non-ZSTs,
    // which are much more common.
    unsafe {
        let footer = self.current_chunk_footer.get();
        let footer = footer.as_ref();
        let ptr = footer.ptr.get().as_ptr();
        let start = footer.data.as_ptr();
        debug_assert!(start <= ptr);
        debug_assert!(ptr as *const u8 <= footer as *const _ as *const u8);

        if (ptr as usize) < layout.size() {
            return None;
        }

        let ptr = ptr.wrapping_sub(layout.size());
        let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

        if aligned_ptr >= start {
            let aligned_ptr = NonNull::new_unchecked(aligned_ptr);
            footer.ptr.set(aligned_ptr);
            Some(aligned_ptr)
        } else {
            None
        }
    }
}

The first branch, (ptr as usize) < layout.size(), is there purely to ensure that ptr.wrapping_sub(layout.size()) cannot wrap around. Without it, a wrapped-around pointer could incorrectly pass the 2nd branch's condition aligned_ptr >= start.
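
To make that concrete, here is a minimal standalone sketch of the wraparound the check guards against. The addresses are made up, and plain integers stand in for real pointers:

fn main() {
    let start: usize = 0x1000; // chunk start
    let ptr: usize = 0x1010;   // current bump pointer (only 16 bytes free)
    let size: usize = 0x2000;  // requested allocation is larger than that

    // Without the `ptr < size` check, the subtraction wraps around...
    let wrapped = ptr.wrapping_sub(size);

    // ...and the wrapped value compares as "in bounds", so the second
    // branch alone would hand out a bogus pointer instead of None.
    assert!(wrapped >= start);
    println!("wrapped = {wrapped:#x}"); // huge address near usize::MAX
}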

Bumpalo already has a method Bump::set_allocation_limit to limit the size of the Bump. I imagine most users could impose a limit on the size of their Bumps; it would be an uncommon use case for a bump allocator to be handing out massive slabs of memory, since such allocations would probably also be long-lived (and so a poor fit for an arena anyway).
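
For reference, usage of that existing API looks roughly like this (the limit value here is arbitrary):

use bumpalo::Bump;

fn main() {
    let bump = Bump::new();
    // Cap how much memory this arena may request from the global
    // allocator. The exact limit chosen here is arbitrary.
    bump.set_allocation_limit(Some(u32::MAX as usize));

    let x = bump.alloc(42u64);
    assert_eq!(*x, 42);
}

If I remember the semantics right, requests that would push the arena past the limit simply fail: alloc panics and the try_ variants return an error.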

My thinking is this:

Take the example where the size limit is 4 GiB minus 1 byte (i.e. size <= u32::MAX):

If the total size of the bump is constrained to 4 GiB, then no single allocation can be larger than 4 GiB. So layout.size() of a successful allocation is always a valid u32.

Constrain T in fn alloc<T>(&self, val: T) to only allow types where mem::size_of::<T>() <= u32::MAX.
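
One way to express that constraint is a const assertion that fails at monomorphisation time. A minimal sketch; FitsInU32 and alloc_small are hypothetical names, not bumpalo APIs:

use core::marker::PhantomData;
use core::mem;

struct FitsInU32<T>(PhantomData<T>);

impl<T> FitsInU32<T> {
    // Evaluating this constant is a compile error (post-monomorphisation)
    // for any `T` larger than u32::MAX bytes.
    const CHECK: () = assert!(mem::size_of::<T>() <= u32::MAX as usize);
}

fn alloc_small<T>(val: T) -> Box<T> {
    // Force the compile-time size check to be evaluated for this `T`.
    let () = FitsInU32::<T>::CHECK;
    // Stand-in for the real, specialised bump-allocation path.
    Box::new(val)
}

fn main() {
    let x = alloc_small(1234u64);
    assert_eq!(*x, 1234);
}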

When Bump allocates a chunk from the global allocator, request a 4 GiB chunk. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!).

Check that the start pointer for that chunk satisfies start_ptr as usize > u32::MAX as usize (a rough sketch of this step follows the list below). In the unlikely event that it doesn't:

  • Allocate another 4 GiB chunk.
  • Because allocations can't overlap, the pointer to the 2nd allocation is guaranteed to be > u32::MAX.
  • Free the 1st allocation, and use the 2nd for the chunk.

Either way, we now have a guarantee that start_ptr > u32::MAX.
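
A rough sketch of that chunk-allocation step, assuming a 64-bit target and using std::alloc directly for illustration; the real logic would live inside bumpalo's chunk-allocation path, with proper error handling:

use std::alloc::{alloc, dealloc, Layout};

// Illustrative only: request a 4 GiB chunk, retrying once if the returned
// address is not above u32::MAX.
fn alloc_chunk_above_u32_max() -> Option<*mut u8> {
    let layout = Layout::from_size_align(4usize << 30, 16).ok()?;

    // SAFETY: `layout` has non-zero size.
    let first = unsafe { alloc(layout) };
    if first.is_null() {
        return None;
    }
    if first as usize > u32::MAX as usize {
        return Some(first);
    }

    // Unlikely case: the first chunk starts in the low 4 GiB. A second
    // 4 GiB chunk cannot overlap it, so its start must be > u32::MAX.
    let second = unsafe { alloc(layout) };
    // SAFETY: `first` was allocated above with this exact layout.
    unsafe { dealloc(first, layout) };
    if second.is_null() {
        None
    } else {
        debug_assert!(second as usize > u32::MAX as usize);
        Some(second)
    }
}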

Bump::alloc<T> can use a specialized version of alloc_layout where layout.size() is statically constrained to be <= u32::MAX.

Combining these 2 guarantees means that (ptr as usize) < layout.size() can never be true: ptr.wrapping_sub(layout.size()) can never wrap, so that branch can be removed.
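
For concreteness, the specialized fast path could then look something like this. It reuses the names from the snippet quoted above, the function name is hypothetical, and it's a sketch of the idea rather than a tested patch:

// Relies on the two invariants above:
//   1. layout.size() <= u32::MAX (statically enforced for Bump::alloc<T>)
//   2. the chunk's start pointer is > u32::MAX
// so ptr >= start > u32::MAX >= layout.size() and the subtraction cannot wrap.
fn try_alloc_layout_fast_small(&self, layout: Layout) -> Option<NonNull<u8>> {
    unsafe {
        let footer = self.current_chunk_footer.get();
        let footer = footer.as_ref();
        let ptr = footer.ptr.get().as_ptr();
        let start = footer.data.as_ptr();
        debug_assert!(start <= ptr);
        debug_assert!(start as usize > u32::MAX as usize);
        debug_assert!(layout.size() <= u32::MAX as usize);

        // No `(ptr as usize) < layout.size()` check needed: see above.
        let ptr = ptr.wrapping_sub(layout.size());
        let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

        if aligned_ptr >= start {
            let aligned_ptr = NonNull::new_unchecked(aligned_ptr);
            footer.ptr.set(aligned_ptr);
            Some(aligned_ptr)
        } else {
            None
        }
    }
}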

NB: A size check would still be required when allocating &[T], as the size is not statically knowable. Nonetheless, making at least Bump::alloc a bit faster would probably be a worthwhile gain.
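
That runtime check for the slice path could be as simple as this (slice_layout_checked is a hypothetical helper, just to show where the check would live):

use core::alloc::Layout;

// The slice path still needs a runtime size check, since the total size
// depends on `len`.
fn slice_layout_checked<T>(len: usize) -> Option<Layout> {
    let layout = Layout::array::<T>(len).ok()?;
    // Reject anything that would break the `size <= u32::MAX` invariant.
    if layout.size() <= u32::MAX as usize {
        Some(layout)
    } else {
        None
    }
}

fn main() {
    assert!(slice_layout_checked::<u64>(8).is_some());
    assert!(slice_layout_checked::<u64>(usize::MAX / 8).is_none());
}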

NB 2: Some of the above is a little approximate (maybe I'm conflating 4 GiB and 4 GiB - 1 in some places), but hopefully the general idea is clear enough.

Do you think this would work? And if so, would it be a worthwhile optimization?
