Description
After reading fitzgen's (very interesting) blog post about the rationale for bumping downwards, I had one thought:
try_alloc_layout_fast
has 2 branches:
Lines 1414 to 1442 in bb660a3
The first branch (ptr as usize) < layout.size()
is there purely to ensure that ptr.wrapping_sub(layout.size())
cannot wrap around. This rules out a possible mistake when evaluating the 2nd branch condition aligned_ptr >= start
.
Bumpalo already has a method Bump::set_allocation_limit to limit the size of the Bump
. I imagine that most users could impose a limit on the size of their Bump
s. It'd be an uncommon use case for a bump allocator to be allocating massive slabs of memory, as they'd probably also be long-lived.
My thinking is this:
Taking example where size limit is 4 GiB minus 1 byte (i.e. size <= u32::MAX
):
If the total size of the bump is constrained to 4 GiB, then no single allocation can be larger than 4 GiB. So layout.size()
of a successful allocation is always a valid u32
.
Constrain T
in fn alloc<T>(&self, val: T)
to only allow types where mem::size_of::<T>() <= u32::MAX
.
When Bump
allocates a chunk from global allocator, request a chunk of 4 GiB size. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!)
Check the start pointer for that chunk satisfies start_ptr as usize > u32::MAX as usize
. In unlikely event that it doesn't:
- Allocate another 4 GiB chunk.
- Because allocations can't overlap, the pointer to the 2nd allocation is guaranteed to be
> u32::MAX
. - Free the 1st allocation, and use the 2nd for the chunk.
Either way, we now have a guarantee that start_ptr > u32::MAX
.
Bump::alloc<T>
can use a specialized version of alloc_layout
where layout.size()
is statically constrained to be <= u32::MAX
.
Combining these 2 guarantees means that (ptr as usize) < layout.size()
can never be true, and that branch can be removed. ptr.wrapping_sub(layout.size())
can never wrap.
NB: A size check would still be required when allocating &[T]
, as size is not knowable statically. But nonetheless, at least making Bump::alloc
a bit faster would probably be a worthwhile gain.
NB 2: Some of the above is a little approximate (maybe I'm conflating 4 GiB and 4 GiB - 1 in some places), but hopefully the general idea is clear enough.
Do you think this would work? And if so, would it be a worthwhile optimization?
Activity