Proposal: Remove the custom memory allocators and replace with an arena

I've got a partial patch in #309 to replace Gumbo's custom memory allocators with an arena stored in the GumboOutput structure.  Seeking input on situations where this may have unintended consequences.
## Concrete proposal
1. Eliminate the allocator/deallocator/userdata fields from GumboOptions.
2. Add an 'arena' member to GumboOutput.  This is an opaque struct that contains all memory used by the library and is destroyed by gumbo_destroy_output.
3. Internally, the arena is a linked list of chunks.  They're currently sized at 240K to fit entirely within the L2 cache of recent Intel processors.  Allocation consists of a pointer bump, with a new chunk being allocated if there is no space in the current one.  Deallocation is a no-op, with the full list of chunks being destroyed by gumbo_destroy_output.
4. Optionally, we can now remove the GumboOptions argument to gumbo_destroy_output.
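
To make item 3 concrete, here is a minimal sketch of a chunked bump arena. It is not the code from #309; every name in it (Arena, ArenaChunk, arena_alloc, arena_free, arena_destroy, ARENA_CHUNK_SIZE) is hypothetical, and it assumes a C11 compiler and that no single request exceeds one chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names; illustrative only, not the patch in #309. */
#define ARENA_CHUNK_SIZE ((size_t)240 * 1024)

typedef struct ArenaChunk {
  struct ArenaChunk* next; /* previously filled chunk, or NULL */
  size_t used;             /* bytes already handed out from data */
  _Alignas(max_align_t) char data[ARENA_CHUNK_SIZE];
} ArenaChunk;

typedef struct Arena {
  ArenaChunk* head; /* chunk currently being filled */
} Arena;

void* arena_alloc(Arena* arena, size_t size) {
  const size_t align = _Alignof(max_align_t);
  assert(arena != NULL);
  if (size > ARENA_CHUNK_SIZE) return NULL; /* oversized requests: out of scope here */
  size = (size + align - 1) & ~(align - 1); /* keep every returned pointer aligned */

  ArenaChunk* chunk = arena->head;
  if (chunk == NULL || size > ARENA_CHUNK_SIZE - chunk->used) {
    /* No room in the current chunk: start a new one at the head of the list. */
    chunk = malloc(sizeof *chunk);
    if (chunk == NULL) return NULL;
    chunk->next = arena->head;
    chunk->used = 0;
    arena->head = chunk;
  }
  void* ptr = chunk->data + chunk->used; /* the pointer bump */
  chunk->used += size;
  return ptr;
}

/* Deallocation is a no-op; memory is reclaimed only by arena_destroy. */
void arena_free(Arena* arena, void* ptr) {
  (void)arena;
  (void)ptr;
}

/* Frees every chunk at once; all pointers from arena_alloc become invalid. */
void arena_destroy(Arena* arena) {
  ArenaChunk* chunk = arena->head;
  while (chunk != NULL) {
    ArenaChunk* next = chunk->next;
    free(chunk);
    chunk = next;
  }
  arena->head = NULL;
}
```

In this sketch the space left at the end of a chunk is simply abandoned when a new chunk is started, which is one source of the extra memory usage discussed under Drawbacks.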
## Current workaround

Currently Gumbo allows custom allocators (including arenas), but defaults to a simple wrapper around malloc.  It's possible and quite easy to implement an arena with Gumbo, but the design trade-offs in _how_ Gumbo allocates memory differ based on which allocator is used.  As a result, users of the default allocator get reduced performance to support the possibility of an arena, while users of an arena get increased memory traffic to support the system malloc default.  It may be simpler for users if we remove the choice and instead provide good performance out-of-the-box with an arena.
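
For reference, this is roughly what the workaround looks like today. The sketch below is self-contained and does not include gumbo.h: AllocFn, FreeFn and AllocHooks are stand-ins that I believe mirror the shape of the allocator/deallocator/userdata fields in GumboOptions, so check them against your header before relying on them. BumpBuffer, bump_alloc, bump_free and make_bump_hooks are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins assumed to mirror the hook shapes in GumboOptions. */
typedef void* (*AllocFn)(void* userdata, size_t size);
typedef void (*FreeFn)(void* userdata, void* ptr);

typedef struct AllocHooks {
  AllocFn allocator;
  FreeFn deallocator;
  void* userdata;
} AllocHooks;

/* A caller-owned buffer; base must be suitably aligned for any object. */
typedef struct BumpBuffer {
  char* base;
  size_t capacity;
  size_t used;
} BumpBuffer;

static void* bump_alloc(void* userdata, size_t size) {
  BumpBuffer* buf = userdata;
  const size_t align = _Alignof(max_align_t);
  assert(buf != NULL);
  /* Round the start offset up so each allocation is aligned. */
  size_t start = (buf->used + align - 1) & ~(align - 1);
  if (start > buf->capacity || size > buf->capacity - start) return NULL;
  buf->used = start + size;
  return buf->base + start;
}

/* Individual frees do nothing; the caller releases the whole buffer. */
static void bump_free(void* userdata, void* ptr) {
  (void)userdata;
  (void)ptr;
}

AllocHooks make_bump_hooks(BumpBuffer* buf) {
  AllocHooks hooks = {bump_alloc, bump_free, buf};
  return hooks;
}
```

With the real library, the three values would be copied into the corresponding GumboOptions fields before parsing. The library still calls the deallocator for every node even though it does nothing here, which is part of the extra work arena users pay for under the current design.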
## Benefits
1. Measured performance increase of roughly 20% in CPU time, out of the box.
2. Smaller API surface for users to understand.
3. No possibility of memory leaks, and a much reduced possibility of dangling pointers.
4. Allows us to add proper out-of-memory handling, something I've long wanted.  Right now, if malloc fails in gumbo_parser_allocate, the parser dereferences a null pointer (or worse, tramples memory).  This is because (a) malloc never fails in modern Linux distributions and (b) cleaning up the parser in a way that doesn't leak memory is highly non-trivial, since the parse tree may have been in an inconsistent state when the bad allocation was made.  With patch #309, when an arena allocation fails it longjmps back to gumbo_parse, which returns the tree parsed up to that point and sets an out_of_memory flag in GumboOutput.  The partial parse tree is freed when the arena is deallocated in gumbo_destroy_output.
5. Opens up the possibility of some API changes that were previously deemed too complicated for memory-management reasons.  For example, gumbo_normalized_tagname can return a buffer allocated inside the arena that actually normalizes the case (it doesn't right now, because that would require a fresh buffer that must be deallocated), without the client needing to delete it itself.  gumbo_destroy_output would no longer need to carry the GumboOptions along.
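
The out-of-memory path in item 4 can be sketched as follows. This is an illustration of the setjmp/longjmp pattern only, not the code in #309: Parser, Node, parser_alloc, build_tree and parse are hypothetical names, and a small fixed pool stands in for the arena so that exhaustion is easy to trigger.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
  int id;
  struct Node* prev; /* node created just before this one */
} Node;

typedef struct Parser {
  jmp_buf on_oom; /* where to resume when allocation fails */
  _Alignas(max_align_t) char pool[256]; /* stand-in for the arena */
  size_t used;
  Node* last;         /* partial result, always in a consistent state */
  bool out_of_memory; /* plays the role of the flag in GumboOutput */
} Parser;

/* Never returns NULL: on exhaustion it jumps back to parse(). */
static void* parser_alloc(Parser* p, size_t size) {
  if (size > sizeof p->pool - p->used) longjmp(p->on_oom, 1);
  void* ptr = p->pool + p->used;
  p->used += size;
  return ptr;
}

/* Deep in the "parser": callers need no error checks after allocating. */
static void build_tree(Parser* p, int count) {
  for (int i = 0; i < count; ++i) {
    Node* n = parser_alloc(p, sizeof *n);
    n->id = i;
    n->prev = p->last;
    p->last = n; /* linked only once fully initialized */
  }
}

void parse(Parser* p, int count) {
  p->used = 0;
  p->last = NULL;
  p->out_of_memory = false;
  if (setjmp(p->on_oom) != 0) {
    /* Arrived here via longjmp: keep whatever was built so far. */
    p->out_of_memory = true;
    return;
  }
  build_tree(p, count);
}
```

Nothing has to be unwound node by node after the jump, because all memory belongs to the pool (the arena, in the real proposal) and is released in one step. The parser state lives in a caller-owned struct rather than in locals of parse, which avoids the rule that non-volatile locals modified after setjmp are indeterminate after longjmp.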
## Drawbacks
1. Larger memory usage.  A typical HTML document under Gumbo 0.10.0 takes roughly 5K of memory per 1K of document length to parse.  The arena implementation roughly doubles that.  In absolute numbers, under the arena the median document takes about 720K of memory, with the 95th percentile at 2.4M.
2. Makes mutability much more difficult.  Arenas are not good choices for mutable documents, where there is repeated allocation & freeing.  This proposal is incompatible with #311, and would largely lock Gumbo into a use-case where you either query the parse tree and pull out the information you want, or you wrap it in a data structure of your choosing and throw it away.
3. Backwards incompatible; it modifies the GumboOptions structure.
4. Loses the ability to use custom allocators.  The main use-case for them in my own programming has always been arenas, but I wonder if anyone is using them to e.g. place Gumbo data on a garbage-collected heap?  Is anyone using them at all?
5. How would this affect embedded systems, where RAM may be tighter?  Is anyone using Gumbo in an embedded system?
## Compromise solutions
1. We could allow tweaking of the arena size as part of GumboOptions, hopefully eliminating some memory pressure if you need to parse small HTML documents on an embedded system.
2. We could keep the custom allocator machinery, but change the default to use an arena allocator and special-case some codepaths for it.  This would keep most of the performance benefits, flexibility, and backwards compatibility, but we lose advantages 2-5, increase the internal complexity significantly, and are still faced with performance tradeoffs between arena vs. system malloc optimizations.
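
To get a feel for what a tunable chunk size in option 1 would buy, the arithmetic below estimates how much memory an arena reserves for a given amount of live data. It is a rough model under stated assumptions: arena_reserved_bytes is a hypothetical helper, K means 1024 bytes, and per-chunk headers and space abandoned at the end of chunks are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes reserved when memory is only obtained in whole chunks. */
size_t arena_reserved_bytes(size_t live_bytes, size_t chunk_size) {
  assert(chunk_size > 0);
  /* Round the chunk count up without overflowing on large inputs. */
  size_t chunks = live_bytes / chunk_size + (live_bytes % chunk_size != 0);
  return chunks * chunk_size;
}
```

Under this model, a small document that needs 50K of parse-tree data reserves a full 240K with the current chunk size, but only 64K with 16K chunks. The trade-off is more calls into the system allocator for large documents.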

Comment with a +1 or -1, or any additional comments or considerations.

Proposal: Remove the custom memory allocators and replace with an arena #312