GoMem is a high-performance memory allocator library for Go, extracted from the Monibuca project.
- Multiple Allocation Strategies: Support for both single-tree and two-tree (AVL) allocation algorithms
- Buddy Allocator: Optional buddy system for efficient memory pooling
- Recyclable Memory: Memory recycling support with automatic cleanup
- Scalable Allocator: Dynamically growing memory allocator
- Memory Reader: Efficient multi-buffer reader with zero-copy operations
The library supports several build tags to customize behavior:
- `twotree`: Use the two-tree (AVL) implementation instead of the single treap
- `enable_buddy`: Enable the buddy allocator for memory pooling
- `disable_rm`: Disable recyclable memory features for reduced overhead
- `enable_mmap`: Enable memory-mapped allocation for improved memory efficiency (Linux/macOS/Windows)
  - Linux: Automatically enables Transparent Huge Pages (THP), using 2MB huge pages instead of 4KB pages for significantly reduced TLB misses and improved memory access performance
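Tags can be combined in a single build. For example (an illustration only; adjust the tag list to your needs):

```sh
# Select the two-tree (AVL) allocator and enable mmap-backed allocation.
# Comma-separated tags are supported by modern Go toolchains.
go build -tags=twotree,enable_mmap ./...
```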
 
```sh
go get github.com/langhuihui/gomem
```

```go
package main
import "github.com/langhuihui/gomem"
func main() {
    // Create a scalable memory allocator
    allocator := gomem.NewScalableMemoryAllocator(1024)
    
    // Allocate memory
    buf := allocator.Malloc(256)
    
    // Use the buffer...
    copy(buf, []byte("Hello, World!"))
    
    // Free the memory
    allocator.Free(buf)
}
```

```go
package main
import "github.com/langhuihui/gomem"
func main() {
    // Create a scalable memory allocator
    allocator := gomem.NewScalableMemoryAllocator(1024)
    
    // Allocate a large block of memory
    buf := allocator.Malloc(1024)
    
    // Use different parts of the memory
    part1 := buf[0:256]    // First 256 bytes
    part2 := buf[256:512]  // Middle 256 bytes  
    part3 := buf[512:1024] // Last 512 bytes
    
    // Fill with data
    copy(part1, []byte("Part 1 data"))
    copy(part2, []byte("Part 2 data"))
    copy(part3, []byte("Part 3 data"))
    
    // Partial deallocation - can free parts of memory
    allocator.Free(part1)  // Free first 256 bytes
    allocator.Free(part2)  // Free middle 256 bytes
    
    // Continue using remaining memory
    copy(part3, []byte("Updated part 3"))
    
    // Finally free remaining memory
    allocator.Free(part3)
}
```

```go
// Create recyclable memory for batch operations
allocator := gomem.NewScalableMemoryAllocator(1024)
rm := gomem.NewRecyclableMemory(allocator)
// Allocate multiple buffers
buf1 := rm.NextN(128)
buf2 := rm.NextN(256)
// Use the buffers...
copy(buf1, []byte("Buffer 1"))
copy(buf2, []byte("Buffer 2"))
// Recycle all memory at once
rm.Recycle()
```

```go
// Create a memory buffer
mem := gomem.NewMemory([]byte{1, 2, 3, 4, 5})
// Add more data
mem.PushOne([]byte{6, 7, 8})
// Get total size and buffer count
fmt.Printf("Size: %d, Buffers: %d\n", mem.Size, mem.Count())
// Convert to bytes
data := mem.ToBytes()
```

```go
// Create a memory reader
reader := gomem.NewReadableBuffersFromBytes([]byte{1, 2, 3}, []byte{4, 5, 6})
// Read data
buf := make([]byte, 6)
n, err := reader.Read(buf)
// buf now contains [1, 2, 3, 4, 5, 6]
```

Memory allocated from an allocator should be freed in the same goroutine that allocated it. When using the gotask framework, allocate memory in the Start method and free it in the Dispose method.

```go
// ❌ Wrong: Different goroutines
go func() {
    buf := allocator.Malloc(256)
    // ... use buffer
}()
go func() {
    allocator.Free(buf) // Race condition!
}()
// ✅ Correct: Same goroutine
buf := allocator.Malloc(256)
// ... use buffer
allocator.Free(buf)
// ✅ Elegant: Using gotask
type MyTask struct {
    allocator *gomem.ScalableMemoryAllocator
    buffer []byte
}
func (t *MyTask) Start() {
    t.allocator = gomem.NewScalableMemoryAllocator(1024)
    t.buffer = t.allocator.Malloc(256)
}
func (t *MyTask) Dispose() {
    t.allocator.Free(t.buffer)
}
```

- Use the `enable_mmap` build tag for dramatic performance improvements: 100-400x faster allocator creation, 99.98% less memory usage
- Use the `enable_buddy` build tag for better memory pooling in high-throughput scenarios
- RecyclableMemory enabled is 53% faster than the disabled version and uses less memory
- Use the `disable_rm` build tag only when you don't need memory management features (it reduces complexity but sacrifices performance)
- The single-tree allocator is significantly faster than the two-tree allocator (77-86% faster for allocation operations)
- Use the `twotree` build tag only if you need faster find operations (100% faster than single-tree)
The following benchmark results were obtained on an Apple M2 Pro (ARM64) with Go 1.23.0:
The MMAP implementation provides dramatic improvements in memory efficiency with minimal performance overhead:
| Metric | Default | MMAP | Improvement | 
|---|---|---|---|
| Overall Performance (geomean) | 234.1 ns/op | 94.21 ns/op | 59.8% faster ⚡ | 
| Memory Usage (geomean) | - | - | 86.6% reduction 💾 | 
| 1MB Allocator Creation | 80.5 µs / 1,048,763 B | 799 ns / 216 B | 100x faster, 99.98% less memory 🚀 | 
| 16MB Allocator Creation | 317.2 µs / 16,777,405 B | 777 ns / 216 B | 408x faster, 99.999% less memory 🚀 | 
| Individual Allocation (1KB) | 13.25 ns/op | 13.89 ns/op | 4.8% slower | 
| Memory Access (Write) | 441 ns/op | 458 ns/op | 3.8% slower | 
| Memory Access (Read) | 320 ns/op | 333 ns/op | 4.2% slower | 
Key Findings:
- Allocator Creation: MMAP is 100-408x faster with 99.98-99.999% less memory usage
- Memory Efficiency: Uses only 216 bytes for metadata vs allocating full buffer upfront
- Allocation Operations: Only 3-6% slower (< 1 nanosecond overhead) - negligible in most use cases
- Virtual Memory: MMAP reserves address space without immediate physical memory allocation (lazy allocation)
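A minimal sketch of this lazy-allocation behavior, using the same API as the usage examples above and assuming the binary is built with `-tags=enable_mmap`:

```go
package main

import "github.com/langhuihui/gomem"

func main() {
	// With enable_mmap, this reserves 16MB of virtual address space;
	// per the benchmarks, only ~216 bytes of metadata are allocated
	// up front, and physical pages are committed on first touch.
	allocator := gomem.NewScalableMemoryAllocator(16 << 20)

	// Only the pages backing this 4KB buffer become resident.
	buf := allocator.Malloc(4096)
	copy(buf, []byte("only touched pages consume physical memory"))
	allocator.Free(buf)
}
```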
When to Use MMAP:
- ✅ Creating multiple or large allocators
- ✅ Memory efficiency is critical
- ✅ Creating/destroying allocators frequently
- ✅ Working with sparse data (not all memory used immediately)
- ✅ Need to reserve large address spaces
When to Use Default:
- ⚠️ Every nanosecond matters in allocation operations (HFT, etc.)
- ⚠️ All allocated memory will be used immediately
- ⚠️ Running on systems without efficient mmap support
Enable MMAP:
```sh
go build -tags=enable_mmap
```

Linux THP Support:
When using enable_mmap on Linux, Transparent Huge Pages (THP) are automatically enabled:
- Uses 2MB huge pages instead of 4KB small pages (on x86_64 architecture)
- Significantly reduces TLB (Translation Lookaside Buffer) misses
- Improves performance for large memory access patterns
- Implemented via the `madvise(MADV_HUGEPAGE)` system call
- Gracefully falls back to regular pages if THP is not supported by the system
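For reference, the graceful THP opt-in described above corresponds to the following pattern using `golang.org/x/sys/unix`. This is a standalone sketch of the technique, not gomem's internal code:

```go
//go:build linux

package hugepages

import "golang.org/x/sys/unix"

// mmapWithTHP reserves size bytes of anonymous memory and asks the
// kernel to back it with 2MB transparent huge pages. If THP is not
// supported, the madvise call fails and the mapping silently keeps
// regular 4KB pages.
func mmapWithTHP(size int) ([]byte, error) {
	mem, err := unix.Mmap(-1, 0, size,
		unix.PROT_READ|unix.PROT_WRITE,
		unix.MAP_PRIVATE|unix.MAP_ANONYMOUS)
	if err != nil {
		return nil, err
	}
	// Best-effort: ignore the error to fall back gracefully.
	_ = unix.Madvise(mem, unix.MADV_HUGEPAGE)
	return mem, nil
}
```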
| Operation Type | Single-Tree (ns/op) | Two-Tree (ns/op) | Performance Difference | Winner | 
|---|---|---|---|---|
| Basic Allocation | 12.33 | 22.71 | 84% faster | Single-Tree | 
| Small Allocation (64B) | 12.32 | 22.60 | 84% faster | Single-Tree | 
| Large Allocation (8KB) | 12.14 | 22.61 | 86% faster | Single-Tree | 
| Sequential Allocation | 1961 | 3467 | 77% faster | Single-Tree | 
| Random Allocation | 12.47 | 23.02 | 85% faster | Single-Tree | 
| Find Operation | 3.03 | 1.51 | 100% faster | Two-Tree | 
| GetFreeSize | 3.94 | 4.27 | 8% faster | Single-Tree | 
Key Findings:
- Single-tree allocator is 77-86% faster for memory allocation operations
- Two-tree allocator is 100% faster for find operations only
- Single-tree allocator is recommended for most use cases due to superior allocation performance
| Operation Type | RM Enabled (ns/op) | RM Disabled (ns/op) | Performance Difference | Memory Usage | 
|---|---|---|---|---|
| Basic Operations | 335.2 | 511.9 | 53% faster | Enabled: 1536B/2 allocs, Disabled: 1788B/2 allocs | 
| Multiple Allocations | - | 1035.1 | - | Disabled: 3875B/10 allocs | 
| Clone Operations | - | 53.7 | - | Disabled: 240B/1 alloc | 
Key Findings:
- RecyclableMemory enabled is 53% faster for basic operations
- RM enabled uses less memory (1536B vs 1788B for basic operations)
- RM enabled provides true memory management with recycling capabilities
- RM disabled falls back to a plain `make([]byte, size)` without memory pooling
Memory Allocator (single-tree, default):

| Benchmark | Operations/sec | Time/op | Memory/op | Allocs/op | 
|---|---|---|---|---|
| Allocate | 96,758,520 | 15.08 ns | 0 B | 0 | 
| AllocateSmall | 98,864,434 | 12.49 ns | 0 B | 0 | 
| AllocateLarge | 100,000,000 | 12.65 ns | 0 B | 0 | 
| SequentialAlloc | 1,321,965 | 942.2 ns | 0 B | 0 | 
| RandomAlloc | 96,241,566 | 12.79 ns | 0 B | 0 | 
| GetFreeSize | 303,367,089 | 3.934 ns | 0 B | 0 | 
Memory Operations:

| Benchmark | Operations/sec | Time/op | Memory/op | Allocs/op | 
|---|---|---|---|---|
| PushOne | 31,982,593 | 35.05 ns | 143 B | 0 | 
| Push | 17,666,751 | 70.40 ns | 259 B | 0 | 
| ToBytes | 119,496 | 11,806 ns | 106,496 B | 1 | 
| CopyTo | 417,379 | 2,905 ns | 0 B | 0 | 
| Append | 979,598 | 1,859 ns | 7,319 B | 0 | 
| Count | 1,000,000,000 | 0.3209 ns | 0 B | 0 | 
| Range | 32,809,593 | 36.08 ns | 0 B | 0 | 
Memory Reader Operations:

| Benchmark | Operations/sec | Time/op | Memory/op | Allocs/op | 
|---|---|---|---|---|
| Read | 10,355,643 | 112.4 ns | 112 B | 2 | 
| ReadByte | 536,228 | 2,235 ns | 56 B | 2 | 
| ReadBytes | 2,556,602 | 608.7 ns | 1,080 B | 18 | 
| ReadBE | 408,663 | 3,587 ns | 56 B | 2 | 
| Skip | 8,762,934 | 125.8 ns | 56 B | 2 | 
| Range | 15,608,808 | 70.99 ns | 80 B | 2 | 
| RangeN | 20,101,638 | 79.09 ns | 80 B | 2 | 
| LEB128Unmarshal | 356,560 | 3,052 ns | 56 B | 2 | 
Buddy Allocator:

| Benchmark | Operations/sec | Time/op | Memory/op | Allocs/op | 
|---|---|---|---|---|
| Alloc | 4,017,826 | 388.2 ns | 0 B | 0 | 
| AllocSmall | 3,092,535 | 410.7 ns | 0 B | 0 | 
| AllocLarge | 3,723,950 | 276.4 ns | 0 B | 0 | 
| SequentialAlloc | 62,786 | 17,997 ns | 0 B | 0 | 
| RandomAlloc | 3,249,220 | 357.8 ns | 0 B | 0 | 
| Pool | 27,800 | 56,846 ns | 196,139 B | 0 | 
| NonPowerOf2 | 3,167,425 | 317.8 ns | 0 B | 0 | 
ScalableMemoryAllocator:

| Benchmark | Operations/sec | Time/op | Memory/op | Allocs/op | 
|---|---|---|---|---|
| Basic Operations | ||||
| Malloc | 92,943,320 | 13.22 ns | 0 B | 0 | 
| MallocSmall (64B) | 73,196,394 | 16.62 ns | 0 B | 0 | 
| MallocLarge (8KB) | 10,000 | 127,506 ns | 4,191,139 B | 5 | 
| Memory Borrowing | ||||
| Borrow | 221,620,256 | 5.425 ns | 0 B | 0 | 
| BorrowSmall (64B) | 90,733,239 | 13.38 ns | 0 B | 0 | 
| BorrowLarge (8KB) | 80,812,390 | 12.58 ns | 0 B | 0 | 
| Allocation Patterns | ||||
| SequentialAlloc | 789,878 | 1,541 ns | 0 B | 0 | 
| RandomAlloc | 32,514 | 38,625 ns | 1,197,044 B | 1 | 
| RandomBorrow | 144,988,590 | 8.261 ns | 0 B | 0 | 
| MixedPattern | 131,418,630 | 9.210 ns | 0 B | 0 | 
| Advanced Operations | ||||
| GetStats | 1,000,000,000 | 0.3013 ns | 0 B | 0 | 
| FreeRest | 52,918,608 | 23.25 ns | 0 B | 0 | 
| Scaling | 10,000 | 107,642 ns | 3,351,399 B | 4 | 
| Concurrent | 2,332,717 | 519.4 ns | 0 B | 0 | 
| MemoryPressure | 10,000 | 145,329 ns | 4,193,342 B | 7 | 
RecyclableMemory:

| Benchmark | Operations/sec | Time/op | Memory/op | Allocs/op | 
|---|---|---|---|---|
| NextN | 31,148,637 | 32.11 ns | 0 B | 0 | 
| BatchRecycle | 3,902,038 | 312.4 ns | 0 B | 0 | 
| WithRecycleIndexes | 3,706,173 | 331.5 ns | 0 B | 0 | 
- Single-Tree Allocator: Extremely fast allocation/deallocation with ~12ns per operation and zero memory allocations
- Two-Tree Allocator: Slower allocation (~23ns per operation) but faster find operations (~1.5ns vs ~3ns)
- ScalableMemoryAllocator: High-performance scalable allocator with dynamic growth
  - Malloc operations: ~13-17ns per operation with zero memory allocations
  - Borrow operations: Extremely fast at ~5-13ns per operation (borrowing is 2-3x faster than malloc)
  - Memory efficiency: Zero garbage collection pressure for small/medium allocations
  - Scaling capability: Automatically grows to accommodate larger allocations

- RecyclableMemory: Efficient batch memory management
  - NextN operations: ~32ns per operation with zero memory allocations
  - Batch recycling: ~312ns for recycling 10 buffers at once
  - Memory efficiency: 53% faster than the disabled version while using less memory
 
- Memory Operations: Efficient buffer management with minimal overhead
- Memory Reader: High-performance reading with zero-copy operations
- Buddy Allocator: Fast power-of-2 allocation with pool support for reduced GC pressure
Key Performance Insights:
- Borrow is fastest: Borrow operations (5-13ns) are 2-3x faster than malloc operations (13-17ns)
- Zero GC pressure: Most operations produce zero memory allocations
- Excellent scaling: ScalableMemoryAllocator handles dynamic growth efficiently
- Batch efficiency: RecyclableMemory provides efficient batch operations
Recommendations:
- Use the `enable_mmap` tag for most applications to gain ~60% performance improvement and ~87% memory reduction
- Use ScalableMemoryAllocator for applications requiring dynamic memory growth
- Prefer Borrow over Malloc when possible for maximum performance (see the sketch after this list)
- Use RecyclableMemory for batch operations requiring multiple allocations
- Use single-tree allocator (default) for most applications due to superior allocation performance
- Keep RecyclableMemory enabled (default) for better performance and memory efficiency
- Only use two-tree allocator if find operations are critical and frequent
- Only use the `disable_rm` tag when you don't need memory management features
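The Borrow recommendation can be illustrated with a short sketch. The exact Borrow signature is not shown in this README; the snippet below assumes a `Borrow(size int) []byte` method, inferred from the Borrow/BorrowSmall/BorrowLarge benchmark names, so treat it as hypothetical and check the package documentation:

```go
// Hypothetical: assumes ScalableMemoryAllocator exposes
// Borrow(size int) []byte; verify against the package docs.
allocator := gomem.NewScalableMemoryAllocator(1024)
buf := allocator.Borrow(256) // ~2-3x faster than Malloc per the benchmarks
copy(buf, []byte("borrowed buffer"))
// The lifetime semantics of borrowed memory are not documented here;
// see the gomem docs for when borrowed buffers are reclaimed.
```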
License: MIT
Contributions are welcome! Please feel free to submit a Pull Request.
If you have any questions or need help, please open an issue on GitHub.
