Commit b55d218

Merge to master for 1.4.2
2 parents 21c3f8b + 7990f8e commit b55d218

13 files changed: 576 additions and 241 deletions

BENCHMARKS.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Benchmarks
-Contained in a parallell repository is a benchmark utility that performs interleaved allocations (both aligned to 8 or 16 bytes, and unaligned) and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.
+Contained in a parallel repository is a benchmark utility that performs interleaved allocations (both aligned to 8 or 16 bytes, and unaligned) and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.

 https://github.com/mjansson/rpmalloc-benchmark

CHANGELOG

Lines changed: 19 additions & 0 deletions
@@ -1,3 +1,22 @@
+1.4.2
+
+Fixed an issue where calling _exit might hang the main thread cleanup in rpmalloc if another
+worker thread was terminated while holding exclusive access to the global cache.
+
+Improved caches to prioritize main spans in a chunk to avoid leaving main spans mapped due to
+remaining subspans in caches.
+
+Improve cache reuse by allowing large blocks to use caches from slightly larger cache classes.
+
+Fixed an issue where thread heap statistics would go out of sync when a free span was deferred
+to another thread heap
+
+API breaking change - added flag to rpmalloc_thread_finalize to avoid releasing thread caches.
+Pass nonzero value to retain old behaviour of releasing thread caches to global cache.
+
+Add option to config to set a custom error callback for assert failures (if ENABLE_ASSERT)
+
+
 1.4.1

 Dual license as both released to public domain or under MIT license

README.md

Lines changed: 4 additions & 4 deletions
@@ -33,7 +33,7 @@ Configuration of the thread and global caches can be important depending on your

 # Required functions

-Before calling any other function in the API, you __MUST__ call the initization function, either __rpmalloc_initialize__ or __pmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.
+Before calling any other function in the API, you __MUST__ call the initialization function, either __rpmalloc_initialize__ or __pmalloc_initialize_config__, or you will get undefined behaviour when calling other rpmalloc entry point.

 Before terminating your use of the allocator, you __SHOULD__ call __rpmalloc_finalize__ in order to release caches and unmap virtual memory, as well as prepare the allocator for global scope cleanup at process exit or dynamic library unload depending on your use case.

@@ -104,7 +104,7 @@ The allocator is based on a fixed but configurable page alignment (defaults to 6

 Memory blocks are divided into three categories. For 64KiB span size/alignment the small blocks are [16, 1024] bytes, medium blocks (1024, 32256] bytes, and large blocks (32256, 2097120] bytes. The three categories are further divided in size classes. If the span size is changed, the small block classes remain but medium blocks go from (1024, span size] bytes.

-Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have a the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.
+Small blocks have a size class granularity of 16 bytes each in 64 buckets. Medium blocks have a granularity of 512 bytes, 61 buckets (default). Large blocks have the same granularity as the configured span size (default 64KiB). All allocations are fitted to these size class boundaries (an allocation of 36 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.

 Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans. The fourth level is a global list of free spans.

@@ -113,7 +113,7 @@ Each span for a small and medium size class keeps track of how many blocks are a
 Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans.

 # Memory mapping
-By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes.
+By default the allocator uses OS APIs to map virtual memory pages as needed, either `VirtualAlloc` on Windows or `mmap` on POSIX systems. If you want to use your own custom memory mapping provider you can use __rpmalloc_initialize_config__ and pass function pointers to map and unmap virtual memory. These function should reserve and free the requested number of bytes.

 The returned memory address from the memory map function MUST be aligned to the memory page size and the memory span size (which ever is larger), both of which is configurable. Either provide the page and span sizes during initialization using __rpmalloc_initialize_config__, or use __rpmalloc_config__ to find the required alignment which is equal to the maximum of page and span size. The span size MUST be a power of two in [4096, 262144] range, and be a multiple or divisor of the memory page size.

@@ -128,7 +128,7 @@ Super spans (spans a multiple > 1 of the span size) can be subdivided into small

 A span that is a subspan of a larger super span can be individually decommitted to reduce physical memory pressure when the span is evicted from caches and scheduled to be unmapped. The entire original super span will keep track of the subspans it is broken up into, and when the entire range is decommitted tha super span will be unmapped. This allows platforms like Windows that require the entire virtual memory range that was mapped in a call to VirtualAlloc to be unmapped in one call to VirtualFree, while still decommitting individual pages in subspans (if the page size is smaller than the span size).

-If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting invididual pages and the total super span byte size for finally releasing the entire super span memory range.
+If you use a custom memory map/unmap function you need to take this into account by looking at the `release` parameter given to the `memory_unmap` function. It is set to 0 for decommitting individual pages and the total super span byte size for finally releasing the entire super span memory range.

 # Memory fragmentation
 There is no memory fragmentation by the allocator in the sense that it will not leave unallocated and unusable "holes" in the memory pages by calls to allocate and free blocks of different sizes. This is due to the fact that the memory pages allocated for each size class is split up in perfectly aligned blocks which are not reused for a request of a different size. The block freed by a call to `rpfree` will always be immediately available for an allocation request within the same size class.
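
Reviewer note: the size class rule quoted in the README hunks (16-byte steps for small blocks, 512-byte steps for medium blocks, and the "36 bytes becomes 48 bytes" example) can be checked with a short sketch. This is not rpmalloc code. The function name `fitted_block_size` and the constant names are hypothetical, the numbers are the default figures from the README text, and large blocks are deliberately left out.

```python
# Illustrative sketch only; not part of rpmalloc.
# Constants follow the default figures quoted in the README (64KiB span size).
SMALL_GRANULARITY = 16     # step between small size classes, in bytes
SMALL_LIMIT = 1024         # largest small block
MEDIUM_GRANULARITY = 512   # step between medium size classes, in bytes
MEDIUM_LIMIT = 32256       # largest medium block: 1024 + 61 * 512


def fitted_block_size(size):
    """Round a requested size up to its small or medium size class boundary."""
    if size <= 0:
        raise ValueError("size must be positive")
    if size <= SMALL_LIMIT:
        # Ceiling division, then scale back up to a multiple of 16.
        return -(-size // SMALL_GRANULARITY) * SMALL_GRANULARITY
    if size <= MEDIUM_LIMIT:
        # Medium classes start above 1024 and advance in 512-byte steps.
        steps = -(-(size - SMALL_LIMIT) // MEDIUM_GRANULARITY)
        return SMALL_LIMIT + steps * MEDIUM_GRANULARITY
    raise ValueError("large blocks are not covered by this sketch")
```

With these assumptions, a 36-byte request lands in the 48-byte class, matching the README, and a 1025-byte request lands in the first medium class at 1536 bytes.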

build/ninja/clang.py

Lines changed: 9 additions & 3 deletions
@@ -38,7 +38,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
     self.cxxcmd = '$toolchain$cxx -MMD -MT $out -MF $out.d $includepaths $moreincludepaths $cxxflags $carchflags $cconfigflags $cmoreflags $cxxenvflags -c $in -o $out'
     self.ccdeps = 'gcc'
     self.ccdepfile = '$out.d'
-    self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crsD $ararchflags $arflags $arenvflags $out $in'
+    self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crs $ararchflags $arflags $arenvflags $out $in'
     if self.target.is_windows():
       self.linkcmd = '$toolchain$link $libpaths $configlibpaths $linkflags $linkarchflags $linkconfigflags $linkenvflags /debug /nologo /subsystem:console /dynamicbase /nxcompat /manifest /manifestuac:\"level=\'asInvoker\' uiAccess=\'false\'\" /tlbid:1 /pdb:$pdbpath /out:$out $in $libs $archlibs $oslibs $frameworks'
       self.dllcmd = self.linkcmd + ' /dll'
@@ -52,7 +52,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
                    '-fno-trapping-math', '-ffast-math']
     self.cwarnflags = ['-W', '-Werror', '-pedantic', '-Wall', '-Weverything',
                        '-Wno-c++98-compat', '-Wno-padded', '-Wno-documentation-unknown-command',
-                       '-Wno-implicit-fallthrough', '-Wno-static-in-inline', '-Wno-reserved-id-macro']
+                       '-Wno-implicit-fallthrough', '-Wno-static-in-inline', '-Wno-reserved-id-macro', '-Wno-disabled-macro-expansion']
     self.cmoreflags = []
     self.mflags = []
     self.arflags = []
@@ -76,8 +76,14 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
       self.oslibs += ['m']
     if self.target.is_linux() or self.target.is_raspberrypi():
       self.oslibs += ['dl']
+    if self.target.is_raspberrypi():
+      self.linkflags += ['-latomic']
     if self.target.is_bsd():
       self.oslibs += ['execinfo']
+    if self.target.is_haiku():
+      self.cflags += ['-D_GNU_SOURCE=1']
+      self.linkflags += ['-lpthread']
+      self.oslibs += ['m']
     if not self.target.is_windows():
       self.linkflags += ['-fomit-frame-pointer']

@@ -391,7 +397,7 @@ def make_linkconfigflags(self, config, targettype, variables):
       if targettype == 'sharedlib':
         flags += ['-shared', '-fPIC']
       if config != 'debug':
-        if targettype == 'bin' or targettype == 'sharedlib':
+        if (targettype == 'bin' or targettype == 'sharedlib') and self.use_lto():
           flags += ['-flto']
     return flags

build/ninja/gcc.py

Lines changed: 6 additions & 1 deletion
@@ -24,7 +24,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
     self.cxxcmd = '$toolchain$cxx -MMD -MT $out -MF $out.d $includepaths $moreincludepaths $cxxflags $carchflags $cconfigflags $cmoreflags $cxxenvflags -c $in -o $out'
     self.ccdeps = 'gcc'
     self.ccdepfile = '$out.d'
-    self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crsD $ararchflags $arflags $arenvflags $out $in'
+    self.arcmd = self.rmcmd('$out') + ' && $toolchain$ar crs $ararchflags $arflags $arenvflags $out $in'
     self.linkcmd = '$toolchain$link $libpaths $configlibpaths $linkflags $linkarchflags $linkconfigflags $linkenvflags -o $out $in $libs $archlibs $oslibs'

     #Base flags
@@ -54,8 +54,13 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
       self.linkflags += ['-pthread']
     if self.target.is_linux() or self.target.is_raspberrypi():
       self.oslibs += ['dl']
+    if self.target.is_raspberrypi():
+      self.linkflags += ['-latomic']
     if self.target.is_bsd():
       self.oslibs += ['execinfo']
+    if self.target.is_haiku():
+      self.cflags += ['-D_GNU_SOURCE=1']
+      self.linkflags += ['-lpthread']

     self.includepaths = self.prefix_includepaths((includepaths or []) + ['.'])

build/ninja/generator.py

Lines changed: 5 additions & 0 deletions
@@ -49,6 +49,9 @@ def __init__(self, project, includepaths = [], dependlibs = [], libpaths = [], v
     parser.add_argument('--updatebuild', action='store_true',
                         help = 'Update submodule build scripts',
                         default = '')
+    parser.add_argument('--lto', action='store_true',
+                        help = 'Build with Link Time Optimization',
+                        default = False)
     options = parser.parse_args()

     self.project = project
@@ -91,6 +94,8 @@ def __init__(self, project, includepaths = [], dependlibs = [], libpaths = [], v
       variables['monolithic'] = True
     if options.coverage:
       variables['coverage'] = True
+    if options.lto:
+      variables['lto'] = True
     if self.subninja != '':
       variables['internal_deps'] = True

build/ninja/platform.py

Lines changed: 7 additions & 2 deletions
@@ -5,7 +5,7 @@
 import sys

 def supported_platforms():
-  return [ 'windows', 'linux', 'macos', 'bsd', 'ios', 'android', 'raspberrypi', 'tizen', 'sunos' ]
+  return [ 'windows', 'linux', 'macos', 'bsd', 'ios', 'android', 'raspberrypi', 'tizen', 'sunos', 'haiku' ]

 class Platform(object):
   def __init__(self, platform):
@@ -20,7 +20,7 @@ def __init__(self, platform):
       self.platform = 'macos'
     elif self.platform.startswith('win'):
       self.platform = 'windows'
-    elif 'bsd' in self.platform:
+    elif 'bsd' in self.platform or self.platform.startswith('dragonfly'):
       self.platform = 'bsd'
     elif self.platform.startswith('ios'):
       self.platform = 'ios'
@@ -32,6 +32,8 @@ def __init__(self, platform):
       self.platform = 'tizen'
     elif self.platform.startswith('sunos'):
       self.platform = 'sunos'
+    elif self.platform.startswith('haiku'):
+      self.platform = 'haiku'

   def platform(self):
     return self.platform
@@ -63,5 +65,8 @@ def is_tizen(self):
   def is_sunos(self):
     return self.platform == 'sunos'

+  def is_haiku(self):
+    return self.platform == 'haiku'
+
   def get(self):
     return self.platform
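
Reviewer note: the effect of the two new branches in platform.py (DragonFly folded into `bsd`, and a new `haiku` platform) can be illustrated in isolation. The sketch below is a hypothetical stand-alone function, `normalize_platform`, that reproduces only the branches visible in this diff. The real `Platform.__init__` has further branches before and between these that the hunks do not show, and the input strings used in the usage note are example values, not taken from the commit.

```python
# Hypothetical stand-alone version of the branches visible in the diff.
# The actual Platform.__init__ contains more cases than are shown here.
def normalize_platform(name):
    """Map a raw platform string to a canonical name, or return it unchanged."""
    if name.startswith('win'):
        return 'windows'
    # New in this commit: DragonFly is treated as a BSD.
    if 'bsd' in name or name.startswith('dragonfly'):
        return 'bsd'
    if name.startswith('ios'):
        return 'ios'
    if name.startswith('sunos'):
        return 'sunos'
    # New in this commit: Haiku gets its own canonical name.
    if name.startswith('haiku'):
        return 'haiku'
    return name
```

Under these assumptions, an input such as `'dragonfly5'` normalizes to `'bsd'` and `'haiku1'` to `'haiku'`, while strings that match no branch pass through unchanged.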

build/ninja/toolchain.py

Lines changed: 9 additions & 1 deletion
@@ -54,6 +54,7 @@ def __init__(self, host, target, toolchain):
     #Set default values
     self.build_monolithic = False
     self.build_coverage = False
+    self.build_lto = False
     self.support_lua = False
     self.internal_deps = False
     self.python = 'python'
@@ -132,7 +133,7 @@ def initialize_archs(self, archs):
   def initialize_default_archs(self):
     if self.target.is_windows():
       self.archs = ['x86-64']
-    elif self.target.is_linux() or self.target.is_bsd() or self.target.is_sunos():
+    elif self.target.is_linux() or self.target.is_bsd() or self.target.is_sunos() or self.target.is_haiku():
       localarch = subprocess.check_output(['uname', '-m']).decode().strip()
       if localarch == 'x86_64' or localarch == 'amd64':
         self.archs = ['x86-64']
@@ -208,6 +209,8 @@ def parse_default_variables(self, variables):
         self.build_monolithic = get_boolean_flag(val)
       elif key == 'coverage':
         self.build_coverage = get_boolean_flag(val)
+      elif key == 'lto':
+        self.build_lto = get_boolean_flag(val)
       elif key == 'support_lua':
         self.support_lua = get_boolean_flag(val)
       elif key == 'internal_deps':
@@ -234,6 +237,8 @@ def parse_prefs(self, prefs):
       self.build_monolithic = get_boolean_flag(prefs['monolithic'])
     if 'coverage' in prefs:
       self.build_coverage = get_boolean_flag( prefs['coverage'] )
+    if 'lto' in prefs:
+      self.build_lto = get_boolean_flag( prefs['lto'] )
     if 'support_lua' in prefs:
       self.support_lua = get_boolean_flag(prefs['support_lua'])
     if 'python' in prefs:
@@ -258,6 +263,9 @@ def is_monolithic(self):
   def use_coverage(self):
     return self.build_coverage

+  def use_lto(self):
+    return self.build_lto
+
   def write_variables(self, writer):
     writer.variable('buildpath', self.buildpath)
     writer.variable('target', self.target.platform)
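
Reviewer note: the new `--lto` option is threaded through three files (generator.py, toolchain.py, clang.py). The sketch below condenses that path into one runnable script to show the intended behaviour: `-flto` is only emitted for non-debug `bin` or `sharedlib` targets when the flag is set. The helper names `parse_options`, `link_config_flags`, and `flags_for` are hypothetical, `get_boolean_flag` is a simplified stand-in for the helper of the same name in toolchain.py (whose body is not part of this diff), and the Windows branch of the clang toolchain is omitted.

```python
import argparse


def get_boolean_flag(val):
    """Simplified, hypothetical stand-in for the toolchain.py helper."""
    return val in (True, 'true', 'True', '1', 1)


def parse_options(argv):
    """Parse only the option added in this commit."""
    parser = argparse.ArgumentParser(description='Build configuration sketch')
    parser.add_argument('--lto', action='store_true',
                        help='Build with Link Time Optimization',
                        default=False)
    return parser.parse_args(argv)


def link_config_flags(config, targettype, use_lto):
    """Mirror the non-Windows logic shown in the clang.py hunk."""
    flags = []
    if targettype == 'sharedlib':
        flags += ['-shared', '-fPIC']
    if config != 'debug':
        # -flto is now opt-in instead of unconditional for release builds.
        if (targettype == 'bin' or targettype == 'sharedlib') and use_lto:
            flags += ['-flto']
    return flags


def flags_for(argv, config, targettype):
    """Follow the flag from the command line to the link flags."""
    options = parse_options(argv)
    variables = {}
    if options.lto:
        variables['lto'] = True
    build_lto = get_boolean_flag(variables.get('lto', False))
    return link_config_flags(config, targettype, build_lto)
```

One consequence worth noting for anyone upgrading: release builds that previously got `-flto` implicitly will no longer do so unless `--lto` is passed or `lto` is set in the build preferences.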

rpmalloc/malloc.c

Lines changed: 33 additions & 3 deletions
@@ -292,26 +292,55 @@ DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved) {
 	else if (reason == DLL_THREAD_ATTACH)
 		rpmalloc_thread_initialize();
 	else if (reason == DLL_THREAD_DETACH)
-		rpmalloc_thread_finalize();
+		rpmalloc_thread_finalize(1);
 	return TRUE;
 }

+//end BUILD_DYNAMIC_LINK
+#else
+
+extern void
+_global_rpmalloc_init(void) {
+	rpmalloc_set_main_thread();
+	rpmalloc_initialize();
+}
+
+#if defined(__clang__) || defined(__GNUC__)
+
+static void __attribute__((constructor))
+initializer(void) {
+	_global_rpmalloc_init();
+}
+
+#elif defined(_MSC_VER)
+
+#pragma section(".CRT$XIB",read)
+__declspec(allocate(".CRT$XIB")) void (*_rpmalloc_module_init)(void) = _global_rpmalloc_init;
+#pragma comment(linker, "/include:_rpmalloc_module_init")
+
 #endif

+//end !BUILD_DYNAMIC_LINK
+#endif
+
 #else

 #include <pthread.h>
 #include <stdlib.h>
 #include <stdint.h>
 #include <unistd.h>

+extern void
+rpmalloc_set_main_thread(void);
+
 static pthread_key_t destructor_key;

 static void
 thread_destructor(void*);

 static void __attribute__((constructor))
 initializer(void) {
+	rpmalloc_set_main_thread();
 	rpmalloc_initialize();
 	pthread_key_create(&destructor_key, thread_destructor);
 }
@@ -340,7 +369,7 @@ thread_starter(void* argptr) {
 static void
 thread_destructor(void* value) {
 	(void)sizeof(value);
-	rpmalloc_thread_finalize();
+	rpmalloc_thread_finalize(1);
 }

 #ifdef __APPLE__
@@ -368,7 +397,8 @@ pthread_create(pthread_t* thread,
                const pthread_attr_t* attr,
                void* (*start_routine)(void*),
                void* arg) {
-#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__APPLE__) || defined(__HAIKU__)
+#if defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__) || defined(__DragonFly__) || \
+    defined(__APPLE__) || defined(__HAIKU__)
 	char fname[] = "pthread_create";
 #else
 	char fname[] = "_pthread_create";
