Merge branch 'master' of github.com:rampantpixels/rpmalloc

mjansson · mjansson · commit 1fa867356fd1 · 2017-03-15T12:00:17.000+01:00
diff --git a/README.md b/README.md
@@ -20,19 +20,15 @@ Please consider our Patreon to support our work - https://www.patreon.com/rampan
 Created by Mattias Jansson / Rampant Pixels  -  http://www.rampantpixels.com
 
 # Performance
-We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~1300 lines of C code.
+We believe rpmalloc is faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others without causing extra allocated memory overhead in the thread caches. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~1300 lines of C code.
 
 Contained in the repository is a benchmark utility that performs interleaved allocations (both aligned to 8 or 16 bytes, and unaligned) and deallocations (both in-thread and cross-thread) in multiple threads. It measures the number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The number of threads, cross-thread deallocation rate and allocation size limits are configured by command line arguments.
 
-Below is comparison charts of performance of rpmalloc and other popular allocator implementations.
+Below is an example performance comparison chart of rpmalloc and other popular allocator implementations.
 
-![Windows random [16, 1000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=137567195&format=image)
+![Windows random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=881719411&format=image)
 
-![Windows random [16, 8000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1511494420&format=image)
-
-![Windows random [16, 16000] bytes, 8 cores](https://docs.google.com/spreadsheets/d/1NWNuar1z0uPCB5iVS_Cs6hSo2xPkTmZf0KsgWS_Fb_4/pubchart?oid=1778024863&format=image)
-
-The benchmarks producing these numbers were run on a Windows 10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators.
+The benchmark producing these numbers was run on a Windows 10 machine with 8 logical cores (4 physical, HT). The actual numbers are not to be interpreted as absolute performance figures, but rather as relative comparisons between the different allocators. More benchmarks will be published soon!
 
 # Implementation details
 The allocator is based on 64k alignment, where all runs of memory pages are mapped to 64KiB boundaries. On Windows this is automatically guaranteed by the VirtualAlloc granularity, and on mmap systems it is achieved by atomically incrementing the address where pages are mapped to. By aligning to 64KiB boundaries the free operation can locate the header of the memory block without having to do a table lookup (as tcmalloc does) by simply masking out the low 16 bits of the address.
diff --git a/build/ninja/clang.py b/build/ninja/clang.py
@@ -103,7 +103,7 @@ def initialize(self, project, archs, configs, includepaths, dependlibs, libpaths
     self.build_toolchain()
 
     self.cflags += ['-std=c11']
-    self.cxxflags += ['-std=c++11', '-stdlib=libc++']
+    #self.cxxflags += ['-std=c++11', '-stdlib=libc++']
 
     self.cexternflags = []
     self.cxxexternflags = []
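
The README's implementation details say that free can find a block's header by masking out the low 16 bits of the address, because all page runs are mapped to 64KiB boundaries. Below is a minimal sketch of that step, not rpmalloc's actual code. The names `SPAN_SIZE`, `SPAN_MASK`, `span_header_t` and `span_from_pointer` are hypothetical and chosen only for illustration, and the header layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: every run of pages starts on a 64KiB boundary. */
#define SPAN_SIZE ((uintptr_t)65536)
#define SPAN_MASK (~(SPAN_SIZE - 1)) /* clears the low 16 bits */

/* Hypothetical header stored at the start of each 64KiB-aligned run. */
typedef struct span_header {
    uint32_t size_class;
    uint32_t free_count;
} span_header_t;

/* Locate the header for any pointer inside the run, without a table lookup. */
static span_header_t *span_from_pointer(const void *ptr) {
    return (span_header_t *)((uintptr_t)ptr & SPAN_MASK);
}

/* Byte offset of the pointer within its run (the bits that were masked out). */
static size_t span_offset(const void *ptr) {
    return (size_t)((uintptr_t)ptr & (SPAN_SIZE - 1));
}
```

Because the lookup is a single bitwise AND, it needs no shared data structure, which is the advantage over a table lookup that the README describes.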