O1Heap is a highly deterministic constant-complexity memory allocator designed for hard real-time high-integrity embedded systems.
The allocator offers a constant worst-case execution time (WCET) that is also extremely low, and a well-characterized and easily predictable worst-case memory fragmentation (consumption) (WCMC) bound. It allows the designer to statically prove its temporal and spatial properties for a given application, which makes it suitable for high-integrity embedded systems.
O1Heap is implemented in C99/C11 following MISRA C:2012; it is extremely compact (<500 LoC) and is trivial to understand and validate. It is designed to be usable across all conventional architectures out of the box, from 8-bit to 64-bit systems.
⚡ As a reference, on an RP2350 CM33, allocation takes ≤125 cycles and deallocation takes ≤115 cycles — always, irrespective of the size, preceding (de)allocation sequence, fragmentation, or memory usage — making it not only suitable for hard real-time but also one of the fastest allocators out there. Similar results have been observed on a Cortex M4 MCU.
Copy the files o1heap.c and o1heap.h (find them under o1heap/) into your project tree.
Either keep them in the same directory, or make sure that the directory that contains the header
is added to the set of include look-up paths.
No special compiler options are needed to compile the source file (if you find this to be untrue, please open a ticket).
Dedicate a memory arena for the heap, and pass a pointer to it along with its size to the initialization function
o1heapInit(..).
Allocate and deallocate memory using o1heapAllocate(..) and o1heapFree(..).
Their semantics are compatible with malloc(..) and free(..) plus additional behavioral guarantees
(constant timing, bounded fragmentation).
If necessary, periodically invoke o1heapDoInvariantsHold(..) to ensure that the heap is functioning correctly
and its internal data structures are not damaged.
Avoid concurrent access to the heap. Use locking if necessary.
#include "o1heap.h"
#include <stdalign.h>
static alignas(O1HEAP_ALIGNMENT) unsigned char heap_arena[32768];
int main(void)
{
O1HeapInstance* heap = o1heapInit(heap_arena, sizeof(heap_arena));
if (heap == NULL) {
return 1; // Initialization failed -- arena not aligned or too small
}
void* ptr = o1heapAllocate(heap, 200); // Always takes the same time to complete.
if (ptr != NULL) {
// Use the allocated memory...
o1heapFree(heap, ptr); // Also constant-time!
}
return 0;
}The preprocessor options given below can be overridden to fine-tune the implementation. None of them are mandatory to use.
Define this optional macro like O1HEAP_CONFIG_HEADER="path/to/my_o1heap_config.h" to pass build configuration macros.
This is useful because some build systems do not allow passing function-like macros via command line flags.
The macro O1HEAP_ASSERT(x) can be defined to customize the assertion handling or to disable it.
To disable assertion checks, the macro should expand to (void)(x).
If not specified, the macro expands to the standard assertion check macro assert(x) as defined in <assert.h>.
Some of the conditional branching statements are equipped with this annotation to hint the compiler that the generated code should be optimized for the case where the corresponding branch is (not) taken. This is done to reduce the worst-case execution time.
The macro should expand to a compiler-specific branch weighting intrinsic,
or to the original expression (x) if no such hinting is desired.
Defaults are provided for well-known compilers.
The count leading zeros (CLZ) function is used for fast binary logarithm computation (which has to be done multiple times per allocation, so its performance is critical). Most of the modern processors implement dedicated hardware support for fast CLZ computation, which is available via compiler intrinsics.
If not overridden by the user, for some compilers O1HEAP_CLZ(x) will expand to the appropriate intrinsic
(e.g., __builtin_clzl(x) for GCC/Clang).
For other compilers it will default to a slow software implementation,
which is likely to significantly degrade the performance of the library.
-
Memory allocation and deallocation routines are constant-time and fast.
-
For a given peak memory requirement
$M$ , the worst-case memory consumption$H$ is predictable (i.e., the worst-case heap fragmentation is well-characterized). The application designer shall be able to easily predict the amount of memory that needs to be provided to the memory allocator to ensure that out-of-memory (OOM) failures provably cannot occur at runtime under any circumstances. -
The implementation shall be simple and conform to relevant high-integrity coding guidelines.
This implementation derives from or is based on the ideas presented in:
- "Timing-Predictable Memory Allocation In Hard Real-Time Systems" -- J. Herter, 2014.
- "An algorithm with constant execution time for dynamic storage allocation" -- T. Ogasawara, 1995.
- "Worst case fragmentation of first fit and best fit storage allocation strategies" -- J. M. Robson, 1975.
This implementation does not make any non-trivial assumptions about the behavioral properties of the application. While it has been shown to be possible to construct a constant-complexity allocator that demonstrates better worst- and average-case memory requirements by making assumptions about the memory (de-)allocation patterns, this implementation makes no assumptions about the application.
The library implements a modified half-fit algorithm -- a constant-complexity strategy originally proposed by Ogasawara.
In this implementation, memory is allocated in fragments whose size is rounded up to the next integer power of two.
The worst-case memory consumption (WCMC)
Where
Memory allocators used in general-purpose (non-real-time and/or non-high-integrity) applications often leverage a different class of algorithms which may feature poorer worst-case performance metrics but perform (much) better on average. For a hard real-time system, the average case performance is generally less relevant, so it can be excluded from analysis.
The two-level segregated fit (TLSF) algorithm is a more complex
| Allocation strategy | WCMC |
|---|---|
| First-fit | |
| Half-fit | |
| Best-fit | |
| TLSF | (see best-fit) |
The state of catastrophic fragmentation is a state where the allocator is unable to serve
a memory allocation request even if there is enough free memory due to its suboptimal arrangement.
By definition, if the amount of memory available to the allocator is not less than
The above-defined theoretical worst-case upper bound H may be prohibitively high for some
memory-constrained applications.
It has been shown [Robson 1975] that under a typical workload,
for a sufficiently high amount of memory available to the allocator which is less than
Following some of the ideas from [Herter 2014], this implementation takes caching-related issues into consideration by choosing the most recently used memory fragments to minimize cache misses in the application.
The implemented variant of half-fit allocates memory in fragments of size:
Where O1HEAP_ALIGNMENT,
because it also dictates the allocated memory pointer alignment.
Due to the overhead, the maximum amount of memory available to the application per allocation is
From the above follows that malloc(..),
the allocator returns a null pointer if a zero-sized allocation is requested.
It has been mentioned that the abstract definition of
Where
The above equation should be used for sizing the heap space.
Observe that the case of
The following illustration shows the worst-case memory consumption (WCMC) for some common memory sizes;
as explained above, O1HEAP_ALIGNMENT which is platform-dependent:
Please refer to the continuous integration configuration to see how to build and test the library.
The code must be clang-formatted.
To release a new version, update the version number macro in the header file and create a new semver git tag like
2.3.0. Publish a new release on GitHub as well.
MISRA compliance is enforced with the help of Clang-Tidy. Every intentional deviation shall be documented and justified in-place using the following notation, followed by the appropriate static analyser warning suppression statement:
// Intentional violation of MISRA: <valid reason here>
// NOLINT(*-specific-rule)- Timing-Predictable Memory Allocation In Hard Real-Time Systems, J. Herter, 2014.
- Worst case fragmentation of first fit and best fit storage allocation strategies, J. M. Robson, 1975.
- Dynamic Memory Allocation In SQLite -- on Robson proof and deterministic fragmentation.
- Динамическая память в системах жёсткого реального времени -- issues with dynamic memory allocation in modern embedded RTOS and related popular misconceptions.
Version 3.0 reduces the per-fragment overhead from 4×(pointer width) to 2×(pointer width) by packing the fragment header.
On 32-bit platforms, O1HEAP_ALIGNMENT and the per-fragment overhead are now 8 bytes instead of 16.
This change saves memory but has some performance cost; e.g., on RP2350 (Cortex M33) allocation is ≈19% slower
(allocation mean 119 cycles vs. 100 cycles, deallocation mean 79 vs. 74 cycles;
see the on-target benchmark in perftest/).
The trade-off is believed to be justifiable for virtually all applications.
o1heapReallocate is added, which is constant-complexity except for the case when the old fragment cannot be
expanded in-place. See the API docs for details.
The trace events introduced in v2.2 have been removed due to unclean integration and relative lack of use. They may reappear in a future version, perhaps designed differently; please open an issue to discuss.
- Add optional trace events, enabled via the
O1HEAP_TRACEbuild-time option. - Add
o1heapMinArenaSize. - Add
o1heapGetMaxAllocationSize.
- Significantly accelerate (de-)allocation by replacing the naïve log2 implementation with fast CLZ intrinsics;
see
O1HEAP_CLZ(x). - Do not require char to be 8-bit wide: replace
uint8_twithuint_fast8_t. This is to enhance compatibility with odd embedded platforms whereCHAR_BIT!=8(e.g., ADSP TS-201, TMS320C2804).
- Remove critical section hooks to enhance MISRA conformance #4
- Add support for config header via
O1HEAP_CONFIG_HEADER#5
The first release.
The library is available under the terms of the MIT License.
Please find it in the file LICENSE.
