TSM is a way to provide safe memory access for lockfree data structures.
Every heap address H “has a type” because a type ID can be computed from
H. linalloc() allocates memory of a given type. linref_up(H, T) ensures
that H has type T and isn’t reallocated with a different type.
The point is that, unlike schemes which block reuse of referenced memory,
TSM allows H to be freed and reused by an allocation of type T.
For example, consider a doubly-linked list used in a lockfree kernel:
- Suppose
lflist_del(A)has readP := A->prevand decided to writeP->nextwith a CAS. IfPis freed and reallocated as a string, the CAS can spuriously succeed if*Pis an unlucky string which resembles the expected list node.linref_up(P, "list node")prevents this.
linref_up()’s guarantee is weak, but the usual stronger guarantee turns
out to be irrelevant in a robust program.
lflist_enq()can’t make allocations because its callers can’t really afford to handle allocation failures. So the list operates on “anchors” preallocated inside enqueuable objects. Successive enqueues of an object must reuse its anchor.- Lockfree progress would be impractical, at best, if
lflist_enq(A)weren't free to modifyAbecause some out of date lflist function were operating onA. - So every lflist function must be prepared to deal with any anchor
being deleted and re-enqueued just before a
CAS. Generally, they must at least make everyCAScontingent on a version count that’s been live for some critical interval. linref_up()proves that such a count exists, and it wouldn’t make the lflist simpler or more efficient if you knew thatAalso wasn’t freed for this interval.
As I describe, TSM has a simple and cheap refcounting implementation. However, it should be compatible with other schemes for keeping track of references.
Nalloc is a TSM-enabled slab allocator. It allocates “lineages” from “heritages” in O(1) uncontended.
TSM is a minor addition to a slab allocator. Nalloc is just a basic (lockfree) slab allocator plus a refcount and type object pointer per slab.
A lineage is a block of memory. Lineage L has a type in the sense that a
pointer to a type object Y can be computed from L’s address.
-
Ystores the size of lineages of its type; and a lineage initialization functionlin_init(), to be run during allocation. -
Roughly speaking, if
linref_up(L, Y)succeeds, the following holds untillinref_down(L, Y):- L was last allocated with
linalloc("Y"), and thusY->lin_init(L)completed. linref_up(L, Y')will succeed iffY' == Y- in any thread.- If
Lis freed and reallocated:- It must be reallocated with type
Y. - nalloc won’t write to any part of
Lother than its lowest word. This means thatY->lin_init(L)won’t run again.
- It must be reallocated with type
- L was last allocated with
-
In other words, all threads will agree that
Lhas typeY, and has been initialized according toY. If threads cooperate to make sureLsatisfies some invariant ofY, nalloc won’t ruin it by clobberingL. -
See nalloc.h for a more precise description.
-
It’s called a lineage because it represents multiple generations of a C memory object.
A slab is an array of lineages and some metadata in a footer. All lineages in a given slab have the same type and thus size. All slabs have the same size and are naturally aligned, so that the slab of a lineage can be computed directly from its address.
- Each slab stores a refcount and a pointer to struct type, and
L’s type is the type of its containing slab.linref_up(L,_)is just aCAS2onslab_of(L)’s type fields.
A heritage is a pool of slabs having the same type.
- NB: they can be thread/CPU-local but wk doesn’t exploit this.
nalloc defines some internal heritages with “types” meant to represent
plain char buffers of different size classes. malloc() is just a wrapper
around linalloc() with one these heritages.
Lockfree slab allocation is relatively simple. nalloc elaborates on the basic idea of using lfstacks to represent free lineages in a slab and free slabs in a heritage.
linalloc(H) pops a slab S from heritage H, fetches a lineage from S, and
returns S to H iff S has lineages remaining. If not, S is said to be
“lost”.
- NB: empty slabs aren’t returned so that search is
O(1). The main challenge of nalloc is to make this work without leaking memory.
S splits its free lineages into 3 sets. This is kind of a relic of old
designs, but it still has benefit.
- A “main” non-lockfree stack
S->local_blocks. - An lfstack
S->hot_blocks, on a separate cache line.- NB: it accumulates freed lineages and is collected in batches. It lets you avoid locked instructions and shared cache line writes.
- The maximal contiguous set of free lineages beginning at the start
of the slab is represented only as a count
S->contig_blocks.- NB: it makes slab initializion cheaper. It encourages cache- and prefetcher-friendly access patterns, not just inside nalloc.
Slabs are initialized with all lineages in contig_blocks.
alloc_from_slab(S) in linalloc() attempts to allocate from
contig_blocks, local_blocks, and hot_blocks, in that order.
- If it empties local_blocks, it uses a variant of
lstack_clear()to transfer all lineages fromhot_blocksontolocal_blocksinO(1).
linfree(L) pushes L to slab_of(L)->hot_blocks, freeing the slab when
appropriate.
The main subtlety is that linfree(L) needs to know whether S := slab_of(L)
has been lost, in which case it must return S to its former heritage H or
arrange for it to be freed.
- Multiple
linfree(L)calls can findSlost, but only one should return it because my lockfree stack deliberately doesn’t support concurrent pushes of the same “node”. More subtly, it’s not simple forlinfree()to agree with an in-flightlinalloc()on whetherSis actually lost. - The key observation is that
S->hot_blocksis always cleared entirely with anlfstack_clear(). Unlike a stack subjected to the well-known lockfreepop()algorithm,S->hot_blocksdoesn’t need to store a version count. - So
S->hot_blocks.geninstead stores an authoritative “lost” flag and the stack’s size. The flag is set whenlinalloc()findsS->hot_blocksempty after also exhausting the other sets. It’s unset by the nextlinfree()toS. - If this
linfree()decides to freeS, it suffices to not return it toH. No future allocations fromSwill happen, and the finallinfree()toSwill know to free it.
The biggest problem with nalloc is probably reporting failure in
linref_up(P) when P isn’t on the nalloc heap. If it fails to detect this
case, then it may spuriously write to P’s non-existant slab metadata.
- Luckily, wk can easily validate
Pbecause its heap is contiguous and non-shrinking. - The userspace heap isn’t, since it uses non-fixed mappings and should return memory to the system. Further, the user may have made mappings outside of nalloc’s control.
- Unfortunately, the current solution for userspace nalloc is to just not return virtual memory to the kernel. It can remap free slabs as “guard” pages to return physical memory, but the current implementation doesn’t.
- This way, if
Pwas previously on the heap, thenlinref_up(P)can expect a segfault becausePcouldn’t have been remapped outside of nalloc - barring unsafe fixed-mapping tricks outside of nalloc. - If
Pwas never on the heap, thenlinref_up(P, Y)can still spuriously succeed ifPis on a non-nalloc-mapped page that resembles a slab of typeY. I don’t expect a program to generate such aPunless it takes pointers from untrusted sources. In that case, it’s not secure.