Skip to content

[WIP] Mustardwatching epoch- and pointer-based reclamation #221

Open
@jeehoonkang

Description

@jeehoonkang

Here is my note on my experiment on mixing epoch-based reclamation (EBR) and pointer-based reclamation (e.g. hazard pointers, HP). My code is here: https://github.com/jeehoonkang/crossbeam/tree/snowflake/crossbeam-epoch Currently it's neither tested nor documented, unfortunately... Contributions or any form---code, documentation, comments, feedbacks---are very welcome!

Motivation

For safe memory reclamation (SMR) in concurrent data structures, a thread advertises (for experts: synchronizes reads-after-writes) that it is accessing some objects ("hazards") so that the other thread should not deallocate them. In the design of SMR schemes, the granularity of hazards is one of the most important design choices the creator should make. Roughly speaking, there are two representative choices: epoch-based reclamation (EBR) and pointer-based reclamation (e.g. hazard pointers, HP).

EBR is coarse-grained in that a thread advertises the epoch (think: timestamp) in which it is accessing the shared memory. The idea is that the garbage that is thrown in old epochs are no longer accessible from any thread and is safe to deallocate. An epoch can be incremented only if all the threads agreed to release all the pointers to the shared memory acquired in the previous epoch. EBR is usually fast because a thread needs to advertise its epoch only. On the other hand, it may not collect garbages in a timely fashion because of the coarse granularity: a thread may hold an epoch and disrupt garbage collection indefinitely. Specifically, EBR doesn't work well if (1) there exist long-lived pointers to the shared memory (e.g. map/set or cache); or (2) there are a lot of threads so that each thread cannot easily make progress. Previously long-lived pointers are handled with reference counting, which is sub-optimal because it writes to the memory even in the read path. On the other hand, the second case usually happens when the number of threads exceeds that of CPU cores, so its problem can be mitigated by using thread pools.

On the other hand, HP is fine-grained in that a thread advertises the pointers to the hazardous objects (so the name "hazard pointers"), and it can collect garbages quite aggressively thanks to the fine granularity. However, the problem is that it's often slow because a thread needs to advertise its hazard often and large.

Now time for mustardwatching! We want to take the advtange of mixing both approaches: using EBR when we can properly increment epochs, and using HP otherwise. I implemented a hybrid of EBR and HP on top of crossbeam-epoch. By doing so, we can efficiently support long-lived pointers while retaining the benefits of EBR, by simply turning the pointers into hazard pointers. The corresponding API is Guard::defend().

Performance

It's performance is comparable with the master branch in the absence of hazard pointers. Here's a comparison of the results of cargo +nightly bench:

name                     control ns/iter  variable ns/iter  diff ns/iter   diff %  speedup
 multi_alloc_defer_free   2,060,746        2,177,730              116,984    5.68%   x 0.95                                                                                                                 
 multi_defer              1,329,516        1,451,704              122,188    9.19%   x 0.92                                                                                                                 
 multi_flush              12,558,094       28,366,437          15,808,343  125.88%   x 0.44                                                                                                                 
 multi_pin                4,283,460        4,174,085             -109,375   -2.55%   x 1.03                                                                                                                 
 single_alloc_defer_free  34               36                           2    5.88%   x 0.94                                                                                                                 
 single_defer             17               24                           7   41.18%   x 0.71
 single_flush             110              608                        498  452.73%   x 0.18
 single_pin               7                8                            1   14.29%   x 0.88

I fully expected that flush() becomes slower: now we're checking hazard pointers in addition to epochs, which this benchmark . The performance of pin() is similar. The performance of defer() drops a little bit, but from maual inspection of generated assemblies I think it's unavoidable.

Related Work: Snowflake

I took a lot of inspiration from Microsoft Research's Snowflake (so the branch name), but my implementation differs from Snowflake in that:

  • My implementation properly supports EBR and protects short-lived pointers much more efficiently, while Snowflake uses only the idea of EBR to optimize HP and it's not exposed to the users.
  • My implementation doesn't support ejection of ill-behaved threads from the protocol, which guarantees robust garbage collection (i.e. a ill-behaved thread cannot block the deallocation of arbitrarily many resources).

Roughly speaking, my implementation is EBR + HP, while Snowflake is HP (boosted with EBR idea) + ejection mechanism. I've tried to design EBR + HP + ejection mechanism, but I believe EBR and ejection doesn't come along well.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions