[License](./LICENSE.md)

The `snarkvm-ledger-store` crate provides the data store for the ledger.

There are currently two implementations: a persistent one based on RocksDB, and an in-memory
one. The in-memory one is the default, while the `"rocks"` feature utilizes RocksDB instead.

### General assumptions

The following is a list of assumptions about the way the Aleo storage is typically used, which
influenced the database choice and configuration, some of the APIs, and the overall design
decisions:
- The storage needs to be usable both in a persistent and an ephemeral (in-memory) way, the
latter of which may not assume the existence of a filesystem (excluding a "persistent" storage
residing in `/tmp`)
- The high-level API needs to be consistent across the storage implementations
- Many concurrent reads are expected at any point in time, with only a few composite writes
related to block insertions

### Storage properties

Since RocksDB is the primary implementation of persistent storage used by snarkOS, some of its
design specifics have impacted the overall design of the storage APIs and objects. These
include:
- The data is stored as key-value pairs
- The entries are ordered lexicographically (automatic in RocksDB); this is sometimes taken
advantage of when iterating over multiple records
- Operations may be performed individually, or as part of atomic batches
- In order for multiple entries to be inserted atomically, the high-level operations are
organized into batches

### Main features shared between implementations

The primary means of accessing the storage are the `Map` and `NestedMap` traits, plus their
`(Nested)MapRead` counterparts, which provide read-only functionality.

The basic concept behind the `Map` is that it operates on key-value pairs, like a hash map.
The RocksDB-backed object is the `(Nested)DataMap`, and the in-memory one is the
`(Nested)MemoryMap`.

The nested maps work like a double map - keys and values are inserted into storage not just
individually, but also within the context of some grouping key `M`; removing that grouping key
(via `NestedMap::remove_map`) removes all the grouped entries.

Each `*Map` object contains the following members, which work basically the same way:
- `batch_in_progress` - an indicator of whether the map is currently involved in an atomic batch
operation
- `atomic_batch` - the contents of the current atomic batch operation (useful in case any of its
entries need to be looked up during that operation); a `None` value indicates a deletion, while
a `Some` indicates an insertion
- `checkpoints` - a list of indices demarcating potentially meaningful subsets of the atomic
batch operation, allowing a partial pending operation to be rolled back

The storage is divided into several logical units (e.g. `BlockStorage`), which may contain
several `*Map` members.

### Main differences between implementations

All RocksDB-backed objects (the `(Nested)DataMap`s) share a single underlying instance of
RocksDB containing all the data, while the data in the in-memory storage is chunked across all
the `(Nested)MemoryMap`s, each containing only its relevant entries.

The persistent storage, which needs stricter atomicity guarantees than the in-memory one, has a
feature which allows the atomic writes to be paused (`pause_atomic_writes`). When called, it
causes the storage write operations to no longer automatically result in physical writes;
instead, any further writes are accumulated, extending any ongoing write batch. This ends upon a
call to `unpause_atomic_writes`, which executes all the accumulated writes as a single atomic
operation.

Every `(Nested)DataMap` is associated with a `DataID` enum, which constitutes a part of a binary
prefix that gets prepended to the keys when they are written to the database. This allows the
same key to be used for different storage entries without resulting in duplicates (e.g. a single
block hash can correspond to both a height and a list of transaction IDs).

The `Network` identifier (`Network::ID`) paired with the `DataID` comprises the context member
of each `(Nested)DataMap`, which is also the aforementioned binary prefix of the RocksDB keys.

The `StorageMode` is of little interest to the in-memory storage, as its primary use is to
decide where to store storage-related files (or where to load them from).

There is a `static DATABASES` which is used with RocksDB, but it is only meaningful in tests
involving persistent storage - it ensures that all instances are completely unrelated during
concurrent tests.

### Macros

`atomic_batch_scope`: this macro serves as a wrapper for multiple low-level storage operations,
causing them to be executed as a single atomic batch. Note that each consecutive (nested) time
it is called, a new atomic checkpoint is created, but `start_atomic` is called only once. It is
restricted to a single logical unit of storage (e.g. `BlockStorage`), which separates it from
the later workaround that is `(un)pause_atomic_writes` (which was introduced specifically so
that a single atomic operation could be performed on both `BlockStorage` and `FinalizeStorage`).

`atomic_finalize`: this was added in order to facilitate different modes of finalization,
specifically to not perform any actual writes in the `DryRun` mode. Other than that, it behaves
like `atomic_batch_scope`, except that it may not be nested.

### Basic atomic batch happy path for RocksDB

**Phase 1 (setup)**:
1. `start_atomic` is called, typically from `atomic_batch_scope` or a top-level operation on one
of the logical storage units (`*Storage` trait implementors).
2. `batch_in_progress` is set to `true` in all the maps involved.
3. Each map triggering `start_atomic` causes the `atomic_depth` counter to be incremented; it is
the most foolproof way to check whether there is any active atomic batch started in any logical
storage unit, which is why `pause_atomic_writes` uses it internally.
4. The contents of the per-map `atomic_batch` are checked - they must be empty at this stage
(logic check).
5. The contents of the database-wide `atomic_batch` are checked - they too must be empty, unless
atomic writes have been paused (which causes the per-map collections to be moved to the
per-database one, but ultimately does not clear the latter).

**Phase 2 (batching)**:
1. A number of read and write operations are performed in the associated maps. Any writes are
collected in the per-map `atomic_batch` collections.
2. If any nested operations are performed via the `atomic_batch_scope` macro,
`atomic_checkpoint` will be called instead of `start_atomic`, demarcating the end of a
meaningful subset of the entire batch operation.
3. After each nested operation is executed (queued for writing) successfully,
`clear_latest_checkpoint` is called, as there is no longer a potential need to roll it back.

**Phase 3 (execution)**:
1. `finish_atomic` is called either directly, or once `atomic_batch_scope` detects that it's the
end of its scope (i.e. all the lower-level operations have already called `finish_atomic`
internally, leaving only the final `atomic_depth` value of `1` coming from the macro).
2. All the pending per-map operations are serialized and moved to the database-wide (under the
`RocksDB` object) `atomic_batch` collection.
3. The (per-map) `checkpoints` are cleared, as they are no longer useful.
4. The (per-map) `batch_in_progress` flag is set to `false`.
5. The (database-wide) `atomic_depth` decreases by `1` for each map involved in the batch
operation.
6. The previous `atomic_depth` is checked - it may not be `0`, as that would indicate a
`finish_atomic` call that was not paired with a preceding `start_atomic`.
7. If `pause_atomic_writes` is not in force, and it's the final (outermost) call to
`finish_atomic`, the database-wide `atomic_batch` is drained of its entries, which are then
executed by RocksDB atomically.

### The sequential processing thread

While the atomicity of storage write operations is enforced by the storage itself, some of the
operations that involve them (currently `VM::{atomic_speculate, add_next_block}`) may not be
invoked concurrently. In order to ensure this, the `VM` object spawns a persistent background
thread (introduced in [#2975](https://github.com/ProvableHQ/snarkVM/pull/2975)) dedicated to
collecting these operations (via an `mpsc` channel) and processing them sequentially.

### Storage modes

One of the fundamental parameters associated with the storage is the `StorageMode`, defined in
[`aleo-std`](https://github.com/ProvableHQ/aleo-std). It is primarily used to determine where
the persistent storage is stored on disk (via `aleo_ledger_dir`).

The `StorageMode::Test`, dedicated to testing, should be created via `StorageMode::new_test`,
unless only a single `DataMap` is used, in which case it can also be constructed manually.