ledger/store/README.md

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](./LICENSE.md)

The `snarkvm-ledger-store` crate provides the data store for the ledger.

There are currently two implementations: a persistent one based on RocksDB, and an in-memory one. The
in-memory one is the default, while enabling the `"rocks"` feature switches to RocksDB.

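For illustration, a downstream crate might select the backend at compile time; the module paths and
type names below are assumptions about the crate layout rather than a verified API:

```rust
// A hedged sketch of compile-time backend selection; `ConsensusDB` and
// `ConsensusMemory` are assumed names for the two implementations.
#[cfg(feature = "rocks")]
use snarkvm_ledger_store::helpers::rocksdb::ConsensusDB as Backend;
#[cfg(not(feature = "rocks"))]
use snarkvm_ledger_store::helpers::memory::ConsensusMemory as Backend;
```
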
### General assumptions

The following is a list of assumptions about the way the Aleo storage is typically used, which
influenced the database choice and configuration, some of the APIs, and the overall design decisions:
- The storage needs to be usable both in a persistent and an ephemeral (in-memory) way, the latter
  of which may not assume the existence of a filesystem (excluding a “persistent” storage residing
  in `/tmp`)
- The high-level API needs to be consistent across the storage implementations
- Many concurrent reads are expected at any point in time, with only a few composite writes related
  to block insertions

### Storage properties

Because RocksDB is the primary implementation of persistent storage used by snarkOS, some of its
design specifics have impacted the overall design of the storage APIs and objects. These include:
- The data is stored as key-value pairs
- The entries are ordered lexicographically (automatic in RocksDB); this is sometimes taken
  advantage of when iterating over multiple records, as illustrated in the sketch after this list
- Operations may be performed individually, or as part of atomic batches
- In order for multiple entries to be inserted atomically, the high-level operations are organized
  into batches

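As a minimal illustration of the lexicographic ordering, a `BTreeMap` (which, like RocksDB, keeps
its keys sorted) lets all entries sharing a binary prefix be read as one contiguous range scan:

```rust
use std::collections::BTreeMap;

// Entries sharing a prefix are contiguous in a lexicographically ordered
// store, so iterating over one map's records is a single range scan.
// `BTreeMap` stands in for RocksDB here.
fn entries_with_prefix<'a>(
    db: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &'a [u8],
) -> impl Iterator<Item = (&'a Vec<u8>, &'a Vec<u8>)> {
    db.range(prefix.to_vec()..)
        .take_while(move |(key, _)| key.starts_with(prefix))
}
```
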
### Main features shared between implementations

The primary means of accessing the storage are the `Map` and `NestedMap` traits, plus their
`(Nested)MapRead` counterparts, which provide read-only functionality.

The basic concept behind a `Map` is that it holds key-value pairs, like a hash map. The
RocksDB-backed object is the `(Nested)DataMap`, and the in-memory one is the `(Nested)MemoryMap`.

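The snippet below is a simplified sketch of this split (not the crate’s exact signatures):
`MapRead` covers lookups, while `Map` adds mutation and the batching hooks described further below:

```rust
// A simplified sketch of the `MapRead`/`Map` split; the real traits are
// generic over serializable keys/values and their methods return `Result`s.
trait MapRead<K, V> {
    fn get(&self, key: &K) -> Option<V>;
    fn contains_key(&self, key: &K) -> bool;
}

trait Map<K, V>: MapRead<K, V> {
    fn insert(&self, key: K, value: V);
    fn remove(&self, key: &K);
    // Batching hooks, mirroring the atomic batch lifecycle described below.
    fn start_atomic(&self);
    fn finish_atomic(&self);
}
```
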
The nested maps work like a double map: keys and values are inserted into the storage not just
individually, but also within the context of some grouping key `M`; removing it (via
`NestedMap::remove_map`) removes all the grouped entries.

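A corresponding sketch of the nested variant (again simplified): every entry lives under a grouping
key `M`, and `remove_map` drops the entire group at once:

```rust
// A simplified sketch of the nested variant: entries are addressed by the
// pair (M, K), and removing the grouping key M removes all of its entries.
trait NestedMap<M, K, V> {
    fn insert(&self, map: M, key: K, value: V);
    fn get(&self, map: &M, key: &K) -> Option<V>;
    fn remove_key(&self, map: &M, key: &K);
    fn remove_map(&self, map: &M); // drops every entry grouped under `map`
}
```
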
Each `*Map` object contains the following members, which work essentially the same way in both
implementations (see the sketch after this list):
- `batch_in_progress` - an indicator of whether the map is currently involved in an atomic batch
  operation
- `atomic_batch` - the contents of the current atomic batch operation (useful in case any of its
  entries need to be looked up during that operation); a `None` value indicates a deletion, while a
  `Some` indicates an insertion
- `checkpoints` - a list of indices demarcating potentially meaningful subsets of the atomic batch
  operation, allowing a partial pending operation to be rolled back

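An illustrative sketch of that per-map state (field types simplified; the real maps wrap these
members in synchronization primitives for concurrent access):

```rust
// Illustrative only: the actual key, value, and container types differ.
struct BatchState<K, V> {
    // Whether the map is currently involved in an atomic batch operation.
    batch_in_progress: bool,
    // Pending writes: `None` marks a deletion, `Some(value)` an insertion.
    atomic_batch: Vec<(K, Option<V>)>,
    // Indices into `atomic_batch` marking potential rollback points.
    checkpoints: Vec<usize>,
}
```
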
The storage is divided into several logical units (e.g. `BlockStorage`), which may contain several
`*Map` members.

### Main differences between implementations

All RocksDB-backed objects (`(Nested)DataMap`s) share a single underlying instance of RocksDB
containing all the data, while the data in the in-memory storage is chunked across all the
`(Nested)MemoryMap`s, each containing only its relevant entries.

The persistent storage, which needs stricter atomicity guarantees than the in-memory one, has a
feature that allows the atomic writes to be paused (`pause_atomic_writes`). When called, it causes
the storage write operations to no longer automatically result in physical writes; instead, any
further writes are accumulated, extending any ongoing write batch. This ends upon a call to
`unpause_atomic_writes`, which executes all the accumulated writes as a single atomic operation.

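A self-contained toy illustration of that accumulation follows; the real methods live on the RocksDB
object and are fallible, while the names below follow the text above:

```rust
// Toy model only: physical writes are represented by a plain `Vec`.
#[derive(Default)]
struct ToyDatabase {
    atomic_writes_paused: bool,
    atomic_batch: Vec<(u64, Option<String>)>, // accumulated pending writes
    executed: Vec<(u64, Option<String>)>,     // stands in for physical writes
}

impl ToyDatabase {
    fn pause_atomic_writes(&mut self) {
        self.atomic_writes_paused = true;
    }

    // Called at the end of each logical atomic operation; while paused, the
    // writes keep accumulating instead of being executed.
    fn finish_batch(&mut self, batch: Vec<(u64, Option<String>)>) {
        self.atomic_batch.extend(batch);
        if !self.atomic_writes_paused {
            self.executed.append(&mut self.atomic_batch);
        }
    }

    // Executes everything accumulated so far as a single atomic operation.
    fn unpause_atomic_writes(&mut self) {
        self.executed.append(&mut self.atomic_batch);
        self.atomic_writes_paused = false;
    }
}
```
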
Every `(Nested)DataMap` is associated with a `DataID` enum, which constitutes a part of a binary
prefix that gets prepended to the keys when they are written to the database. This allows the same
key to be used for different storage entries without resulting in duplicates (e.g. having a single
block hash correspond to both a height and a list of transaction IDs).

The `Network` identifier (`Network::ID`) paired with the `DataID` comprises the context member of
each `(Nested)DataMap`, which is also the aforementioned binary prefix of the RocksDB keys.

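An illustrative sketch of such key composition; the byte widths and endianness here are assumptions,
and only the prefix-then-key layout follows from the description above:

```rust
// Hypothetical layout: [network ID][data ID][serialized map key].
fn raw_key(network_id: u16, data_id: u16, serialized_key: &[u8]) -> Vec<u8> {
    let mut raw = Vec::with_capacity(4 + serialized_key.len());
    raw.extend_from_slice(&network_id.to_le_bytes());
    raw.extend_from_slice(&data_id.to_le_bytes());
    raw.extend_from_slice(serialized_key);
    raw
}
```
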
The `StorageMode` is of little interest to the in-memory storage, as its primary use is to decide
where to store storage-related files (or where to load them from).

There is a `static DATABASES` object used with RocksDB, but it is only meaningful in tests involving
persistent storage - it ensures that all the instances are completely unrelated during concurrent
tests.

### Macros

`atomic_batch_scope`: this macro serves as a wrapper for multiple low-level storage operations,
causing them to be executed as a single atomic batch. Note that each consecutive time it is called,
a new atomic checkpoint is created, but `start_atomic` is called only once. It is restricted to a
single logical unit of storage (e.g. `BlockStorage`), which separates it from the later workaround
that is `(un)pause_atomic_writes` (which was introduced specifically so that a single atomic
operation could be performed on both `BlockStorage` and `FinalizeStorage`).

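A hedged usage sketch (the receiver and the inner map accessors are hypothetical): every write
queued inside the scope is committed as one atomic batch, while a nested invocation only adds a
checkpoint:

```rust
// Hypothetical illustration; not one of the crate's verbatim call sites.
atomic_batch_scope!(self, {
    // Both writes are queued and then committed together.
    self.id_map.insert(block_hash, block_height)?;
    self.reverse_id_map.insert(block_height, block_hash)?;
    Ok(())
})?;
```
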
`atomic_finalize`: this was added in order to facilitate different modes of finalization,
specifically to not perform any actual writes in the `DryRun` mode. Other than that, it behaves
like `atomic_batch_scope`, with the exception that it may not be nested.

### Basic atomic batch happy path for RocksDB

A self-contained toy model condensing all three phases can be found after the Phase 3 list.

**Phase 1 (setup)**:
1. `start_atomic` is called, typically from `atomic_batch_scope` or a top-level operation on one
of the logical storage units (`*Storage` trait implementors).
2. `batch_in_progress` is set to `true` in all the maps involved.
3. Each map triggering `start_atomic` causes the `atomic_depth` counter to be incremented; it is
the most foolproof way to check whether there is an active atomic batch in any logical storage
unit, which is why `pause_atomic_writes` uses it internally.
4. The contents of the per-map `atomic_batch` are checked - they must be empty at this stage
(logic check).
5. The contents of the database-wide `atomic_batch` are checked - they too must be empty, unless
we’ve paused atomic writes (which causes the per-map collections to be moved to the per-database
one, but ultimately does not empty the latter).

**Phase 2 (batching)**:
1. A number of read and write operations are performed in the associated maps. Any writes are
collected in the per-map `atomic_batch` collections.
2. If any nested operations are performed via the `atomic_batch_scope` macro, `atomic_checkpoint`
will be called instead of `start_atomic`, demarcating the end of a meaningful subset of the entire
batch operation.
3. After each nested operation is executed (queued for writing) successfully,
`clear_latest_checkpoint` is called, as there is no longer a potential need to roll it back.

**Phase 3 (execution)**:
1. `finish_atomic` is called either directly, or once `atomic_batch_scope` detects that it’s the
end of its scope (i.e. all the lower-level operations have already called `finish_atomic`
internally, leaving only the final `atomic_depth` value of `1` coming from the macro).
2. All the pending per-map operations are serialized and moved to the database-wide (under the
`RocksDB` object) `atomic_batch` collection.
3. The (per-map) `checkpoints` are cleared, as they are no longer useful.
4. The (per-map) `batch_in_progress` flag is set to `false`.
5. The (database-wide) `atomic_depth` decreases by `1` for each map involved in the batch
operation.
6. The previous `atomic_depth` is checked - it may not be `0`, as that would indicate a
`finish_atomic` call that was not paired with a preceding `start_atomic`.
7. If `pause_atomic_writes` is not in force, and it’s the final (outermost) call to
`finish_atomic`, the database-wide `atomic_batch` is drained of its entries, which are executed by
RocksDB atomically.

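The toy model below condenses the three phases into one runnable sketch; the names mirror the
descriptions above, while the types, error handling, and synchronization are drastically simplified:

```rust
use std::collections::BTreeMap;

// Toy model only: the real per-map state is generic and concurrent, and the
// database-wide batch lives on the `RocksDB` object.
#[derive(Default)]
struct ToyStore {
    batch_in_progress: bool,                     // per-map flag
    atomic_batch: Vec<(u64, Option<String>)>,    // per-map pending writes
    checkpoints: Vec<usize>,                     // rollback points
    atomic_depth: usize,                         // database-wide nesting counter
    db_atomic_batch: Vec<(u64, Option<String>)>, // database-wide batch
    database: BTreeMap<u64, String>,             // stands in for RocksDB itself
}

impl ToyStore {
    // Phase 1: setup.
    fn start_atomic(&mut self) {
        self.batch_in_progress = true;
        self.atomic_depth += 1;
        assert!(self.atomic_batch.is_empty(), "logic check: the batch must start empty");
    }

    // Phase 2: batching; `None` would queue a deletion, `Some` an insertion.
    fn insert(&mut self, key: u64, value: String) {
        self.atomic_batch.push((key, Some(value)));
    }

    fn atomic_checkpoint(&mut self) {
        self.checkpoints.push(self.atomic_batch.len());
    }

    fn clear_latest_checkpoint(&mut self) {
        self.checkpoints.pop();
    }

    // Rolls back any writes queued after the latest checkpoint.
    fn atomic_rewind(&mut self) {
        if let Some(index) = self.checkpoints.pop() {
            self.atomic_batch.truncate(index);
        }
    }

    // Phase 3: execution.
    fn finish_atomic(&mut self) {
        // Move the pending per-map writes to the database-wide batch.
        let pending: Vec<_> = self.atomic_batch.drain(..).collect();
        self.db_atomic_batch.extend(pending);
        self.checkpoints.clear();
        self.batch_in_progress = false;
        let previous = self.atomic_depth;
        assert!(previous != 0, "a finish_atomic without a paired start_atomic");
        self.atomic_depth -= 1;
        // Only the final (outermost) call executes the accumulated writes.
        if previous == 1 {
            for (key, value) in self.db_atomic_batch.drain(..) {
                match value {
                    Some(value) => { self.database.insert(key, value); }
                    None => { self.database.remove(&key); }
                }
            }
        }
    }
}
```
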
### The sequential processing thread

While the atomicity of storage write operations is enforced by the storage itself, some of the
operations that involve them (currently `VM::{atomic_speculate, add_next_block}`) must not be
invoked concurrently. In order to ensure this, the `VM` object spawns a persistent background
thread (introduced in [#2975](https://github.com/ProvableHQ/snarkVM/pull/2975)) dedicated to
collecting them (via an `mpsc` channel) and processing them sequentially.

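A minimal sketch of this pattern: requests are sent over an `mpsc` channel to a single background
thread that processes them strictly one at a time, so the guarded operations can never interleave
(the real thread handles VM-specific requests rather than boxed closures):

```rust
use std::sync::mpsc;
use std::thread;

// Spawns a background thread that executes the received operations
// sequentially, in the order they were sent.
fn spawn_sequential_processor() -> mpsc::Sender<Box<dyn FnOnce() + Send>> {
    let (sender, receiver) = mpsc::channel::<Box<dyn FnOnce() + Send>>();
    thread::spawn(move || {
        // Runs until all senders are dropped and the channel closes.
        for operation in receiver {
            operation();
        }
    });
    sender
}
```
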
### Storage modes

One of the fundamental parameters associated with the storage is the `StorageMode`, defined in
[`aleo-std`](https://github.com/ProvableHQ/aleo-std). It is primarily used to determine where the
persistent storage is located on disk (via `aleo_ledger_dir`).

The `StorageMode::Test`, dedicated to testing, should be created via `StorageMode::new_test`,
unless only a single `DataMap` is used, in which case it can also be constructed manually.
