Skip to content

mssmt: add support for batch insertion and retrieval operations #451

Open
@Roasbeef

Description

@Roasbeef

Background

Once an SMT starts to fill up a bit, you start to have a lot of keys on disk due to all the internal branches. One optimization we did in the past was to "compress" the set of empty branches from a non-zero branch to leaf by omitting writing them to disk all together. This helps in terms of the storage space and also query speed, but doesn't allow us to insert a set of leaves more quickly.

Each time we insert a new item, we need to "bubble up" the tree and update all the branch digests from that leaf to the root of the tree. In many cases, when we're inserting a batch of items, we'll end up re-computing the same internal node hash several times, and also redundantly writing the same internal nodes to disk. An ideal algorithm would take all the items to be inserted, then walk up the tree in series, updating each internal node as needed, finally only writing the root tree one time.

We should explore the addition of a set of batched APIs to the current interface to allow us to save on on-disk churn, and also speed up batch insertion operations.

Some relevant reading material follows:

Metadata

Metadata

Assignees

Type

No type

Projects

Status

🔖 Ready

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions