As an experiment I spent some time yesterday replacing RocksDB with LMDB (https://www.symas.com/mdb) in Glean's backend. The thinking is that LMDB is vastly simpler than RocksDB - just one .c file that we could bundle with Glean. And since it works by mmapping the DB into memory it might be possible to hack the storage scheme to avoid duplicating keys.
Anyway, I got it working as far as passing a bunch of tests and was able to test a large indexing run. The experiment is on my branch here: simonmar@a392ce3
I'm creating this issue to record the findings in case we want to pick it up later.
Findings so far:
- I tried indexing Stackage with the Haskell indexer, and writing seems to be about 3x slower than RocksDB going by the throughput metrics produced by the server during indexing. That's worse than I hoped. I didn't try any tuning but 3x will be hard to overcome, so for now I'll leave the experiment here. (EDIT: disregard this, I was compiling without optimisation)
- Key sizes are limited. The default is 511 bytes. I managed to bump the max key size to 32KB by forcing a couple of settings in LMDB (which possibly had an effect on performance), but even 32KB wasn't enough to finish the indexing run of Stackage without a key size error. To make this viable we would have to add support for larger keys, probably by using LMDB's duplicate key support: basically store only the first part of the key as the LMDB key, move the overflow part into the value, and linear-search the entries with identical stored keys during lookup. It probably wouldn't be too bad - I bet we almost never have multiple keys where the first 32KB are identical. Even using the LMDB default of 2KB might be enough.
- There's no way to control memory usage: all the DBs are mmapped into the address space and you have to rely on the OS's paging algorithms to do a good job. It's not clear at all whether this would work well at scale.
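To make the overflow-key idea from the key size finding concrete, here is a rough sketch in Python. It is not LMDB code and not what the branch does: `OverflowKeyStore`, `max_key`, and the length-prefixed value encoding are all made up for illustration, and a plain dict of lists stands in for a table that allows duplicate keys. It only shows the shape of the scheme: truncate the key to the limit, push the rest into the value, and linear-search the duplicates on lookup.

```python
import struct


def encode(overflow: bytes, value: bytes) -> bytes:
    # Hypothetical encoding: 4-byte big-endian overflow length, overflow, value.
    return struct.pack(">I", len(overflow)) + overflow + value


def decode(stored: bytes):
    # Inverse of encode: returns (overflow, value).
    (n,) = struct.unpack_from(">I", stored)
    return stored[4:4 + n], stored[4 + n:]


class OverflowKeyStore:
    """Toy in-memory stand-in for a store with a key size limit
    and support for duplicate keys."""

    def __init__(self, max_key: int):
        self.max_key = max_key
        self._table = {}  # stored (truncated) key -> list of encoded values

    def _split(self, key: bytes):
        # The first max_key bytes become the stored key, the rest overflows.
        return key[:self.max_key], key[self.max_key:]

    def put(self, key: bytes, value: bytes) -> None:
        prefix, overflow = self._split(key)
        entries = self._table.setdefault(prefix, [])
        # Linear search among duplicates for an entry with the same overflow.
        for i, stored in enumerate(entries):
            existing_overflow, _ = decode(stored)
            if existing_overflow == overflow:
                entries[i] = encode(overflow, value)  # overwrite in place
                return
        entries.append(encode(overflow, value))

    def get(self, key: bytes):
        prefix, overflow = self._split(key)
        # Linear search among entries that share the same stored key.
        for stored in self._table.get(prefix, []):
            existing_overflow, value = decode(stored)
            if existing_overflow == overflow:
                return value
        return None


# Usage: a tiny limit of 4 bytes makes two keys collide on their stored prefix.
store = OverflowKeyStore(max_key=4)
store.put(b"abcdXYZ", b"fact 1")
store.put(b"abcdQRS", b"fact 2")
found = store.get(b"abcdQRS")
```

The lookup cost is linear in the number of full keys sharing the same truncated prefix, so with a realistic limit the duplicate lists should almost always have length one. The sketch ignores key ordering and prefix scans across the truncation boundary, which a real implementation would have to deal with.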