This repository was archived by the owner on Jul 1, 2021. It is now read-only.

Adopt the Turbo-Geth database layout #779

Closed
@lithp

Description

What is wrong?

In order for Trinity to benefit from the improved sync speed that Firehose makes possible, it needs to change the way it lays out data on disk. Specifically, account and storage data is currently stored as `sha(object) -> object`, which means that syncing requires a lot of random reads and writes.
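For concreteness, here is a minimal sketch of the difference between the two key layouts. It uses `hashlib.sha3_256` as a stand-in for keccak and a plain dict as a stand-in for LevelDB; the key prefixes and function names are illustrative, not Trinity's or Turbo-Geth's actual schema.

```python
import hashlib


def keccak_placeholder(data: bytes) -> bytes:
    # Stand-in for keccak-256; only the "keys look random" property matters here.
    return hashlib.sha3_256(data).digest()


# Current (hash-keyed) layout: every trie node is stored under its own hash.
# Keys are uniformly distributed, so reading one account's trie path touches
# pages scattered all over the database.
def hash_keyed_put(db: dict, node_rlp: bytes) -> bytes:
    key = keccak_placeholder(node_rlp)
    db[key] = node_rlp
    return key


# Turbo-Geth-style layout: the flat account data is keyed by the hashed
# address itself, so accounts that are adjacent in the trie are adjacent
# on disk and can be read or written with mostly-sequential I/O.
def address_keyed_put(db: dict, address: bytes, account_rlp: bytes) -> None:
    db[b"account:" + keccak_placeholder(address)] = account_rlp
```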

How can it be fixed?

I'm not sure what the best approach is; I'll add some comments below exploring the options.

To limit scope I'm going to try to stick with LevelDB. Switching it out for a different database might lead to further performance improvements, but it would drastically increase the amount of work required.

Some requirements & opportunities

  • The new layout needs to be able to handle re-orgs, meaning it needs to be able to rewind blocks. This requires keeping some kind of history (see the sketch after this list).

  • The new layout needs to be able to quickly generate chunks of leaves and serve them to clients, so that other clients can sync quickly.

  • In order to be a good citizen on the network, the new layout needs to be able to quickly serve eth/63 and les/2 requests. In particular, requests like GetNodeData require keeping around a materialized trie, or maybe more. Alternatively, we might want to propose an eth/64 which does nothing but replace GetNodeData with something easier to serve.

  • The new layout also needs to support quick responses to all the JSON-RPC methods.

  • Currently Trinity runs as an archive node and doesn't perform any pruning. In fact, `sha(object) -> object` makes pruning very difficult because it naturally de-dupes, and the only way to know that you've removed the last reference to an object is to do a very expensive garbage-collection pass over the entire database. Ideally the new layout would make pruning easier so that Trinity won't have to save everything.

  • Trie nodes in random order are about the least compressible disk pages possible. Storing all the accounts next to each other is likely to improve the compression ratio of the database (LevelDB compresses on a per-page basis).
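As one possible direction (not a decision), the sketch below shows how a flat, address-keyed current state plus per-block change sets could cover several of the points above: rewind for re-orgs, range scans for serving chunks of leaves, and pruning by dropping old change sets. A dict of sorted keys stands in for LevelDB, and all bucket prefixes and helper names are hypothetical; Turbo-Geth's actual change-set encoding differs.

```python
from typing import Dict, List, Tuple

# Illustrative bucket prefixes (not a proposed schema):
#   b"acct:" + hashed_address                    -> account RLP (current state only)
#   b"hist:" + block_number + hashed_address     -> account RLP *before* that block
ACCT = b"acct:"
HIST = b"hist:"


def block_key(number: int) -> bytes:
    return number.to_bytes(8, "big")


def apply_block(db: Dict[bytes, bytes],
                number: int,
                changes: Dict[bytes, bytes]) -> None:
    """Apply a block's account changes, recording previous values for rewind."""
    for hashed_address, new_rlp in changes.items():
        old_rlp = db.get(ACCT + hashed_address, b"")
        db[HIST + block_key(number) + hashed_address] = old_rlp
        db[ACCT + hashed_address] = new_rlp


def rewind_block(db: Dict[bytes, bytes], number: int) -> None:
    """Undo a block during a re-org by restoring the recorded previous values."""
    prefix = HIST + block_key(number)
    for key in [k for k in db if k.startswith(prefix)]:
        hashed_address = key[len(prefix):]
        old_rlp = db.pop(key)
        if old_rlp:
            db[ACCT + hashed_address] = old_rlp
        else:
            db.pop(ACCT + hashed_address, None)  # account did not exist before


def leaves_in_range(db: Dict[bytes, bytes],
                    start: bytes,
                    limit: int) -> List[Tuple[bytes, bytes]]:
    """Serve a contiguous chunk of accounts as a single sorted range scan,
    which is what makes Firehose-style sync cheap for the serving node."""
    keys = sorted(k for k in db if k.startswith(ACCT) and k[len(ACCT):] >= start)
    return [(k[len(ACCT):], db[k]) for k in keys[:limit]]


def prune_before(db: Dict[bytes, bytes], number: int) -> None:
    """Pruning becomes a prefix delete: drop change sets older than `number`."""
    cutoff = HIST + block_key(number)
    for key in [k for k in db if k.startswith(HIST) and k < cutoff]:
        del db[key]
```

With a layout along these lines, serving a sync peer is an iteration over adjacent keys rather than thousands of random point reads, and a re-org of depth N is N calls to the rewind step in reverse order.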
