Static File Backend for Lighthouse Cold DB #8634

@dapplion

Description

Potential idea: migrate Lighthouse's cold DB (the freezer) from LevelDB to Geth-style static files to eliminate compaction bottlenecks in archive nodes. The design is based on Geth's freezer, which offers O(1) read/write performance.

This could also speed up archive node sync and reduce its I/O. However, we would need to rethink Lighthouse's strategy around reconstruction and how archive nodes sync.

CC @michaelsproul

Critical Architectural Questions

1. Dual Write Pointers: Forward Finalization + Reconstruction

On Checkpoint sync, the node starts from a finalized checkpoint and has two concurrent processes:

  • Finalization pointer: Writes forward from checkpoint slot (newest finalized data)
  • Reconstruction pointer: Backfills historical data backward toward genesis

The write pointers operate in disjoint slot ranges (always < and >= the checkpoint slot) and meet in the middle when backfill completes.

Question: How should static files handle sparse writes across disjoint ranges?

Options:

Option A: Abolish background reconstruction

To populate an archival node, force users to sync from genesis to some trusted checkpoint. The node won't be able to participate in duties until it has fully synced, but that is the normal UX for execution nodes anyway. We can download headers first like in tree-sync against a trusted checkpoint to make genesis sync safe.

If users checkpoint sync they can only populate archive data from the anchor slot.

This way we have a single pointer moving forward writing archival data at any time.

Option B: Single continuous index file with sparse slots

```
Index file: slot 0 → slot N (with gaps)
- Finalization writes at slots: checkpoint_slot, checkpoint_slot+1, ...
- Reconstruction writes at slots: checkpoint_slot-1, checkpoint_slot-2, ..., 0
- Index entries for missing slots: special marker (e.g., file_number = 0xFFFFFFFF)
```

Pros: Simple model, single index file
Cons: Index file must be pre-allocated or dynamically grown, wasted space for gaps

2. File Rotation Strategy with Sparse Writes

Geth uses a 2 GB file threshold and appends sequentially. Lighthouse would have two pointers writing to different slot ranges.

Question: When do we rotate to a new data file?

Challenges:

  • Can't use simple "2GB threshold" if writing backward
  • Reconstruction might write slot 1000, then slot 999, then slot 998...
  • Finalization writes slot 100000, 100001, 100002...

Options:

Option A: Fixed slot ranges per file

```
data_0000.dat: slots 0-999,999
data_0001.dat: slots 1,000,000-1,999,999
data_0002.dat: slots 2,000,000-2,999,999
```

Pros: Deterministic file selection, supports sparse writes
Cons: May have many small files if data is sparse, files aren't uniform size

3. Migration Strategy for Existing Archive Nodes

Existing archive nodes have 100s of GB of data in LevelDB.

Question: What's the migration user experience?

Options:

Option A: Require re-sync

Any migration will take a long time to perform. To keep complexity minimal we can support both backends for some time and ask users to resync at some point.

Which columns to put in files?

Slot-Indexed Columns → Static Files

BeaconBlockRoots (bbx): slot → block root (32 bytes fixed) - monotonic keys

BeaconStateRoots (bsx): slot → state root (32 bytes fixed) - identical to block roots

BeaconStateSnapshot (bsn): slot → compressed state - large data, sparse (every 2^21 slots)

BeaconStateDiff (bsd): slot → state HDiffs

BeaconBlock (blk): slot → beacon block

BeaconBlob (blb): slot → blob sidecars

BeaconDataColumn (bdc): (block root, column index) → data column - same as blobs

Keep in LevelDB (Small Metadata)

BeaconColdStateSummary (bcs): tiny metadata for hash lookups
