Description
Potential idea to migrate Lighthouse's cold DB (freezer) from LevelDB to Geth-style static files to eliminate compaction bottlenecks in archive nodes. Based on Geth's freezer design, which offers O(1) read/write performance.
This could speed up archive node sync and reduce its I/O. However, we would need to rethink Lighthouse's strategy around reconstruction and how archive nodes sync.
Critical Architectural Questions
1. Dual Write Pointers: Forward Finalization + Reconstruction
On checkpoint sync, the node starts from a finalized checkpoint and runs two concurrent processes:
- Finalization pointer: Writes forward from checkpoint slot (newest finalized data)
- Reconstruction pointer: Backfills historical data backward toward genesis
The write pointers operate in disjoint slot ranges (always < and >= the checkpoint slot) and meet in the middle when backfill completes.
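A minimal sketch of this invariant, assuming hypothetical names (`FreezerWritePointers` and its fields are illustrative, not Lighthouse's actual types):

```rust
/// Hypothetical sketch of the two freezer write pointers after checkpoint
/// sync; names are illustrative, not Lighthouse's actual types.
struct FreezerWritePointers {
    /// Next slot the finalization process writes; >= checkpoint slot, moves up.
    finalization_next: u64,
    /// Next slot the reconstruction process writes; < checkpoint slot, moves
    /// down toward genesis. `None` once backfill has completed.
    reconstruction_next: Option<u64>,
}

impl FreezerWritePointers {
    /// The two processes write in disjoint ranges split at the checkpoint slot.
    fn ranges_disjoint(&self, checkpoint_slot: u64) -> bool {
        self.finalization_next >= checkpoint_slot
            && self.reconstruction_next.map_or(true, |s| s < checkpoint_slot)
    }
}
```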
Question: How should static files handle sparse writes across disjoint ranges?
Options:
Option A: Abolish background reconstruction
To populate an archival node, force users to sync from genesis up to some trusted checkpoint. The node won't be able to participate in duties until it has fully synced, but that is the normal UX for execution nodes anyway. We can download headers first, as in tree-sync against a trusted checkpoint, to make genesis sync safe.
If users checkpoint sync, they can only populate archive data from the anchor slot onward.
This way, a single pointer moves forward writing archival data at any given time.
Option B: Single continuous index file with sparse slots
Index file: slot 0 → slot N (with gaps)
- Finalization writes at slots: checkpoint_slot, checkpoint_slot+1, ...
- Reconstruction writes at slots: checkpoint_slot-1, checkpoint_slot-2, ..., 0
- Index entries for missing slots: special marker (e.g., file_number = 0xFFFFFFFF); see the sketch after this list
Pros: Simple model, single index file
Cons: Index file must be pre-allocated or dynamically grown, wasted space for gaps
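A minimal sketch of such an index entry, assuming a fixed-size little-endian layout (the field widths, names, and byte order here are illustrative assumptions, not a settled format):

```rust
/// Hypothetical fixed-size index entry, one per slot.
const MISSING: u32 = 0xFFFF_FFFF; // sentinel: slot has no data yet
const ENTRY_SIZE: u64 = 12;

#[derive(Clone, Copy)]
struct IndexEntry {
    file_number: u32, // which data file holds this slot (MISSING if absent)
    offset: u64,      // byte offset of the slot's data within that file
}

/// With fixed-size entries, a slot's entry lives at a computable offset in
/// the index file, preserving O(1) lookups even with gaps.
fn index_offset(slot: u64) -> u64 {
    slot * ENTRY_SIZE
}

impl IndexEntry {
    fn is_missing(&self) -> bool {
        self.file_number == MISSING
    }

    fn encode(&self) -> [u8; 12] {
        let mut buf = [0u8; 12];
        buf[..4].copy_from_slice(&self.file_number.to_le_bytes());
        buf[4..].copy_from_slice(&self.offset.to_le_bytes());
        buf
    }

    fn decode(buf: [u8; 12]) -> Self {
        Self {
            file_number: u32::from_le_bytes(buf[..4].try_into().unwrap()),
            offset: u64::from_le_bytes(buf[4..].try_into().unwrap()),
        }
    }
}
```

Fixed-size entries keep reads O(1) at the cost of the gap space noted above.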
2. File Rotation Strategy with Sparse Writes
Geth uses a 2 GB file threshold and appends sequentially. Lighthouse, however, has two pointers writing to different slot ranges.
Question: When do we rotate to a new data file?
Challenges:
- Can't use a simple "2 GB threshold" if writing backward
- Reconstruction might write slot 1000, then slot 999, then slot 998...
- Finalization writes slot 100000, 100001, 100002...
Options:
Option A: Fixed slot ranges per file
data_0000.dat: slots 0 to 999,999
data_0001.dat: slots 1,000,000 to 1,999,999
data_0002.dat: slots 2,000,000 to 2,999,999
Pros: Deterministic file selection (sketched below), supports sparse writes
Cons: May have many small files if data is sparse; files aren't uniform in size
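A minimal sketch of the deterministic mapping, assuming 1,000,000 slots per file as in the layout above (the constant and file-name format are illustrative, not decided):

```rust
/// Slots covered by each data file; assumed to match the layout above,
/// though the real value is an open design question.
const SLOTS_PER_FILE: u64 = 1_000_000;

/// Deterministically map a slot to its data file name. Because selection
/// depends only on the slot, it works identically for the forward
/// finalization pointer and the backward reconstruction pointer.
fn data_file_for_slot(slot: u64) -> String {
    format!("data_{:04}.dat", slot / SLOTS_PER_FILE)
}
```

For example, `data_file_for_slot(999_999)` returns `"data_0000.dat"` while `data_file_for_slot(1_000_000)` returns `"data_0001.dat"`, so neither pointer ever needs to coordinate file rotation with the other.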
3. Migration Strategy for Existing Archive Nodes
Existing archive nodes have hundreds of GB of data in LevelDB.
Question: What's the migration user experience?
Options:
Option A: Require re-sync
Any migration will take a long time to perform. To keep complexity minimal, we can support both approaches for some time and ask users to re-sync at some point.
4. Which Columns to Put in Files?
Slot-Indexed Columns → Static Files
BeaconBlockRoots (bbx): slot → block root (32 bytes fixed) - monotonic keys
BeaconStateRoots (bsx): slot → state root (32 bytes fixed) - identical to block roots
BeaconStateSnapshot (bsn): slot → compressed state - large data, sparse (every 2^21 slots)
BeaconStateDiff (bsd): state HDiffs
BeaconBlock (blk): beacon blocks
BeaconBlob (blb): blob sidecars
BeaconDataColumn (bdc): (block root, column index) → data column - same as blobs
Keep in LevelDB (Small Metadata)
BeaconColdStateSummary (bcs): tiny metadata for hash lookups
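A minimal sketch of the resulting routing, using the column abbreviations above (the enum and function are illustrative, not an existing Lighthouse API):

```rust
/// Hypothetical routing of freezer columns to a storage backend.
enum Backend {
    StaticFile, // slot-indexed, append-friendly columns
    LevelDb,    // small metadata that needs hash-keyed lookups
}

fn backend_for_column(column: &str) -> Backend {
    match column {
        // Slot-indexed columns move to Geth-style static files.
        "bbx" | "bsx" | "bsn" | "bsd" | "blk" | "blb" | "bdc" => Backend::StaticFile,
        // Tiny metadata (e.g. BeaconColdStateSummary) stays in LevelDB.
        _ => Backend::LevelDb,
    }
}
```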