Static File Backend for Lighthouse Cold DB #8634

@dapplion

Description

Potential idea: migrate Lighthouse's cold DB (the freezer) from LevelDB to Geth-style static files to eliminate compaction bottlenecks in archive nodes. The design is based on Geth's freezer, which offers O(1) read/write performance.

This could also speed up archive node sync and reduce its I/O. However, we would need to rethink Lighthouse's strategy around reconstruction and how archive nodes sync.

CC @michaelsproul

Critical Architectural Questions

1. Dual Write Pointers: Forward Finalization + Reconstruction

On Checkpoint sync, the node starts from a finalized checkpoint and has two concurrent processes:

  • Finalization pointer: Writes forward from checkpoint slot (newest finalized data)
  • Reconstruction pointer: Backfills historical data backward toward genesis

The write pointers operate in disjoint slot ranges (always < and >= the checkpoint slot) and meet in the middle when backfill completes.

Question: How should static files handle sparse writes across disjoint ranges?

Options:

Option A: Abolish background reconstruction

To populate an archival node, force users to sync from genesis to some trusted checkpoint. The node won't be able to participate in duties until it has fully synced, but that is the normal UX for execution nodes anyway. We can download headers first like in tree-sync against a trusted checkpoint to make genesis sync safe.

If users checkpoint sync they can only populate archive data from the anchor slot.

This way we have a single pointer moving forward writing archival data at any time.

Option B: Single continuous index file with sparse slots

```
Index file: slot 0 → slot N (with gaps)
- Finalization writes at slots: checkpoint_slot, checkpoint_slot+1, ...
- Reconstruction writes at slots: checkpoint_slot-1, checkpoint_slot-2, ..., 0
- Index entries for missing slots: special marker (e.g., file_number = 0xFFFFFFFF)
```

Pros: Simple model, single index file
Cons: Index file must be pre-allocated or dynamically grown, wasted space for gaps

2. File Rotation Strategy with Sparse Writes

Geth uses a 2 GB file threshold and appends sequentially. Lighthouse would have two pointers writing to different slot ranges.

Question: When do we rotate to a new data file?

Challenges:

  • Can't use simple "2GB threshold" if writing backward
  • Reconstruction might write slot 1000, then slot 999, then slot 998...
  • Finalization writes slot 100000, 100001, 100002...

Options:

Option A: Fixed slot ranges per file

```
data_0000.dat: slots 0-999,999
data_0001.dat: slots 1,000,000-1,999,999
data_0002.dat: slots 2,000,000-2,999,999
```

Pros: Deterministic file selection, supports sparse writes
Cons: May have many small files if data is sparse, files aren't uniform size

3. Migration Strategy for Existing Archive Nodes

Existing archive nodes have 100s of GB of data in LevelDB.

Question: What's the migration user experience?

Options:

Option A: Require re-sync

Any migration will take a long time to perform. To keep complexity minimal we can support both backends for some time and ask users to resync at some point.

Which columns to put in files?

Slot-Indexed Columns → Static Files

BeaconBlockRoots (bbx): slot → block root (32 bytes fixed) - monotonic keys

BeaconStateRoots (bsx): slot → state root (32 bytes fixed) - identical to block roots

BeaconStateSnapshot (bsn): slot → compressed state - large data, sparse (every 2^21 slots)

BeaconStateDiff (bsd): slot → state HDiffs

BeaconBlock (blk): slot → beacon block

BeaconBlob (blb): slot → blob sidecars

BeaconDataColumn (bdc): (block root, column index) → data column - same as blobs

Keep in LevelDB (Small Metadata)

BeaconColdStateSummary (bcs): tiny metadata for hash lookups
