Description
Lighthouse's mechanism for recovering from missed hard forks is currently broken. It is implemented in the fork_revert module, but hasn't worked since Altair.
Part of the reason for this is that it is hard to test comprehensively. The ideal test depends on access to prior versions of Lighthouse which aren't available on CI (yet). The test looks something like this:
- Run a testnet with two types of nodes:
  - Canonical chain: latest Lighthouse version, with the fork epoch set for the latest hard fork (e.g. Capella).
  - Stale chain: previous Lighthouse version, with no fork epoch set.
- Wait until the testnet has advanced past the configured fork epoch. The canonical chain should continue (and finalize) with new blocks, while the stale chain also continues. The nodes will likely disconnect from each other on P2P.
- Shut down all the stale nodes and restart them with the latest version of Lighthouse & with the fork epoch configured. Ensure that they don't crash on startup and sync back up to the canonical chain.
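To make the expected divergence concrete, here is a minimal toy model of the scenario. It is only a sketch: `ForkSchedule`, `ForkName` and `nodes_agree` are hypothetical names invented for illustration and are not Lighthouse's API. It assumes a node with no fork epoch configured keeps applying the previous fork's rules indefinitely.

```rust
// Toy model only: these types are hypothetical and are not Lighthouse's API.

/// A node's view of the fork schedule. `None` means the fork is not configured,
/// which models the stale nodes in the scenario above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ForkSchedule {
    pub fork_epoch: Option<u64>,
}

/// The rule set a node applies at a given epoch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ForkName {
    Previous,
    Latest,
}

impl ForkSchedule {
    /// The fork this node believes is active at `epoch`.
    pub fn fork_at_epoch(&self, epoch: u64) -> ForkName {
        match self.fork_epoch {
            Some(fork_epoch) if epoch >= fork_epoch => ForkName::Latest,
            _ => ForkName::Previous,
        }
    }
}

/// Two nodes can only follow the same chain at `epoch` if they agree on the
/// active fork (assumption of this toy model).
pub fn nodes_agree(a: &ForkSchedule, b: &ForkSchedule, epoch: u64) -> bool {
    a.fork_at_epoch(epoch) == b.fork_at_epoch(epoch)
}
```

In this model a canonical node (`fork_epoch: Some(4)`) and a stale node (`fork_epoch: None`) agree up to epoch 3 and disagree from epoch 4 onwards. Restarting the stale node with `Some(4)` restores agreement, at which point it has to discard whatever it built under the old rules.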
This test relies on a lot of local testnet infra which is currently not really up to scratch. We likely need changes like #3807 to land first so we can test these scenarios.
There's also a weaker guarantee that we can test on CI without access to prior versions, which is that the current version of Lighthouse can revert a fork missed by the same version. We could likely test this as a beacon_chain test and it would get us ~50% of the way towards a more robust fork_revert module.
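As a rough illustration of what the single-version test would assert, the sketch below models a chain as a slot-ordered list and reverts it to the last block before the fork. This is not the fork_revert implementation: `ToyBlock`, `revert_target` and `revert_to_pre_fork` are hypothetical names, and the rule "every block at or after the first slot of the fork epoch is discarded" is a simplifying assumption. `SLOTS_PER_EPOCH` uses the mainnet value of 32.

```rust
// Toy model only: hypothetical names, not Lighthouse's fork_revert API.

/// Mainnet preset value; a test would typically use whatever the spec in use defines.
pub const SLOTS_PER_EPOCH: u64 = 32;

/// A block reduced to the one field this sketch needs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToyBlock {
    pub slot: u64,
}

/// Index of the latest block strictly before the first slot of `fork_epoch`.
/// Assumes `chain` is ordered by ascending slot. Returns `None` if no such
/// block exists or if the fork slot overflows `u64`.
pub fn revert_target(chain: &[ToyBlock], fork_epoch: u64) -> Option<usize> {
    let fork_slot = fork_epoch.checked_mul(SLOTS_PER_EPOCH)?;
    chain.iter().rposition(|block| block.slot < fork_slot)
}

/// Drop every block produced at or after the fork slot.
/// Returns `false` (leaving the chain untouched) when there is nothing to revert to.
pub fn revert_to_pre_fork(chain: &mut Vec<ToyBlock>, fork_epoch: u64) -> bool {
    match revert_target(chain, fork_epoch) {
        Some(index) => {
            chain.truncate(index + 1);
            true
        }
        None => false,
    }
}
```

A real beacon_chain test would presumably follow the same shape: build a chain past the fork epoch without the fork enabled, restart with the fork epoch configured, and assert that the head ends up before the fork slot and that the node can then sync forwards.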
TODO
- Fix fork_revert logic
- Basic test (single version)
- Comprehensive test (multiple versions)