-
Notifications
You must be signed in to change notification settings - Fork 877
Add genesis sync test to CI #7561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
…ighthouse into add-genesis-sync-test
This PR is ready for review. In this test, we use the The rebuild image For 120s stop duration, we trigger short distance sync, the lookup sync. We see in the logs:
For 300s stop duration, we trigger long distance sync, the range sync. We see in the logs:
All sync completed very fast, under 5s. So I reduce the polling time in the script to 0.5s so that we can see some outputs in the terminal, and set a shorter timeout of 5 minutes. |
…ighthouse into add-genesis-sync-test
# ------------------------------------------------------ | ||
# Interval for polling the /lighthouse/syncing endpoint for sync status | ||
# Reduce the polling time so that some progress can be seen | ||
POLL_INTERVAL_SECS=0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very frequent, I think it's not necessary and we could probably live with something like 5s, or 6s (1 poll per slot)? Don't forgot HTTP API requests goes through BeaconProcessor
as well and it's Priority::P0
, which means it will be prioritised over most tasks.
lighthouse/beacon_node/http_api/src/lib.rs
Line 4384 in 7e2df6b
task_spawner.blocking_json_task(Priority::P0, move || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, the sync completed very fast, like under 2s, e.g., for sync-electra-300s
, we see from the logs for cl-3-lighthouse-geth
:
Jun 06 06:33:34.585 DEBUG Range request block roots retrieved req_type: "BlocksByRange", start_slot: 8, req_count: 8, roots_count: 8, source: "store", elapsed: 89.035µs, finalized_slot: 32
Jun 06 06:33:34.774 DEBUG Range request block roots retrieved req_type: "BlocksByRange", start_slot: 16, req_count: 8, roots_count: 8, source: "store", elapsed: 40.275µs, finalized_slot: 32
Jun 06 06:33:34.774 DEBUG Range request block roots retrieved req_type: "BlocksByRange", start_slot: 24, req_count: 8, roots_count: 8, source: "store", elapsed: 61.705µs, finalized_slot: 32
and the sync is completed about 0.2-0.3 second later:
Jun 06 06:33:35.000 INFO Synced peers: "3", exec_hash: "0xa95848dca51fbedfce6f84bd2d7f2df0b459a5672c8c8fb118ad44b321170f2d (unverified)", finalized_root: 0x57274ba74a8a091caa527a1af01269489c333da085503a0ae76d3534ecbe03d0, finalized_epoch: 4, epoch: 6, block: "0x2fb201cc9202c794b31212e49d8c828274fa76669db0453993bd8650a19dad10", slot: 52
So if we poll every 5s, we may not see any output, unless we increase the sync distance (stop the node for longer periods of time
echo "${node_type} status: ${status:-Unknown}" | ||
|
||
# The test is complete when the node is synced | ||
if [ "$status" = "Synced" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the "Synced" check in the checkpoint sync script, because of the issue we talked about on the call - there's a bug and LH may switch to Synced
even though it's not.
But I guess for forward sync this is fine, as it should only switch to this status if we're within certain distance. If this doesn' work well, we could consider using the standard node/syncing
API, as it also returns the sync distance, which provides a bit more certainty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the sync-electra-300s
logs, and realised it didn't even start syncing, i suspect LH is reporting Synced
status too early. Would it be worth looking at the sync distance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For forward sync it looks ok? This is the output for sync-electra-300s
:
https://github.com/sigp/lighthouse/actions/runs/15484117320/job/43595585222?pr=7561
We can increase the poll time to 1-2s if that's better?
I have also tested using the beacon API /eth/v1/node/syncing
, the output could look like this (also with 0.5s poll time):
olling sync status until nodes are synced or timeout of 5 mins
supernode status: Synced (head_slot: 51, sync_distance: 0)
supernode completed sync in 1 seconds
fullnode status: No response or null response
fullnode status: No response or null response
fullnode status: No response or null response
fullnode status: Syncing (head_slot: 23, sync_distance: 28, is_optimistic: true, el_offline: false)
fullnode status: Synced (head_slot: 51, sync_distance: 0)
fullnode completed sync in 3 seconds
Genesis sync test complete! Both supernode and fullnode have synced successfully.
Supernode time: 1 seconds
Fullnode time: 3 seconds
It has the head_slot
(which I believe is the same as target_slot
in the lighthouse/syncing
endpoint (the beacon API also has the some other info in the response like el_offline
and is_optimistic
). If the beacon API is better, I can switch to that
Issue Addressed
Use existing code from @jimmygchen in #7530 and modify for genesis sync test. Thanks @jimmygchen !