
Conversation

Tristan-Wilson
Member

This change eliminates a circular dependency between the consensus and
execution layers by transforming the sync status flow from a pull-based
to a push-based model. Previously, the execution layer would query the
consensus layer for sync status through the ConsensusInfo interface,
creating a tight coupling between the layers.

The new architecture introduces a ConsensusSyncData structure that
contains sync status, target message count, and progress information.
The ConsensusExecutionSyncer now periodically pushes this data from
consensus to execution, where it's stored using an atomic pointer for
lock-free reads. This approach maintains consistency with the existing
finality data push mechanism and provides better performance through
reduced lock contention.

As part of this refactoring, the ConsensusInfo interface has been
simplified to only include the BlockMetadataAtMessageIndex method,
removing the now-redundant Synced, FullSyncProgressMap, and
SyncTargetMessageCount methods. This cleaner separation of concerns
better supports alternative client implementations by clearly defining
the data flow boundaries between consensus and execution layers.
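
In sketch form, the push model might look like the following (the struct fields beyond those named above, and the setter/getter names, are assumptions for illustration, not the exact implementation):

package execution

import "sync/atomic"

// ConsensusSyncData is pushed periodically from consensus to execution.
type ConsensusSyncData struct {
	Synced                 bool
	SyncTargetMessageCount uint64
	SyncProgressMap        map[string]interface{}
}

// The execution side stores the latest push in an atomic pointer so that
// readers never need to take a lock.
type SyncMonitor struct {
	consensusSyncData atomic.Pointer[ConsensusSyncData]
}

// SetConsensusSyncData is called on each periodic push from consensus.
func (s *SyncMonitor) SetConsensusSyncData(data *ConsensusSyncData) {
	s.consensusSyncData.Store(data)
}

// Synced reads the most recent push without locking.
func (s *SyncMonitor) Synced() bool {
	data := s.consensusSyncData.Load()
	return data != nil && data.Synced
}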

Fixes NIT-3649

Also lower the sync interval for tests.
syncData := &execution.ConsensusSyncData{
	Synced:                 c.syncMonitor.Synced(),
	SyncTargetMessageCount: c.syncMonitor.SyncTargetMessageCount(),
	SyncProgressMap:        c.syncMonitor.FullSyncProgressMap(),
Contributor

SyncProgressMap should be nil or empty if Synced (prevents wasteful locking/etc)

Member Author

Addressed in latest commit.
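
A sketch of the fix, assuming the push callback keeps the shape of the excerpt above:

	synced := c.syncMonitor.Synced()
	syncData := &execution.ConsensusSyncData{
		Synced:                 synced,
		SyncTargetMessageCount: c.syncMonitor.SyncTargetMessageCount(),
	}
	if !synced {
		// FullSyncProgressMap takes locks, so only build it when a caller
		// will actually need the detail.
		syncData.SyncProgressMap = c.syncMonitor.FullSyncProgressMap()
	}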

Tristan-Wilson and others added 3 commits August 29, 2025 14:25
Only populate SyncProgressMap when not synced.

MaxMessageCount is now a dedicated field that's always sent.

Fix stale sync targets caused by push delay. Instead of consensus
sending pre-calculated targets, it now sends raw MaxMessageCount.
Execution maintains a sliding window history and calculates its own
target using values from 1-2 MsgLag ago, properly accounting for the
push delay. The default push interval and execution message lag are both
1 second so they work together well.

Includes unit tests for the sliding window implementation.
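
A sketch of the sliding-window history this commit describes (the type and method names are assumptions, and the trimming rule is refined in the review below):

package execution

import (
	"sync"
	"time"
)

type syncHistoryEntry struct {
	maxMessageCount uint64
	timestamp       time.Time
}

// syncTargetHistory records recent MaxMessageCount observations so that
// execution can derive a sync target that accounts for the push delay.
type syncTargetHistory struct {
	mutex   sync.RWMutex
	msgLag  time.Duration
	entries []syncHistoryEntry // ordered oldest to newest
}

// add records a pushed MaxMessageCount along with when it was observed.
func (h *syncTargetHistory) add(count uint64, at time.Time) {
	h.mutex.Lock()
	defer h.mutex.Unlock()
	h.entries = append(h.entries, syncHistoryEntry{maxMessageCount: count, timestamp: at})
}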

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 91.01124% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 22.80%. Comparing base (cb86fca) to head (4756bfd).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3538      +/-   ##
==========================================
+ Coverage   22.70%   22.80%   +0.09%     
==========================================
  Files         388      388              
  Lines       58900    59016     +116     
==========================================
+ Hits        13375    13456      +81     
- Misses      43486    43517      +31     
- Partials     2039     2043       +4     

@tsahee tsahee added the after-next-version This PR shouldn't be merged until after the next version is released label Sep 8, 2025
windowEnd := now.Add(-h.msgLag)

for _, entry := range h.entries {
	if !entry.timestamp.Before(windowStart) && !entry.timestamp.After(windowEnd) {
Contributor

When I look at the current description of msgLag, I think what we actually want is the oldest message that's less than MsgLag old (2*MsgLag is not relevant).
We can discard anything that's more than MsgLag old.
We could do other things with different documentation for MsgLag, but this method (which is different from what I said before) seems to fit the current documentation and be simple enough.

Member Author

Updated it to get the oldest message that is newer than msgLag old.
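
Continuing the history sketch from above, the simplified rule might look like this:

func (h *syncTargetHistory) getSyncTarget(now time.Time) uint64 {
	h.mutex.RLock()
	defer h.mutex.RUnlock()
	cutoff := now.Add(-h.msgLag)
	// Entries are ordered oldest to newest; the first one newer than the
	// cutoff is the oldest observation that is less than msgLag old.
	for _, entry := range h.entries {
		if entry.timestamp.After(cutoff) {
			return entry.maxMessageCount
		}
	}
	// No observation recent enough; callers treat 0 as "no target yet".
	return 0
}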


// Add the max message count to history for sync target calculation
if syncData != nil && syncData.MaxMessageCount > 0 {
	s.syncHistory.add(syncData.MaxMessageCount, syncData.UpdatedAt)
Contributor

I think we should use the minimum of (syncData.UpdatedAt, time.Now()) for the time, so that if the clocks of the two components don't match we at least know the timestamp is not in the future.

Member Author

Good idea, I capped the timestamp at time.Now().
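
In sketch form, assuming the same names as the excerpt above:

	timestamp := syncData.UpdatedAt
	// Clamp to the local clock so a skewed consensus clock can never
	// record a timestamp in the future.
	if now := time.Now(); timestamp.After(now) {
		timestamp = now
	}
	s.syncHistory.add(syncData.MaxMessageCount, timestamp)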

}

// Always add the max message count
res["maxMessageCount"] = data.MaxMessageCount
Contributor

This comes from consensus, so let's call it "consensusMaxMessageCount" or something.

Member Author

Changed it to "consensusMaxMessageCount".
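
A sketch of where the renamed key ends up (the surrounding function shape is an assumption):

func (s *SyncMonitor) FullSyncProgressMap() map[string]interface{} {
	res := make(map[string]interface{})
	data := s.consensusSyncData.Load()
	if data != nil {
		for k, v := range data.SyncProgressMap {
			res[k] = v
		}
		// Prefixed with "consensus" to make clear which layer it came from.
		res["consensusMaxMessageCount"] = data.MaxMessageCount
	}
	return res
}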

defer h.mutex.RUnlock()

if len(h.entries) == 0 {
	return 0
Contributor

Not entirely certain, but I think in this case it's better to return an error and not 0, to make sure nothing mistakenly assumes we're in sync.

Member Author

I think it's okay because we return not synced if it's zero:

func (s *SyncMonitor) Synced(ctx context.Context) bool {
...
	// Calculate the sync target based on historical data
	syncTarget := s.syncHistory.getSyncTarget(now)
	if syncTarget == 0 {
		// No valid sync target available yet
		return false
	}
...

@tsahee tsahee removed the after-next-version This PR shouldn't be merged until after the next version is released label Sep 22, 2025
@tsahee tsahee assigned Tristan-Wilson and unassigned tsahee Sep 22, 2025
Tristan-Wilson and others added 4 commits September 23, 2025 12:51
- Simplify sync target to use oldest entry < msgLag ago (not 2*msgLag window)
- Use min(now, syncData.UpdatedAt) to prevent future timestamps
- Rename maxMessageCount to consensusMaxMessageCount for clarity
- Update tests to match new msgLag-based trimming behavior
…nc-info

Fix minor conflict around using pflag instead of flag.
@Tristan-Wilson
Member Author

Tests are passing now after merging in latest master, assigning back to Tsahi for review.

Contributor

@tsahee tsahee left a comment

tiny changes to a constant

var DefaultSyncMonitorConfig = SyncMonitorConfig{
	SafeBlockWaitForBlockValidator:      false,
	FinalizedBlockWaitForBlockValidator: false,
	MsgLag:                              time.Second,
Contributor

This should be at least 2 * ConsensusExecutionSyncer.sync-interval, and preferably even 3x.
I'd keep this one at a second and reduce the other to 300ms.

We should also have a check somewhere that prints a warning if the sync-interval is more than msg-lag / 2.

Member Author

Fixed.
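
A sketch of the adjusted default and the warning check the review asks for (the config type, field names other than MsgLag, and function name are assumptions; log is go-ethereum's logger):

// Assumed config shape; only the value follows the review suggestion.
var DefaultConsensusExecutionSyncerConfig = ConsensusExecutionSyncerConfig{
	SyncInterval: 300 * time.Millisecond,
}

// warnIfSyncIntervalTooLarge prints a warning when the push interval is
// more than half of msg-lag, since the sync target calculation needs at
// least two pushes to land inside the lag window.
func warnIfSyncIntervalTooLarge(syncInterval, msgLag time.Duration) {
	if syncInterval > msgLag/2 {
		log.Warn("consensus-execution-syncer sync-interval is more than half of msg-lag",
			"sync-interval", syncInterval, "msg-lag", msgLag)
	}
}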

	attempt++
}

// TODO: Use Client.SyncProgressMap to see the full map
Contributor

ethclient's syncing call will internally call SyncProgressMap;
see func (api *EthereumAPI) Syncing(ctx context.Context) in go-ethereum/internal/ethapi/api.go.
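
For example, a client only needs the standard ethclient call to exercise this path (the endpoint URL is illustrative):

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	client, err := ethclient.Dial("http://localhost:8547")
	if err != nil {
		log.Fatal(err)
	}
	// SyncProgress issues eth_syncing, which the server answers from
	// SyncProgressMap; a nil result means the node reports itself as synced.
	progress, err := client.SyncProgress(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	if progress == nil {
		fmt.Println("node is synced")
	} else {
		fmt.Printf("syncing: current=%d highest=%d\n", progress.CurrentBlock, progress.HighestBlock)
	}
}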

Member Author

Removed the TODO.


github-actions bot commented Oct 3, 2025

❌ 10 Tests Failed:

Tests completed    Failed    Passed    Skipped
2127               10        2117      0
View the top 3 failed tests by shortest run time
TestTimeboostBulkBlockMetadataAPI
Stack Traces | 2.880s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
INFO [10-03|11:01:01.440] Persisted trie from memory database      nodes=19  flushnodes=0 size=3.36KiB   flushsize=0.00B time="91.111µs"  flushtime=0s gcnodes=0    gcsize=0.00B      gctime=0s          livenodes=193   livesize=32.97KiB
INFO [10-03|11:01:01.440] Writing cached state to disk             block=2    hash=ab605b..a98115 root=b74c82..7fa0b7
INFO [10-03|11:01:01.440] Persisted trie from memory database      nodes=19  flushnodes=0 size=3.14KiB   flushsize=0.00B time="103.635µs" flushtime=0s gcnodes=0    gcsize=0.00B      gctime=0s          livenodes=174   livesize=29.83KiB
INFO [10-03|11:01:01.440] Writing snapshot state to disk           root=9dfbf2..cb68b4
INFO [10-03|11:01:01.440] Persisted trie from memory database      nodes=0   flushnodes=0 size=0.00B     flushsize=0.00B time=772ns       flushtime=0s gcnodes=0    gcsize=0.00B      gctime=0s          livenodes=174   livesize=29.83KiB
DEBUG[10-03|11:01:01.440] Dereferenced trie from memory database   nodes=0   size=0.00B     time=962ns       gcnodes=0    gcsize=0.00B      gctime=450ns       livenodes=174   livesize=29.83KiB
DEBUG[10-03|11:01:01.440] Dereferenced trie from memory database   nodes=19  size=3.17KiB   time="63.809µs"  gcnodes=19   gcsize=3.17KiB    gctime="64.149µs"  livenodes=155   livesize=26.66KiB
DEBUG[10-03|11:01:01.440] Dereferenced trie from memory database   nodes=19  size=3.20KiB   time="42.139µs"  gcnodes=38   gcsize=6.37KiB    gctime="106.158µs" livenodes=136   livesize=23.46KiB
DEBUG[10-03|11:01:01.440] Dereferenced trie from memory database   nodes=19  size=3.23KiB   time="34.004µs"  gcnodes=57   gcsize=9.60KiB    gctime="140.052µs" livenodes=117   livesize=20.23KiB
DEBUG[10-03|11:01:01.440] Dereferenced trie from memory database   nodes=19  size=3.26KiB   time="34.675µs"  gcnodes=76   gcsize=12.87KiB   gctime="174.647µs" livenodes=98    livesize=16.96KiB
DEBUG[10-03|11:01:01.440] Dereferenced trie from memory database   nodes=19  size=3.29KiB   time="40.415µs"  gcnodes=95   gcsize=16.16KiB   gctime="214.962µs" livenodes=79    livesize=13.67KiB
DEBUG[10-03|11:01:01.441] Dereferenced trie from memory database   nodes=20  size=3.43KiB   time="41.838µs"  gcnodes=115  gcsize=19.59KiB   gctime="256.7µs"   livenodes=59    livesize=10.24KiB
DEBUG[10-03|11:01:01.441] Dereferenced trie from memory database   nodes=19  size=3.33KiB   time="49.783µs"  gcnodes=134  gcsize=22.92KiB   gctime="306.393µs" livenodes=40    livesize=6.91KiB
DEBUG[10-03|11:01:01.441] Dereferenced trie from memory database   nodes=20  size=3.46KiB   time="34.654µs"  gcnodes=154  gcsize=26.37KiB   gctime="340.957µs" livenodes=20    livesize=3.46KiB
DEBUG[10-03|11:01:01.441] Dereferenced trie from memory database   nodes=20  size=3.46KiB   time="35.777µs"  gcnodes=174  gcsize=29.83KiB   gctime="376.644µs" livenodes=0     livesize=0.00B
DEBUG[10-03|11:01:01.441] Dereferenced trie from memory database   nodes=0   size=0.00B     time=150ns       gcnodes=174  gcsize=29.83KiB   gctime="376.724µs" livenodes=0     livesize=0.00B
DEBUG[10-03|11:01:01.441] Dereferenced trie from memory database   nodes=0   size=0.00B     time=120ns       gcnodes=174  gcsize=29.83KiB   gctime="376.764µs" livenodes=0     livesize=0.00B
INFO [10-03|11:01:01.441] Blockchain stopped
TRACE[10-03|11:01:01.458] P2P networking is spinning down
--- FAIL: TestTimeboostBulkBlockMetadataAPI (2.88s)
TestVersion30
Stack Traces | 7.950s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
=== PAUSE TestVersion30
=== CONT  TestVersion30
    precompile_inclusion_test.go:90: goroutine 558836 [running]:
        runtime/debug.Stack()
        	/opt/hostedtoolcache/go/1.25.1/x64/src/runtime/debug/stack.go:26 +0x5e
        github.com/offchainlabs/nitro/util/testhelpers.RequireImpl({0x4078ab0, 0xc05c97a700}, {0x4036e80, 0xc153895650}, {0x0, 0x0, 0x0})
        	/home/runner/work/nitro/nitro/util/testhelpers/testhelpers.go:29 +0x55
        github.com/offchainlabs/nitro/system_tests.Require(0xc05c97a700, {0x4036e80, 0xc153895650}, {0x0, 0x0, 0x0})
        	/home/runner/work/nitro/nitro/system_tests/common_test.go:1722 +0x5d
        github.com/offchainlabs/nitro/system_tests.testPrecompiles(0xc05c97a700, 0x1e, {0xc0daf35db0, 0x6, 0x0?})
        	/home/runner/work/nitro/nitro/system_tests/precompile_inclusion_test.go:90 +0x371
        github.com/offchainlabs/nitro/system_tests.TestVersion30(0xc05c97a700?)
        	/home/runner/work/nitro/nitro/system_tests/precompile_inclusion_test.go:67 +0x798
        testing.tRunner(0xc05c97a700, 0x3cc1da0)
        	/opt/hostedtoolcache/go/1.25.1/x64/src/testing/testing.go:1934 +0xea
        created by testing.(*T).Run in goroutine 1
        	/opt/hostedtoolcache/go/1.25.1/x64/src/testing/testing.go:1997 +0x465
        
    precompile_inclusion_test.go:90: [] execution aborted (timeout = 5s)
--- FAIL: TestVersion30 (7.95s)
TestConsumeNextBid_Direct
Stack Traces | 15.080s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
INFO [10-03|11:14:08.799] Starting work on payload                 id=0x038ae1b18f741a52
INFO [10-03|11:14:08.800] Updated payload                          id=0x038ae1b18f741a52 number=15 hash=c8a7dc..2147e6 txs=1 withdrawals=0 gas=979,249   fees=9.79249e-07 root=6257e9..7da0e0 elapsed="895.308µs"
INFO [10-03|11:14:08.800] Stopping work on payload                 id=0x038ae1b18f741a52 reason=delivery
INFO [10-03|11:14:08.801] Imported new potential chain segment     number=15 hash=c8a7dc..2147e6 blocks=1 txs=1 mgas=0.979 elapsed=1.303ms     mgasps=751.334  triediffs=37.52KiB triedirty=0.00B
INFO [10-03|11:14:08.801] Chain head was updated                   number=15 hash=c8a7dc..2147e6 root=6257e9..7da0e0 elapsed="85.2µs"
    auctioneer_bid_consumption_test.go:187: 
        	Error Trace:	/home/runner/work/nitro/nitro/timeboost/auctioneer_bid_consumption_test.go:187
        	Error:      	Received unexpected error:
        	            	invalid length, need 256 bits
        	            	opening wallet
        	            	github.com/offchainlabs/nitro/timeboost.NewAuctioneerServer
        	            		/home/runner/work/nitro/nitro/timeboost/auctioneer.go:207
        	            	github.com/offchainlabs/nitro/timeboost.TestConsumeNextBid_Direct
        	            		/home/runner/work/nitro/nitro/timeboost/auctioneer_bid_consumption_test.go:186
        	            	testing.tRunner
        	            		/opt/hostedtoolcache/go/1.25.1/x64/src/testing/testing.go:1934
        	            	runtime.goexit
        	            		/opt/hostedtoolcache/go/1.25.1/x64/src/runtime/asm_amd64.s:1693
        	Test:       	TestConsumeNextBid_Direct
--- FAIL: TestConsumeNextBid_Direct (15.08s)


@tsahee tsahee added this pull request to the merge queue Oct 3, 2025
Merged via the queue into master with commit 52a6bd8 Oct 3, 2025
21 checks passed
@tsahee tsahee deleted the consensus-pushes-sync-info branch October 3, 2025 14:33