refactor!: syncing #2798
Conversation
…ce DA -- do not pollute p2p; this is an attempt
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##             main    #2798      +/-   ##
==========================================
+ Coverage   62.37%   64.33%   +1.95%
==========================================
  Files          82       81       -1
  Lines        7304     7250      -54
==========================================
+ Hits         4556     4664     +108
+ Misses       2203     2044     -159
+ Partials      545      542       -3
```
Very nice start!
Force-pushed abd0349 to 7bc93d2
Force-pushed 7bc93d2 to cb4f6d7
@claude review again please
Claude finished @tac0turtle's task — **Code Review Complete**

I've reviewed the refactored syncing system. This is a significant architectural improvement with much cleaner separation of concerns. Below are my findings.

**Summary of Changes Since Previous Review**

The team has made substantial improvements:
**Critical Issues Fixed ✅**

**1. Missing Shutdown() method — STILL MISSING**

The previous review identified that the P2PHandler.Shutdown() method was in the interface (syncer.go:37) but not implemented. This is still missing:

```go
// syncer.go:32-37
type p2pHandler interface {
	ProcessHeaderRange(ctx context.Context, fromHeight, toHeight uint64, heightInCh chan<- common.DAHeightEvent)
	ProcessDataRange(ctx context.Context, fromHeight, toHeight uint64, heightInCh chan<- common.DAHeightEvent)
	SetProcessedHeight(height uint64)
	OnHeightProcessed(height uint64)
	// Shutdown() method is NOT in the interface anymore!
}
```

Good news: looking at the current code, the Shutdown() method has been removed from the interface entirely. This is actually the correct fix because:
**Verdict:** This critical issue is RESOLVED by simplification rather than implementation. ✅

**2. Race condition in processedHeight — FIXED ✅**

The previous review identified a race condition where processedHeight was read under lock, then used after unlocking. The new code fixes this:

```go
// p2p_handler.go:71-78 (FIXED)
h.mu.Lock()
shouldProcess := height > h.processedHeight
h.mu.Unlock()
if !shouldProcess {
	continue
}
h.processHeight(ctx, height, heightInCh, "header_range")
```

The lock is now held only for the comparison, and the decision is made atomically.

**@alpe's Feedback — All Addressed ✅**

1. Nil checks removed ✅
2. Inline worker methods ✅
3. Store wrapper placement ✅

```go
type instrumentedStore[H header.Header[H]] struct {
	header.Store[H] // Embedded
	publish publishFn[H]
}

func (s *instrumentedStore[H]) Append(ctx context.Context, headers ...H) error {
	// Only override Append
}
```

**Remaining Issues & Recommendations**

**Major Issues**

**1. Potential Memory Exhaustion in Pending Events Cache**

The syncer has a 10,000-element buffer (syncer.go:114) and unlimited cache storage for pending events:

```go
// syncer.go:428, 484, 742
s.cache.SetPendingEvent(height, event)
```

**Problem:** Under network partition, or if DA is ahead of P2P sync, the pending events cache can grow unbounded. The 10k channel buffer could also consume significant memory, since each event contains full headers and data.

**Recommendation:**
**2. Store GetByHeight Errors Not Propagated**

In p2p_handler.go:102-120, errors from store operations are only logged:

```go
header, err := h.headerStore.GetByHeight(ctx, height)
if err != nil {
	if ctx.Err() == nil {
		h.logger.Debug().Uint64("height", height).Err(err).Str("source", source).Msg("header unavailable in store")
	}
	return // Error silently swallowed
}
```

**Problem:** If there's a persistent store issue (corruption, disk full), the syncer will never know. This could lead to indefinite stalls.

**Recommendation:**
**3. No Context Propagation in Broadcasting**

In syncer.go:519-533, the errgroup creates a new context instead of using the syncer's context:

```go
g, ctx := errgroup.WithContext(s.ctx) // Creates new context
g.Go(func() error {
	return s.headerStore.WriteToStoreAndBroadcast(ctx, event.Header, ...)
})
```

**Problem:** If the syncer is shutting down during broadcast, the operation might block shutdown.

**Recommendation:** Use the syncer's context directly, or ensure the errgroup context is cancelled when s.ctx is done.

**Minor Issues**

**4. Polling Intervals Not Configurable**

Both the DA worker (syncer.go:333) and the pending worker (syncer.go:358) use hardcoded 10ms polling:

```go
pollInterval := min(10*time.Millisecond, s.config.Node.BlockTime.Duration)
ticker := time.NewTicker(10 * time.Millisecond)
```

**Recommendation:** Make these configurable, or use event-driven wakeup instead of polling.

**5. Test Helper Duplication**

buildTestSigner (p2p_handler_test.go:28-39) and buildSyncTestSigner (syncer_test.go, not shown but likely exists) are duplicated.

**Recommendation:** Consolidate into shared test utilities.

**Architecture Improvements ✅**

The refactor achieves excellent separation of concerns:

**Event-Driven Design:** The new notifier pattern (pkg/sync/notifier/) is well-designed:
**Simplified P2P Handler:** The refactored p2p_handler.go is much cleaner:

**Worker Goroutines:** The syncer now has dedicated workers:

**Lifecycle Management:** Proper use of sync.WaitGroup ensures clean shutdown:

**Security & Performance**

Security ✅:

Performance:

**Test Coverage**

Strengths:

Gaps:

**Documentation**

Missing:

**Recommendation:** Add a design doc or README explaining:

**Priority Recommendations**

Must Address:

Should Address:

Nice to Have:

**Final Verdict**

This refactor is a significant improvement over the previous monolithic design. The event-driven architecture with dedicated workers is much more maintainable and testable. The critical issues from the previous review have been resolved through simplification.

Major strengths:

Key areas for improvement:

**Recommendation:** This is ready to merge, with the understanding that the pending event cache bounds should be addressed in a follow-up PR to prevent memory issues under extreme conditions.
The changes look good. Nice work.
utACK
Force-pushed a0f2ce3 to aa1b5a2
Summary

- Flattened the syncer's P2P handler so it runs directly inside tryFetchFromP2P, dropping the goroutine + semaphore watcher layer.
- Rewrote the handler to stream a height range synchronously, emitting events only when both header and data are available and the proposer matches genesis.
- Added unit tests with tighter coverage for the new flow (missing data/header, proposer mismatch, processed-height skipping, etc.).
Motivation