refactor(reaping): use cache for seen hashes instead of in memory store #2811
Conversation
Claude finished @julienrbrt's task: Code Review Complete

Summary: I've completed a comprehensive review of this PR. Overall, this is a solid architectural improvement that centralizes transaction hash tracking and enables persistence. The code quality is good, with excellent test coverage for the new functionality. However, I've identified several areas for improvement and one critical issue that needs attention.
Critical Issues

1. Persistence Strategy: Disk vs. In-Memory
| Approach | Pros | Cons |
|---|---|---|
| Persist to disk (current) | Prevents duplicate tx submissions across restarts; consistent with other cache behavior | Extra I/O on shutdown; minimal benefit if restarts are rare |
| In-memory only | No I/O overhead; simpler | Transactions seen before restart may be resubmitted (though sequencer should handle this) |
Recommendation: I lean toward @tac0turtle's suggestion for in-memory only, but with caveats:
- The sequencer should already handle duplicate transactions (idempotency)
- The 24-hour window is long enough that restart-related duplicates are unlikely to cause issues
- Consistency argument: If this is truly ephemeral data, why persist it?
However, if you keep persistence:
- The current implementation is correct
- The I/O happens only on graceful shutdown (not hot path)
- Consistency with other caches (header, data) is maintained
Suggested Action: Decide explicitly and document the rationale. If going in-memory only:
- Remove `txCache` from `SaveToDisk()` and `LoadFromDisk()`
- Update the comment at lines 331-333 to explain why
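For illustration, here is a minimal sketch of the in-memory-only variant. The helper and field names (`saveCache`, `headerCache`, `dataCache`) are assumptions and may not match the actual manager; the point is simply that `txCache` is left out of persistence:

```go
// Sketch only: helper and field names are assumptions, not the real manager API.
func (m *implementation) SaveToDisk() error {
	// Persist only block-derived caches. Seen-tx hashes stay in memory:
	// they are short-lived and the sequencer is expected to deduplicate
	// resubmitted transactions after a restart.
	if err := m.saveCache("header", m.headerCache); err != nil {
		return fmt.Errorf("failed to save header cache: %w", err)
	}
	if err := m.saveCache("data", m.dataCache); err != nil {
		return fmt.Errorf("failed to save data cache: %w", err)
	}
	// m.txCache is deliberately not persisted.
	return nil
}
```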
2. Missing Cleanup on Shutdown
Location: block/internal/reaping/reaper.go:143-150, Stop() method
The reaper's Stop() method doesn't save the cache to disk. If persistence is desired, you should:
```go
func (r *Reaper) Stop() error {
	if r.cancel != nil {
		r.cancel()
	}
	r.wg.Wait()

	// Save cache to disk on graceful shutdown
	if err := r.cache.SaveToDisk(); err != nil {
		r.logger.Warn().Err(err).Msg("failed to save cache on shutdown")
	}

	r.logger.Info().Msg("reaper stopped")
	return nil
}
```

Note: This is only relevant if keeping disk persistence.
3. Timestamp Inconsistency on Load
Location: block/internal/cache/manager.go:360-368, LoadFromDisk() method
When loading from disk, ALL transaction timestamps are reset to time.Now():
```go
now := time.Now()
m.txCache.hashes.Range(func(key, value any) bool {
	if hash, ok := key.(string); ok {
		m.txTimestamps.Store(hash, now)
	}
	return true
})
```

Issue: This defeats the purpose of time-based cleanup. A transaction that was 23 hours old at shutdown will be treated as "new" on restart, extending its lifetime to 47 hours total.
Options:
- Persist timestamps (complex; requires gob registration of `time.Time`, more I/O)
- Use a conservative timestamp on load (e.g., `time.Now().Add(-23 * time.Hour)`) to ensure old txs are cleaned up soon (see the sketch below)
- Accept the behavior as reasonable (document it)
- Go in-memory only (removes the problem entirely)
Recommendation: If keeping persistence, document this behavior explicitly. If it's a problem, persist timestamps or go in-memory only.
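As a minimal sketch of the second option, the loading loop above could backdate restored hashes so they fall due for cleanup within roughly an hour, assuming `DefaultTxCacheRetention` is the 24-hour retention constant mentioned in this review:

```go
// Instead of time.Now(), backdate restored hashes so they expire soon after restart.
loadTime := time.Now().Add(-(DefaultTxCacheRetention - time.Hour))
m.txCache.hashes.Range(func(key, value any) bool {
	if hash, ok := key.(string); ok {
		m.txTimestamps.Store(hash, loadTime)
	}
	return true
})
```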
Code Quality & Go Idioms
✅ Good Practices
- Error Wrapping: Consistent use of `fmt.Errorf` with `%w` for error context
- Concurrency Safety: Proper use of `sync.Map` for concurrent access
- Zero Value Handling: `CleanupOldTxs` defaults to `DefaultTxCacheRetention` when passed 0 (lines 201-203)
- Logging: Appropriate log levels and structured logging with context
🔍 Minor Issues
1. Type Assertion Without Second Return Value
Location: block/internal/cache/generic_cache.go:72
```go
func (c *Cache[T]) isSeen(hash string) bool {
	seen, ok := c.hashes.Load(hash)
	if !ok {
		return false
	}
	return seen.(bool) // ⚠️ Panic risk if not bool
}
```

This type assertion could panic if the value isn't a bool. Use the two-value form:
```go
return seen.(bool) // Current

// Better:
if seenBool, ok := seen.(bool); ok {
	return seenBool
}
return false
```

Though in practice this is safe (only bools are stored), defensive programming is better.
2. Direct Access to Internal Cache Field
Location: block/internal/cache/manager.go:221
```go
m.txCache.hashes.Delete(hash)
```

This directly accesses `txCache.hashes`, breaking encapsulation. Consider adding a method to `Cache[T]`:

```go
func (c *Cache[T]) deleteSeen(hash string) {
	c.hashes.Delete(hash)
}
```

Then use: `m.txCache.deleteSeen(hash)`
3. Inconsistent Use of Constants
Location: block/internal/reaping/reaper.go:99
```go
cleanupTicker := time.NewTicker(1 * time.Hour)
```

This magic number should be a constant, similar to `DefaultInterval`:

```go
const (
	DefaultInterval    = 1 * time.Second
	MaxBackoffInterval = 30 * time.Second
	BackoffMultiplier  = 2
	TxCleanupInterval  = 1 * time.Hour // Add this
)
```

Error Handling & Resilience
✅ Strengths
- Graceful Error Handling: `SaveToDisk()` and `LoadFromDisk()` errors are logged as warnings, not failures (lines 135, 140)
- Retry Logic: Reaper has exponential backoff on errors (lines 111-122)
- Transaction Rollback: If sequencer submission fails, transactions are NOT marked as seen (lines 186-187)
🔍 Potential Issues
No Cleanup on Load Failure
Location: block/internal/cache/manager.go:338-358, LoadFromDisk()
If loading one cache fails (e.g., corrupted data), the error is returned immediately, but partially loaded data remains in memory. Consider:
- Clearing all caches on load failure
- Continuing to load other caches even if one fails (sketched after this list)
- At minimum, document the behavior
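As a sketch of the second option, each cache could be loaded independently and the failures aggregated with `errors.Join` (Go 1.20+), so one corrupted file does not leave the remaining caches unloaded. The `loadCache` helper and field names are assumptions:

```go
// Sketch only: load every cache even if one fails, and report all failures.
func (m *implementation) LoadFromDisk() error {
	var errs []error
	if err := m.loadCache("header", m.headerCache); err != nil {
		errs = append(errs, fmt.Errorf("failed to load header cache: %w", err))
	}
	if err := m.loadCache("data", m.dataCache); err != nil {
		errs = append(errs, fmt.Errorf("failed to load data cache: %w", err))
	}
	if err := m.loadCache("tx", m.txCache); err != nil {
		errs = append(errs, fmt.Errorf("failed to load tx cache: %w", err))
	}
	// errors.Join returns nil when all entries are nil or the slice is empty.
	return errors.Join(errs...)
}
```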
Testing
✅ Excellent Coverage
The test suite is comprehensive and well-structured:
- `TestManager_TxOperations` - Basic functionality
- `TestManager_CleanupOldTxs` - Time-based cleanup
- `TestManager_CleanupOldTxs_SelectiveRemoval` - Partial cleanup
- `TestManager_TxCache_PersistAndLoad` - Disk persistence
- `TestManager_DeleteHeight_PreservesTxCache` - Height-based deletion behavior
- `TestReaper_CachePersistence` - End-to-end persistence
The tests use proper mocking, table-driven patterns where appropriate, and cover edge cases.
🔍 Testing Gaps
- No test for concurrent access to `IsTxSeen`/`SetTxSeen` (though `sync.Map` handles this); a sketch of such a test follows this list
- No test for cleanup during high transaction volume
- No test for `LoadFromDisk()` with corrupted data
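A sketch of the missing concurrency test, meant to be run with `-race`; `newTestManager` is a hypothetical helper standing in for whatever constructor the existing tests use:

```go
func TestManager_TxSeen_Concurrent(t *testing.T) {
	m := newTestManager(t) // hypothetical: construct a cache.Manager as the other tests do

	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				hash := fmt.Sprintf("tx-%d-%d", worker, j)
				m.SetTxSeen(hash)
				if !m.IsTxSeen(hash) {
					t.Errorf("hash %s not reported as seen", hash)
				}
			}
		}(i)
	}
	wg.Wait()
}
```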
Performance Considerations
✅ Good Decisions
- `sync.Map` Usage: Appropriate for read-heavy workloads (checking if a tx is seen)
- Lazy Cleanup: Hourly cleanup ticker instead of per-transaction checks
- Single Cleanup Loop: Efficient iteration over timestamps with early deletion
🔍 Potential Concerns
1. Double Iteration in Cleanup
Location: block/internal/cache/manager.go:208-226, CleanupOldTxs()
The cleanup iterates txTimestamps, then deletes from both txCache.hashes and txTimestamps. This is two sync.Map operations per transaction. Not a big issue, but could be optimized by batch deletion if performance becomes critical.
2. Memory Growth Between Cleanups
With hourly cleanup and high transaction volume, the cache could grow significantly:
- At 1000 tx/sec: 3.6M entries in 1 hour
- Each entry: ~50-100 bytes (hash string + timestamp + map overhead)
- Memory: ~180-360 MB/hour
The 1-hour cleanup interval seems reasonable, but monitor in production.
3. Disk I/O on Shutdown
If disk persistence is kept, shutdown time increases with cache size:
- Gob encoding 3.6M entries could take several seconds
- Not a problem for graceful shutdown, but could delay restarts
Security
✅ No Major Concerns
- Transaction hashes are SHA-256, so no injection risks
- No sensitive data in logs
- Disk permissions are properly set (0o755 for directories)
🔍 Minor Consideration
Disk Space Exhaustion: If cleanup fails or is disabled, the cache could grow unbounded on disk. Consider:
- Maximum cache size limit
- Monitoring/alerting on cache directory size (see the sketch below)
- Automatic cleanup on disk usage threshold
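A sketch of the monitoring idea: compute the on-disk size of the cache directory and warn (or trigger cleanup) when it crosses a threshold. The threshold and where the check is wired in are assumptions:

```go
// cacheDirSize returns the total size in bytes of all files under dir.
func cacheDirSize(dir string) (int64, error) {
	var total int64
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		total += info.Size()
		return nil
	})
	return total, err
}

// Example check (threshold is illustrative):
// if size, err := cacheDirSize(cfg.CacheDir); err == nil && size > 1<<30 {
//     logger.Warn().Int64("bytes", size).Msg("cache directory exceeds 1 GiB")
// }
```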
Documentation
✅ Good Documentation
- Comments explain non-obvious decisions (e.g., why txCache isn't deleted by height)
- Exported functions have clear doc comments
- Constants are documented (e.g., `DefaultTxCacheRetention`)
🔍 Improvement Suggestions
- Add a package-level comment to the `cache` package explaining the architecture
- Document the cleanup interval and how to tune it
- Explain the persistence strategy in the `cache.Manager` interface doc
Specific File Reviews
block/internal/cache/manager.go
Lines 190-194: SetTxSeen Implementation
```go
func (m *implementation) SetTxSeen(hash string) {
	m.txCache.setSeen(hash, 0)
	m.txTimestamps.Store(hash, time.Now())
}
```

✅ Correct use of height=0 for transactions (not yet in a block)
⚠️ Calling `SetTxSeen` twice updates the timestamp, extending a transaction's lifetime. This might be intentional, but document it (a sketch of an alternative follows).
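If refreshing the timestamp is not intended, a small sketch using `sync.Map.LoadOrStore` would keep the first-seen timestamp instead:

```go
// Sketch: keep the original first-seen timestamp on repeated calls.
func (m *implementation) SetTxSeen(hash string) {
	m.txCache.setSeen(hash, 0)
	// LoadOrStore only stores when the hash is not already present,
	// so a duplicate SetTxSeen does not extend the retention window.
	m.txTimestamps.LoadOrStore(hash, time.Now())
}
```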
Lines 244-250: DeleteHeight Comment
Excellent! This kind of explanatory comment prevents future bugs.
block/internal/reaping/reaper.go
Lines 95-140: reaperLoop Implementation
✅ Clean implementation with dual tickers (reaping + cleanup)
✅ Exponential backoff on errors
✅ Cleanup logging
Lines 168-172: Transaction Filtering
```go
for _, tx := range txs {
	txHash := hashTx(tx)
	if !r.cache.IsTxSeen(txHash) {
		newTxs = append(newTxs, tx)
	}
}
```

✅ Simple and correct. Consider pre-allocating `newTxs` if `len(txs)` is large:

```go
newTxs := make([][]byte, 0, len(txs))
```

block/internal/cache/generic_cache.go
Lines 75-79: setSeen with Height Tracking
```go
func (c *Cache[T]) setSeen(hash string, height uint64) {
	c.hashes.Store(hash, true)
	c.hashByHeight.Store(height, hash) // ⚠️ Potential issue
}
```

`hashByHeight` maps height -> hash, but multiple hashes can have the same height. This overwrites previous entries. For transactions (height=0), this means only the LAST transaction marked as seen is tracked by height.
Impact: In deleteAllForHeight(), only one hash per height is deleted from hashes. For transactions at height=0, this could leave orphaned entries.
Recommendation: Change `hashByHeight` to map a height to a slice of hashes (sketched below), or accept that height-based deletion doesn't apply to transactions (which is the current documented behavior).
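A minimal sketch of the slice-per-height variant; the extra mutex is an assumption added to make the load-append-store sequence atomic:

```go
// Sketch: track every hash seen at a given height, not just the last one.
func (c *Cache[T]) setSeen(hash string, height uint64) {
	c.hashes.Store(hash, true)

	c.heightMu.Lock() // hypothetical mutex guarding the read-modify-write below
	defer c.heightMu.Unlock()

	var hashes []string
	if v, ok := c.hashByHeight.Load(height); ok {
		hashes, _ = v.([]string)
	}
	c.hashByHeight.Store(height, append(hashes, hash))
}
```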
Recommendations
High Priority
- ✅ Fix compilation error in test file (verify imports work correctly)
- 🔧 Decide on persistence strategy: In-memory only vs. disk persistence, and document the rationale
- 📝 Add cleanup on shutdown if keeping persistence
Medium Priority
- 🔍 Review timestamp reset behavior on load (lines 360-368)
- 🛡️ Add defensive type assertions in `isSeen()` and similar methods
- 📦 Encapsulate cache field access (avoid `m.txCache.hashes.Delete()`)
- 📊 Add a constant for the cleanup interval (currently hardcoded as 1 hour)
Low Priority
- 📝 Add package-level documentation to cache package
- 🧪 Add concurrency stress tests
- 🎯 Pre-allocate slice in transaction filtering loop (minor optimization)
Conclusion
This PR represents a solid architectural improvement with good code quality. The main decision point is whether to persist transaction hashes to disk or keep them in-memory only. I recommend making this decision explicit and documenting the rationale.
The test coverage is excellent, and the implementation is mostly sound. The critical issue is the potential compilation error in the test file, which should be verified before merging.
Overall Assessment: ✅ Approve with minor revisions
Once the persistence strategy is decided and documented, and the test compilation is verified, this is ready to merge.
Summary of Changes (Gemini Code Assist)

This pull request refactors the transaction reaping mechanism by migrating the storage of "seen" transaction hashes from an isolated in-memory datastore within the reaper to the centralized `cache.Manager`.
Code Review
This pull request refactors the reaper component to use a centralized cache.Manager for tracking seen transaction hashes, instead of its own in-memory datastore. This is a solid architectural improvement that centralizes caching logic and enables persistence of seen transactions across node restarts. The changes are well-implemented, and the addition of a persistence test for the cache is a great touch.
I've found a minor issue in a test file related to a duplicate package import which will cause a compilation error. I've left comments with suggestions on how to fix it. Overall, this is a great refactoring that improves the codebase.
| "testing" | ||
| "time" | ||
|
|
||
| "github.com/ipfs/go-datastore" |
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

@@ Coverage Diff @@
## main #2811 +/- ##
==========================================
+ Coverage 62.37% 62.61% +0.23%
==========================================
Files 82 82
Lines 7304 7334 +30
==========================================
+ Hits 4556 4592 +36
+ Misses 2203 2197 -6
Partials 545 545
I'm leaning more towards making this short-lived; writing to disk here seems like extra IO we don't really need to care about.
Can do. Writing it to disk only happens at stopping, though.
Overview
Use cache instead of in memory store for reaper