
Conversation


@julienrbrt julienrbrt commented Nov 4, 2025

Overview

Use cache instead of in-memory store for reaper


github-actions bot commented Nov 4, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

Build: ✅ passed
Format: ⏩ skipped
Lint: ✅ passed
Breaking: ✅ passed
Updated (UTC): Nov 4, 2025, 3:15 PM


claude bot commented Nov 4, 2025

Claude finished @julienrbrt's task


Code Review Complete

I've completed a comprehensive review of this PR. Overall, this is a solid architectural improvement that centralizes transaction hash tracking and enables persistence. The code quality is good, with excellent test coverage for the new functionality. However, I've identified several areas for improvement and one critical issue that needs attention.


Summary

What Changed:

  • Migrated transaction "seen" hash tracking from an isolated in-memory datastore in the Reaper to the centralized cache.Manager
  • Added IsTxSeen, SetTxSeen, and CleanupOldTxs methods to cache.Manager (sketched just after this list)
  • Implemented disk persistence for transaction cache with hourly cleanup mechanism
  • Renamed test utility: NewDefaultInMemoryKVStore → NewTestInMemoryKVStore
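
For orientation, a minimal sketch of what the new Manager surface might look like; the exact signatures (in particular the CleanupOldTxs parameter) are assumptions inferred from the snippets reviewed below, not copied from the diff:

// Sketch only: transaction-tracking additions to cache.Manager (signatures assumed).
type Manager interface {
    // ... existing header/data cache methods ...

    // IsTxSeen reports whether the transaction hash has already been observed.
    IsTxSeen(hash string) bool
    // SetTxSeen marks the transaction hash as observed and records when it was seen.
    SetTxSeen(hash string)
    // CleanupOldTxs drops entries older than retention; a zero value falls back to
    // DefaultTxCacheRetention (24h).
    CleanupOldTxs(retention time.Duration)
}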

Impact:

  • ✅ Better architecture: Centralized caching layer
  • ✅ Persistence: Transaction seen status survives restarts
  • ⚠️ I/O overhead: New disk writes on shutdown (see discussion below)

Critical Issues

1. ⚠️ Compilation Error in Test File

Location: block/internal/reaping/reaper_test.go:9-10

Gemini's review flagged a duplicate import: the file imports github.com/ipfs/go-datastore both unaliased and under the ds alias, which would cause a compilation error. The aliased imports look like this:

import (
    ds "github.com/ipfs/go-datastore"
    dssync "github.com/ipfs/go-datastore/sync"
    // ... other imports
)

The usages on lines 228 and 253 rely on those aliases:

dataStore := dssync.MutexWrap(ds.NewMapDatastore())  // line 228
dataStore2 := dssync.MutexWrap(ds.NewMapDatastore()) // line 253

The aliased usage itself is correct; the problem is the redundant unaliased import. Please remove the duplicate and verify that the test file compiles.


Architecture & Design

✅ Strengths

  1. Clean Separation of Concerns: Moving tx hash tracking to cache.Manager is the right architectural decision. It consolidates all caching logic in one place.

  2. Generic Cache Reuse: Smart use of Cache[struct{}] for transaction hashes. This avoids code duplication and leverages existing infrastructure.

  3. Explicit Design Decisions: Excellent documentation explaining why txCache is NOT cleared in DeleteHeight() (lines 244-250 in manager.go). This kind of reasoning documentation prevents future bugs.

  4. Time-Based Cleanup: Using time-based expiration (24h default) instead of height-based cleanup makes sense for transaction deduplication.

🔍 Areas for Improvement

1. Architectural Question: Disk Persistence Trade-off

Location: block/internal/cache/manager.go:323-325, SaveToDisk() method

@tac0turtle raised a valid concern about disk I/O. Let me add context:

Current Behavior:

  • Transaction hashes are persisted to disk on shutdown (via SaveToDisk())
  • On restart, they're loaded and timestamps reset to time.Now() (line 362)
  • This prevents re-submission of transactions seen before restart

Trade-offs:

Approach: Persist to disk (current)
  Pros: Prevents duplicate tx submissions across restarts; consistent with other cache behavior
  Cons: Extra I/O on shutdown; minimal benefit if restarts are rare

Approach: In-memory only
  Pros: No I/O overhead; simpler
  Cons: Transactions seen before restart may be resubmitted (though the sequencer should handle this)

Recommendation: I lean toward @tac0turtle's suggestion of in-memory only, for the following reasons:

  1. The sequencer should already handle duplicate transactions (idempotency)
  2. The 24-hour window is long enough that restart-related duplicates are unlikely to cause issues
  3. Consistency argument: If this is truly ephemeral data, why persist it?

However, if you keep persistence:

  • The current implementation is correct
  • The I/O happens only on graceful shutdown (not hot path)
  • Consistency with other caches (header, data) is maintained

Suggested Action: Decide explicitly and document the rationale. If going in-memory only:

  • Remove txCache from SaveToDisk() and LoadFromDisk()
  • Update the comment at line 331-333 to explain why

2. Missing Cleanup on Shutdown

Location: block/internal/reaping/reaper.go:143-150, Stop() method

The reaper's Stop() method doesn't save the cache to disk. If persistence is desired, you should:

func (r *Reaper) Stop() error {
    if r.cancel != nil {
        r.cancel()
    }
    r.wg.Wait()

    // Save cache to disk on graceful shutdown
    if err := r.cache.SaveToDisk(); err != nil {
        r.logger.Warn().Err(err).Msg("failed to save cache on shutdown")
    }

    r.logger.Info().Msg("reaper stopped")
    return nil
}

Note: This is only relevant if keeping disk persistence.


3. Timestamp Inconsistency on Load

Location: block/internal/cache/manager.go:360-368, LoadFromDisk() method

When loading from disk, ALL transaction timestamps are reset to time.Now():

now := time.Now()
m.txCache.hashes.Range(func(key, value any) bool {
    if hash, ok := key.(string); ok {
        m.txTimestamps.Store(hash, now)
    }
    return true
})

Issue: This defeats the purpose of time-based cleanup. A transaction that was 23 hours old at shutdown will be treated as "new" on restart, extending its lifetime to 47 hours total.

Options:

  1. Persist timestamps (complex, requires gob registration of time.Time, more I/O)
  2. Use conservative timestamp on load (e.g., time.Now().Add(-23 * time.Hour) to ensure old txs are cleaned up soon)
  3. Accept the behavior as reasonable (document it)
  4. Go in-memory only (removes the problem entirely)

Recommendation: If keeping persistence, document this behavior explicitly. If it's a problem, persist timestamps or go in-memory only.
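
To illustrate option 2, a minimal sketch of a conservative reset inside LoadFromDisk(), reusing the field names from the snippet above; the one-hour grace window is an assumption:

// Sketch: age loaded entries so they expire within roughly one cleanup cycle
// rather than receiving a fresh 24-hour lifetime. The one-hour margin is assumed.
conservative := time.Now().Add(-(DefaultTxCacheRetention - time.Hour))
m.txCache.hashes.Range(func(key, value any) bool {
    if hash, ok := key.(string); ok {
        m.txTimestamps.Store(hash, conservative)
    }
    return true
})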


Code Quality & Go Idioms

✅ Good Practices

  1. Error Wrapping: Consistent use of fmt.Errorf with %w for error context
  2. Concurrency Safety: Proper use of sync.Map for concurrent access
  3. Zero Value Handling: CleanupOldTxs defaults to DefaultTxCacheRetention when passed 0 (lines 201-203)
  4. Logging: Appropriate log levels and structured logging with context

🔍 Minor Issues

1. Type Assertion Without Second Return Value

Location: block/internal/cache/generic_cache.go:72

func (c *Cache[T]) isSeen(hash string) bool {
    seen, ok := c.hashes.Load(hash)
    if !ok {
        return false
    }
    return seen.(bool)  // ⚠️ Panic risk if not bool
}

This type assertion could panic if the value isn't a bool. Use the two-value form:

return seen.(bool)  // current: panics if the stored value is ever not a bool

// better: two-value assertion with a defensive fallback
seenBool, ok := seen.(bool)
if !ok {
    return false
}
return seenBool

Though in practice this is safe (only bools are stored), defensive programming is better.


2. Direct Access to Internal Cache Field

Location: block/internal/cache/manager.go:221

m.txCache.hashes.Delete(hash)

This directly accesses txCache.hashes, breaking encapsulation. Consider adding a method to Cache[T]:

func (c *Cache[T]) deleteSeen(hash string) {
    c.hashes.Delete(hash)
}

Then use: m.txCache.deleteSeen(hash)


3. Inconsistent Use of Constants

Location: block/internal/reaping/reaper.go:99

cleanupTicker := time.NewTicker(1 * time.Hour)

This magic number should be a constant, similar to DefaultInterval:

const (
    DefaultInterval = 1 * time.Second
    MaxBackoffInterval = 30 * time.Second
    BackoffMultiplier = 2
    TxCleanupInterval = 1 * time.Hour  // Add this
)

Error Handling & Resilience

✅ Strengths

  1. Graceful Error Handling: SaveToDisk() and LoadFromDisk() errors are logged as warnings, not failures (lines 135, 140)
  2. Retry Logic: Reaper has exponential backoff on errors (lines 111-122)
  3. Transaction Rollback: If sequencer submission fails, transactions are NOT marked as seen (lines 186-187)

🔍 Potential Issues

No Cleanup on Load Failure

Location: block/internal/cache/manager.go:338-358, LoadFromDisk()

If loading one cache fails (e.g., corrupted data), the error is returned immediately, but partially loaded data remains in memory. Consider:

  1. Clearing all caches on load failure
  2. Continuing to load other caches even if one fails
  3. At minimum, document the behavior
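
For option 2, a hedged sketch of a best-effort load; the per-cache loader names are hypothetical placeholders, not the current methods:

// Sketch: attempt every cache and aggregate failures instead of aborting on the first error.
// loadHeaderCache, loadDataCache, and loadTxCache are hypothetical helpers.
func (m *implementation) LoadFromDisk() error {
    var errs []error
    if err := m.loadHeaderCache(); err != nil {
        errs = append(errs, fmt.Errorf("load header cache: %w", err))
    }
    if err := m.loadDataCache(); err != nil {
        errs = append(errs, fmt.Errorf("load data cache: %w", err))
    }
    if err := m.loadTxCache(); err != nil {
        errs = append(errs, fmt.Errorf("load tx cache: %w", err))
    }
    return errors.Join(errs...) // nil if everything loaded
}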

Testing

✅ Excellent Coverage

The test suite is comprehensive and well-structured:

  1. TestManager_TxOperations - Basic functionality
  2. TestManager_CleanupOldTxs - Time-based cleanup
  3. TestManager_CleanupOldTxs_SelectiveRemoval - Partial cleanup
  4. TestManager_TxCache_PersistAndLoad - Disk persistence
  5. TestManager_DeleteHeight_PreservesTxCache - Height-based deletion behavior
  6. TestReaper_CachePersistence - End-to-end persistence

The tests use proper mocking, table-driven patterns where appropriate, and cover edge cases.

🔍 Testing Gaps

  1. No test for concurrent access to IsTxSeen/SetTxSeen (though sync.Map handles this)
  2. No test for cleanup during high transaction volume
  3. No test for LoadFromDisk() with corrupted data
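
A stress test along the following lines would cover the first gap; newTestManager is a hypothetical stand-in for however the package builds a Manager in its tests:

// Hypothetical concurrency stress test for IsTxSeen/SetTxSeen.
func TestManager_TxSeen_Concurrent(t *testing.T) {
    m := newTestManager(t) // hypothetical test constructor

    var wg sync.WaitGroup
    for worker := 0; worker < 8; worker++ {
        wg.Add(1)
        go func(worker int) {
            defer wg.Done()
            for i := 0; i < 1000; i++ {
                hash := fmt.Sprintf("tx-%d-%d", worker, i)
                m.SetTxSeen(hash)
                if !m.IsTxSeen(hash) {
                    t.Errorf("hash %s not reported as seen", hash)
                }
            }
        }(worker)
    }
    wg.Wait()
}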

Performance Considerations

✅ Good Decisions

  1. sync.Map Usage: Appropriate for read-heavy workloads (checking if tx is seen)
  2. Lazy Cleanup: Hourly cleanup ticker instead of per-transaction checks
  3. Single Cleanup Loop: Efficient iteration over timestamps with early deletion

🔍 Potential Concerns

1. Double Iteration in Cleanup

Location: block/internal/cache/manager.go:208-226, CleanupOldTxs()

The cleanup iterates txTimestamps, then deletes from both txCache.hashes and txTimestamps. This is two sync.Map operations per transaction. Not a big issue, but could be optimized by batch deletion if performance becomes critical.
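
If it ever does become critical, one hedged option (same field names as above; retention is the assumed cleanup parameter) is to collect expired hashes during the Range and delete them afterwards:

// Sketch: gather expired hashes first, then remove them from both maps.
cutoff := time.Now().Add(-retention)
var expired []string
m.txTimestamps.Range(func(key, value any) bool {
    hash, okHash := key.(string)
    seenAt, okTime := value.(time.Time)
    if okHash && okTime && seenAt.Before(cutoff) {
        expired = append(expired, hash)
    }
    return true
})
for _, hash := range expired {
    m.txCache.hashes.Delete(hash)
    m.txTimestamps.Delete(hash)
}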

2. Memory Growth Between Cleanups

With hourly cleanup and high transaction volume, the cache could grow significantly:

  • At 1000 tx/sec: 3.6M entries in 1 hour
  • Each entry: ~50-100 bytes (hash string + timestamp + map overhead)
  • Memory: ~180-360 MB/hour

The 1-hour cleanup interval seems reasonable, but monitor in production.

3. Disk I/O on Shutdown

If disk persistence is kept, shutdown time increases with cache size:

  • Gob encoding 3.6M entries could take several seconds
  • Not a problem for graceful shutdown, but could delay restarts

Security

✅ No Major Concerns

  1. Transaction hashes are SHA-256, so no injection risks
  2. No sensitive data in logs
  3. Disk permissions are properly set (0o755 for directories)

🔍 Minor Consideration

Disk Space Exhaustion: If cleanup fails or is disabled, the cache could grow unbounded on disk. Consider:

  1. Maximum cache size limit
  2. Monitoring/alerting on cache directory size
  3. Automatic cleanup on disk usage threshold

Documentation

✅ Good Documentation

  1. Comments explain non-obvious decisions (e.g., why txCache isn't deleted by height)
  2. Exported functions have clear doc comments
  3. Constants are documented (e.g., DefaultTxCacheRetention)

🔍 Improvement Suggestions

  1. Add package-level comment to cache package explaining the architecture
  2. Document cleanup interval and how to tune it
  3. Explain persistence strategy in cache.Manager interface doc

Specific File Reviews

block/internal/cache/manager.go

Lines 190-194: SetTxSeen Implementation

func (m *implementation) SetTxSeen(hash string) {
    m.txCache.setSeen(hash, 0)
    m.txTimestamps.Store(hash, time.Now())
}

✅ Correct use of height=0 for transactions (not yet in a block)
⚠️ Consider: Should this be idempotent? Currently, calling SetTxSeen twice updates the timestamp, extending lifetime. This might be intentional, but document it.
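
If the intent is that repeated calls should not extend a transaction's lifetime, a minimal sketch (assuming the fields shown above) that keeps the first-seen timestamp:

func (m *implementation) SetTxSeen(hash string) {
    m.txCache.setSeen(hash, 0)
    // LoadOrStore keeps the earliest timestamp; repeated calls no longer refresh it.
    m.txTimestamps.LoadOrStore(hash, time.Now())
}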

Lines 244-250: DeleteHeight Comment

Excellent! This kind of explanatory comment prevents future bugs.

block/internal/reaping/reaper.go

Lines 95-140: reaperLoop Implementation

✅ Clean implementation with dual tickers (reaping + cleanup)
✅ Exponential backoff on errors
✅ Cleanup logging

Lines 168-172: Transaction Filtering

for _, tx := range txs {
    txHash := hashTx(tx)
    if !r.cache.IsTxSeen(txHash) {
        newTxs = append(newTxs, tx)
    }
}

✅ Simple and correct. Consider pre-allocating newTxs if len(txs) is large:

newTxs := make([][]byte, 0, len(txs))

block/internal/cache/generic_cache.go

Lines 75-79: setSeen with Height Tracking

func (c *Cache[T]) setSeen(hash string, height uint64) {
    c.hashes.Store(hash, true)
    c.hashByHeight.Store(height, hash)  // ⚠️ Potential issue
}

⚠️ Issue: hashByHeight maps height -> hash, but multiple hashes can have the same height. This overwrites previous entries. For transactions (height=0), this means only the LAST transaction marked as seen is tracked by height.

Impact: In deleteAllForHeight(), only one hash per height is deleted from hashes. For transactions at height=0, this could leave orphaned entries.

Recommendation: Change hashByHeight to map[height][]hash or accept that height-based deletion doesn't apply to transactions (which is the current documented behavior).
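
If the multi-hash route is chosen, one possible shape is a small mutex-guarded index; this is a sketch only, and the type and method names are assumptions rather than the current Cache[T] API:

// Sketch: a per-height index that can hold several hashes (names hypothetical).
type heightIndex struct {
    mu       sync.Mutex
    byHeight map[uint64][]string
}

func newHeightIndex() *heightIndex {
    return &heightIndex{byHeight: make(map[uint64][]string)}
}

func (h *heightIndex) add(height uint64, hash string) {
    h.mu.Lock()
    defer h.mu.Unlock()
    h.byHeight[height] = append(h.byHeight[height], hash)
}

// takeAll removes and returns every hash recorded at height, so deleteAllForHeight
// can clear each one from c.hashes instead of only the last one stored.
func (h *heightIndex) takeAll(height uint64) []string {
    h.mu.Lock()
    defer h.mu.Unlock()
    hashes := h.byHeight[height]
    delete(h.byHeight, height)
    return hashes
}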


Recommendations

High Priority

  1. Fix compilation error in test file (verify imports work correctly)
  2. 🔧 Decide on persistence strategy: In-memory only vs. disk persistence, and document the rationale
  3. 📝 Add cleanup on shutdown if keeping persistence

Medium Priority

  1. 🔍 Review timestamp reset behavior on load (lines 360-368)
  2. 🛡️ Add defensive type assertions in isSeen() and similar methods
  3. 📦 Encapsulate cache field access (avoid m.txCache.hashes.Delete())
  4. 📊 Add constant for cleanup interval (currently hardcoded as 1 hour)

Low Priority

  1. 📝 Add package-level documentation to cache package
  2. 🧪 Add concurrency stress tests
  3. 🎯 Pre-allocate slice in transaction filtering loop (minor optimization)

Conclusion

This PR represents a solid architectural improvement with good code quality. The main decision point is whether to persist transaction hashes to disk or keep them in-memory only. I recommend making this decision explicit and documenting the rationale.

The test coverage is excellent, and the implementation is mostly sound. The critical issue is the potential compilation error in the test file, which should be verified before merging.

Overall Assessment: Approve with minor revisions

Once the persistence strategy is decided and documented, and the test compilation is verified, this is ready to merge.


@gemini-code-assist

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the transaction reaping mechanism by migrating the storage of "seen" transaction hashes from an isolated in-memory datastore within the Reaper component to a centralized and persistent cache.Manager. This change streamlines the transaction tracking process, enhances data consistency across components, and improves the overall architecture by leveraging a unified caching layer for various data types. The refactoring also includes updates to the cache manager to support transaction-specific operations and ensures proper persistence of this state.

Highlights

  • Centralized Transaction Cache: The Reaper component now utilizes a shared cache.Manager to track seen transaction hashes, replacing its previous internal in-memory datastore.
  • Cache Manager Enhancements: The cache.Manager interface and its implementation have been extended with new methods (IsTxSeen, SetTxSeen) and a dedicated txCache to manage transaction seen status, including persistence to disk.
  • Refactored Reaper Logic: The Reaper's SubmitTxs method has been updated to interact with the new cache.Manager for efficient checking and marking of seen transactions, simplifying its internal logic.
  • Test Utility Renaming and Simplification: The NewDefaultInMemoryKVStore function has been renamed to NewTestInMemoryKVStore and simplified for testing purposes, with all relevant test files and documentation updated to use this new naming convention.


github-actions bot commented Nov 4, 2025

PR Preview Action v1.6.2
Preview removed because the pull request was closed.
2025-11-04 15:37 UTC


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the reaper component to use a centralized cache.Manager for tracking seen transaction hashes, instead of its own in-memory datastore. This is a solid architectural improvement that centralizes caching logic and enables persistence of seen transactions across node restarts. The changes are well-implemented, and the addition of a persistence test for the cache is a great touch.

I've found a minor issue in a test file related to a duplicate package import, which will cause a compilation error. I've left comments with suggestions on how to fix it. Overall, this is a great refactoring that improves the codebase.

The inline comment below is anchored on the import block of block/internal/reaping/reaper_test.go:

"testing"
"time"

"github.com/ipfs/go-datastore"

critical

This import duplicates the aliased import ds for the same package on the next line. This will cause a compilation error. Please remove this line and use the ds alias consistently throughout the file.


codecov bot commented Nov 4, 2025

Codecov Report

❌ Patch coverage is 71.42857% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.61%. Comparing base (271f74b) to head (a7e42e4).
⚠️ Report is 2 commits behind head on main.

Files with missing lines:
  block/internal/cache/manager.go: 80.00% patch coverage, 4 missing and 4 partials ⚠️
  block/internal/reaping/reaper.go: 27.27% patch coverage, 7 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2811      +/-   ##
==========================================
+ Coverage   62.37%   62.61%   +0.23%     
==========================================
  Files          82       82              
  Lines        7304     7334      +30     
==========================================
+ Hits         4556     4592      +36     
+ Misses       2203     2197       -6     
  Partials      545      545              
Flag: combined
Coverage Δ: 62.61% <71.42%> (+0.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.


@tac0turtle tac0turtle left a comment


I'm more leaning towards making this short-lived; writing to disk here seems like extra IO we don't really need to care about.

@julienrbrt
Member Author

I'm more leaning towards making this short-lived; writing to disk here seems like extra IO we don't really need to care about.

Can do. Writing to disk only happens at stopping, though.

tac0turtle
tac0turtle previously approved these changes Nov 4, 2025
@julienrbrt julienrbrt added this pull request to the merge queue Nov 4, 2025
Merged via the queue into main with commit 3d98502 Nov 4, 2025
30 of 32 checks passed
@julienrbrt julienrbrt deleted the julien/use-cache-seenstore branch November 4, 2025 15:36
@github-project-automation github-project-automation bot moved this to Done in Evolve Nov 4, 2025