mempool/txgraph: add new transaction graph data structures and traversal+eviction algorithms #2436
Draft: Roasbeef wants to merge 7 commits into btcsuite:master from Roasbeef:truc-graph
This commit establishes the foundational type system and interface contracts for the transaction graph package. All types are defined before they are referenced, ensuring clean compilation order and eliminating circular dependencies.

The type system models the core concepts of mempool transaction management. TxDesc provides transaction metadata without depending on the broader mempool package, enabling the graph to be a standalone component. The PackageType enumeration captures the different transaction grouping patterns that require distinct validation logic: 1P1C for simple CPFP, TRUC for BIP 431 topology restrictions, ephemeral for dust outputs that must be spent, and standard for general connected groups.

TxGraphNode is the central type representing transactions within the graph structure. It uses maps for parent and child relationships to enable O(1) lookups during traversal and conflict detection. The cachedMetrics field stores expensive-to-compute ancestor and descendant counts, avoiding repeated graph walks during policy enforcement. The Metadata section tracks feature-specific flags like TRUC and ephemeral dust status, associating transactions with their packages and clusters.

The cluster and package types model higher-level graph structures. TxCluster represents connected components, which are critical for RBF incentive compatibility checks, since replacements must improve the fee rate of the entire cluster, not just individual transactions. The cached FeerateDiagram enables efficient comparison without recomputing cumulative fee schedules on every validation.

TxPackage captures validated transaction groups with computed topology properties. The PackageTopology fields distinguish linear chains from complex DAGs, which matters for validation: TRUC packages must be trees while ephemeral packages can have arbitrary structure. The cached validation result avoids redundant checks during relay processing.
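To make the map-based node layout concrete, here is a minimal sketch. The names TxID, Node, NewNode, Link, and HasParent are illustrative assumptions rather than the PR's actual definitions (the real TxGraphNode presumably keys on a transaction hash type and carries cached metrics and metadata as well).

```go
package main

import "fmt"

// TxID is a hypothetical stand-in for a transaction hash.
type TxID string

// Node is an assumed, simplified node shape. Parents and children are kept
// in maps keyed by transaction ID so membership checks are O(1) on average.
type Node struct {
	ID       TxID
	Fee      int64 // satoshis
	VSize    int64 // virtual bytes
	Parents  map[TxID]*Node
	Children map[TxID]*Node
}

// NewNode returns a node with empty, ready-to-use edge maps.
func NewNode(id TxID, fee, vsize int64) *Node {
	return &Node{
		ID:       id,
		Fee:      fee,
		VSize:    vsize,
		Parents:  make(map[TxID]*Node),
		Children: make(map[TxID]*Node),
	}
}

// Link records that child spends an output of parent, updating both sides.
func Link(parent, child *Node) {
	parent.Children[child.ID] = child
	child.Parents[parent.ID] = parent
}

// HasParent reports whether n directly spends from the given transaction.
func (n *Node) HasParent(id TxID) bool {
	_, ok := n.Parents[id]
	return ok
}

func main() {
	a := NewNode("a", 500, 200)
	b := NewNode("b", 2000, 150)
	Link(a, b)
	fmt.Println(b.HasParent("a"), a.HasParent("b")) // prints "true false"
}
```

Keeping both directions of each edge costs extra memory per relationship, but it lets ancestor and descendant walks start from any node without scanning the whole graph.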
The Graph interface defines the complete contract for graph operations, grouping methods by category: node operations, relationship management, queries, package analysis, iteration, and metrics. Each method includes detailed documentation explaining not just what it does but why it exists and how it fits into mempool policy enforcement.

Iterator configuration uses the functional options pattern with types like IteratorOption and TraversalOrder, enabling flexible graph traversal without complex parameter lists. The GraphQuery interface provides advanced operations like cycle detection and feerate distribution analysis for debugging and monitoring.

The PackageAnalyzer interface abstracts protocol-specific validation rules, enabling testing with mock implementations and future protocol upgrades without modifying the core graph. It separates concerns between graph structure management (the Graph type) and transaction-specific validation logic (PackageAnalyzer implementations).
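The functional options pattern mentioned above can be sketched as follows. IteratorOption and TraversalOrder are names from the commit message; the config struct, its fields, the constants, and the WithOrder, WithMaxDepth, and WithIncludeStart constructors are assumptions made for illustration and may not match the PR's API.

```go
package main

import "fmt"

// TraversalOrder selects the walk strategy. The constant names are assumed.
type TraversalOrder int

const (
	OrderDFS TraversalOrder = iota
	OrderBFS
)

// iterConfig holds the resolved settings (hypothetical field names).
type iterConfig struct {
	order        TraversalOrder
	maxDepth     int // 0 means unlimited
	includeStart bool
}

// IteratorOption mutates one aspect of the configuration.
type IteratorOption func(*iterConfig)

// WithOrder sets the traversal order.
func WithOrder(o TraversalOrder) IteratorOption {
	return func(c *iterConfig) { c.order = o }
}

// WithMaxDepth bounds how many edges away from the start a walk may go.
func WithMaxDepth(d int) IteratorOption {
	return func(c *iterConfig) { c.maxDepth = d }
}

// WithIncludeStart controls whether the starting node is itself yielded.
func WithIncludeStart(include bool) IteratorOption {
	return func(c *iterConfig) { c.includeStart = include }
}

// newIterConfig applies options in order on top of the defaults.
func newIterConfig(opts ...IteratorOption) iterConfig {
	cfg := iterConfig{order: OrderDFS}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newIterConfig(WithOrder(OrderBFS), WithMaxDepth(3))
	fmt.Printf("%+v\n", cfg)
}
```

Because each option is an ordinary function value, new settings can be added later without changing the signature of the method that accepts them.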
This commit provides the concrete implementation of the Graph interface, managing the in-memory transaction graph with thread-safe operations for adding, removing, and querying transactions and their relationships.

The TxGraph struct maintains the primary node storage as a hash map for O(1) lookups. When transactions are added via AddTransaction, the implementation automatically creates edges to any parent transactions already present in the graph by inspecting the transaction inputs. This incremental construction pattern enables efficient graph building as transactions arrive from the P2P network.

Transaction removal supports two modes to handle different eviction scenarios. RemoveTransaction performs cascade deletion, recursively removing all descendants to maintain graph consistency. This is essential when blocks confirm transactions, since children spending unconfirmed outputs must also be removed. RemoveTransactionNoCascade provides fine-grained control for cases where the caller manages descendants explicitly, avoiding redundant traversals during bulk evictions.

The ancestor and descendant query methods implement breadth-first traversal with depth limits, enabling enforcement of mempool ancestor/descendant count and size policies. GetAncestors walks backward through parent edges while GetDescendants walks forward through children, both accumulating results up to the specified maximum depth. These queries are fundamental to mempool policy checks that prevent transactions with excessive dependency chains.

Cluster retrieval (GetCluster) identifies the connected component containing a given transaction, which is critical for RBF validation. When evaluating replacement transactions, we must consider the fee impact on the entire cluster rather than individual transactions, preventing attacks where replacements improve one transaction while degrading the overall cluster fee rate.
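The three behaviours described above (automatic edge creation on insert, depth-limited breadth-first ancestor queries, and cascade removal) can be sketched in a few dozen lines. This is a simplified, single-threaded model under stated assumptions: the Tx, Graph, Add, Ancestors, and RemoveCascade names are hypothetical, inputs are reduced to parent transaction IDs, and no locking is shown.

```go
package main

import "fmt"

// Tx is a hypothetical, stripped-down transaction: Inputs lists the IDs of
// the transactions whose outputs it spends.
type Tx struct {
	ID     string
	Inputs []string
}

type node struct {
	tx       Tx
	parents  map[string]*node
	children map[string]*node
}

// Graph stores nodes by ID for O(1) average lookup.
type Graph struct{ nodes map[string]*node }

func NewGraph() *Graph { return &Graph{nodes: make(map[string]*node)} }

// Add inserts tx and links it to any parents already in the graph.
// Children that arrived before their parent are not re-linked in this sketch.
func (g *Graph) Add(tx Tx) {
	n := &node{tx: tx, parents: map[string]*node{}, children: map[string]*node{}}
	for _, in := range tx.Inputs {
		if p, ok := g.nodes[in]; ok {
			n.parents[in] = p
			p.children[tx.ID] = n
		}
	}
	g.nodes[tx.ID] = n
}

// Ancestors returns the IDs of ancestors within maxDepth parent hops, level
// by level. maxDepth <= 0 means no limit.
func (g *Graph) Ancestors(id string, maxDepth int) []string {
	start, ok := g.nodes[id]
	if !ok {
		return nil
	}
	type item struct {
		n     *node
		depth int
	}
	seen := map[string]bool{id: true}
	queue := []item{{start, 0}}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if maxDepth > 0 && cur.depth >= maxDepth {
			continue // do not expand past the depth limit
		}
		for pid, p := range cur.n.parents {
			if seen[pid] {
				continue
			}
			seen[pid] = true
			out = append(out, pid)
			queue = append(queue, item{p, cur.depth + 1})
		}
	}
	return out
}

// RemoveCascade deletes id and, recursively, all of its descendants.
// It returns how many transactions were removed.
func (g *Graph) RemoveCascade(id string) int {
	n, ok := g.nodes[id]
	if !ok {
		return 0
	}
	removed := 0
	for cid := range n.children {
		removed += g.RemoveCascade(cid)
	}
	for _, p := range n.parents {
		delete(p.children, id)
	}
	delete(g.nodes, id)
	return removed + 1
}

func main() {
	g := NewGraph()
	g.Add(Tx{ID: "a"})
	g.Add(Tx{ID: "b", Inputs: []string{"a"}})
	g.Add(Tx{ID: "c", Inputs: []string{"b"}})
	g.Add(Tx{ID: "d", Inputs: []string{"c"}})
	fmt.Println(g.Ancestors("d", 2))  // prints "[c b]"
	fmt.Println(g.RemoveCascade("b")) // prints "3"
}
```

A non-cascading variant would only unlink the node from its parents and children and delete it, leaving the caller responsible for the descendants.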
The orphan detection methods (GetOrphans and IterateOrphans) distinguish between true orphans awaiting unconfirmed parents and root transactions spending confirmed UTXOs. The InputConfirmedPredicate parameter enables the caller to provide chain state information, avoiding orphan classification for transactions with all confirmed inputs. This distinction matters for relay policies that handle orphans differently from normal transaction acceptance.
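A minimal sketch of the orphan/root distinction follows. InputConfirmedPredicate is the name used in the commit message, but its signature here is an assumption, as are the OutPoint and Tx types and the Orphans function. Following the description above, only transactions with no parent in the pool are considered; among those, a transaction is an orphan if at least one input is not confirmed.

```go
package main

import (
	"fmt"
	"sort"
)

// OutPoint and Tx are hypothetical simplified types.
type OutPoint struct {
	TxID  string
	Index uint32
}

type Tx struct {
	ID     string
	Inputs []OutPoint
}

// InputConfirmedPredicate reports whether an outpoint refers to a confirmed
// UTXO. The caller supplies it, so the graph needs no chain access.
type InputConfirmedPredicate func(OutPoint) bool

// Orphans returns, in sorted order, the IDs of transactions that have no
// parent in the pool and at least one unconfirmed input. Transactions with
// no in-pool parent whose inputs are all confirmed are roots, not orphans.
func Orphans(pool map[string]Tx, confirmed InputConfirmedPredicate) []string {
	var orphans []string
	for id, tx := range pool {
		hasParentInPool := false
		hasUnknownInput := false
		for _, in := range tx.Inputs {
			if _, ok := pool[in.TxID]; ok {
				hasParentInPool = true
				break
			}
			if !confirmed(in) {
				hasUnknownInput = true
			}
		}
		if !hasParentInPool && hasUnknownInput {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	utxos := map[OutPoint]bool{{TxID: "confirmed", Index: 0}: true}
	isConfirmed := func(op OutPoint) bool { return utxos[op] }

	pool := map[string]Tx{
		"root":   {ID: "root", Inputs: []OutPoint{{TxID: "confirmed", Index: 0}}},
		"child":  {ID: "child", Inputs: []OutPoint{{TxID: "root", Index: 0}}},
		"orphan": {ID: "orphan", Inputs: []OutPoint{{TxID: "missing", Index: 1}}},
	}
	fmt.Println(Orphans(pool, isConfirmed)) // prints "[orphan]"
}
```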
This commit adds comprehensive iteration support using Go 1.23 iter.Seq iterators, enabling lazy evaluation of graph traversals for memory-efficient processing of large transaction sets. The Iterate method serves as the primary entry point, accepting functional options that configure traversal order, direction, depth limits, filters, and starting nodes. This design avoids parameter explosion while maintaining type safety and discoverability: each WithXXX function modifies a specific aspect of the iteration behavior.

Multiple traversal strategies are implemented to serve different use cases. Depth-first search (DFS) explores deep into dependency chains before backtracking, useful for finding long ancestor sequences. Breadth-first search (BFS) visits nodes level-by-level, ensuring minimum-depth paths are found first. Topological order guarantees parents appear before children, essential for block template construction where dependencies must be satisfied. Reverse topological order enables bottom-up analysis, processing leaves before roots.

The fee rate traversal (TraversalFeeRate) sorts transactions by fee rate from highest to lowest, supporting greedy block template building and fee-based eviction policies. This ordering requires materializing the full result set rather than streaming, but provides the natural ordering for mining and relay decisions.

Directional control allows backward traversal (following parent edges), forward traversal (following child edges), or bidirectional traversal (both). This flexibility supports both ancestor queries (backward) and descendant queries (forward) without duplicating logic. The IncludeStart option determines whether the starting node itself appears in results, enabling either "ancestors of X" or "X and its ancestors" semantics.

Filter predicates provide fine-grained control over which nodes are yielded, enabling queries like "high-fee transactions" or "TRUC transactions only" without custom iteration logic. Filters integrate seamlessly with other options, applying after traversal order but before yielding to the consumer.

The IteratePairs method yields parent-child edges rather than nodes, enabling analysis of transaction relationships and fund flows. This is useful for conflict detection (finding transactions spending the same outputs) and for tracking value movement through the graph.

Orphan iteration (IterateOrphans) identifies transactions with no parents in the graph. When combined with the InputConfirmedPredicate, it distinguishes true orphans (waiting for unconfirmed parents not yet in the mempool) from root transactions (spending confirmed UTXOs). This distinction enables targeted orphan handling without scanning the entire graph.

All iteration uses Go's range-over-function feature, allowing natural for-range syntax while maintaining lazy evaluation. The iterator can be stopped early by returning from the loop, and the underlying implementation respects this by checking the yield return value. This provides backpressure, avoiding work when the consumer is satisfied with partial results.
This commit implements transaction package detection and topology analysis, enabling package-aware relay policies and mining optimizations for Bitcoin transaction groups.

The IdentifyPackages method scans the graph to detect transaction groups that form coherent packages. It prioritizes more specific package types (TRUC, ephemeral) before falling back to general categories (1P1C, standard), ensuring the most restrictive validation rules are applied. This ordering matters because a transaction group might technically match multiple patterns. For example, a TRUC package is also a 1P1C package, but TRUC validation is more stringent and should take precedence.

Each package type has a dedicated try method implementing type-specific detection logic. try1P1CPackage identifies the simple one-parent-one-child pattern used for basic CPFP (Child Pays For Parent). tryTRUCPackage detects BIP 431 version 3 transactions with topology restrictions designed to prevent transaction pinning attacks. tryEphemeralPackage finds groups containing dust outputs that must be spent within the same package. tryStandardPackage catches any other connected transaction groups.

Package root detection uses a conservative heuristic: transactions with no unconfirmed parents in the graph are potential roots. This works because package relay typically starts with the lowest-fee transaction (the parent) and adds higher-fee children. The root identification could be refined with additional chain state, but the current approach handles the common case of packages built incrementally as transactions arrive.

Topology computation (calculateTopology) analyzes package structure to extract metrics like maximum depth, maximum width, and tree properties. These metrics determine which validation rules apply: TRUC packages must be trees (no diamond patterns) and have limited depth, while ephemeral packages can have more complex structures as long as all dust is spent. The IsLinear flag enables optimizations for simple chains, the most common package structure.

CreatePackage allows explicit package construction from a specified set of transactions, useful when the caller knows which transactions form a package (for example, during package relay message handling). This complements IdentifyPackages, which performs automatic detection through graph scanning.

Package validation integrates with the PackageAnalyzer interface, delegating type-specific rules to the analyzer implementation. This separation enables testing with mock analyzers and supports future protocol upgrades without modifying the core package detection logic. The graph code handles structural analysis (topology, connectivity) while the analyzer handles protocol rules (TRUC topology restrictions, ephemeral dust requirements).

The implementation marks processed transactions to avoid redundant package detection, improving efficiency when scanning large graphs. This tracking is local to each IdentifyPackages call, not persistent state, so the graph can be queried multiple times with different results as transactions are added or removed.

Package IDs incorporate both the root transaction hash and package type, enabling the same root to potentially belong to multiple package interpretations. In practice this is rare, but the design accommodates it rather than forcing arbitrary choices about package classification.
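One way the topology metrics could be derived is sketched below. The Topology struct, its field names other than IsLinear, and the ComputeTopology function are assumptions; the PR's calculateTopology may define depth, width, and the tree property differently. The sketch assumes the package is acyclic and connected, and that every parent ID also appears as a key in the input map.

```go
package main

import "fmt"

// Topology is a hypothetical summary of a package's shape.
type Topology struct {
	MaxDepth int  // longest path from any root, counted in edges
	MaxWidth int  // largest number of transactions at the same depth
	IsLinear bool // every tx has at most one parent and at most one child
	IsTree   bool // every tx has at most one in-package parent
}

// ComputeTopology takes a map from each transaction ID in the package to
// the IDs of its in-package parents (roots map to an empty slice).
func ComputeTopology(parents map[string][]string) Topology {
	// Count children per transaction by inverting the parent edges.
	childCount := make(map[string]int)
	for _, ps := range parents {
		for _, p := range ps {
			childCount[p]++
		}
	}

	// depthOf memoizes the longest distance from a root to id.
	depth := make(map[string]int)
	var depthOf func(id string) int
	depthOf = func(id string) int {
		if d, ok := depth[id]; ok {
			return d
		}
		d := 0
		for _, p := range parents[id] {
			if pd := depthOf(p) + 1; pd > d {
				d = pd
			}
		}
		depth[id] = d
		return d
	}

	topo := Topology{IsLinear: true, IsTree: true}
	width := make(map[int]int)
	for id, ps := range parents {
		d := depthOf(id)
		width[d]++
		if d > topo.MaxDepth {
			topo.MaxDepth = d
		}
		if width[d] > topo.MaxWidth {
			topo.MaxWidth = width[d]
		}
		if len(ps) > 1 { // two parents means a diamond or merge
			topo.IsTree = false
			topo.IsLinear = false
		}
		if childCount[id] > 1 { // fan-out breaks linearity
			topo.IsLinear = false
		}
	}
	return topo
}

func main() {
	chain := map[string][]string{"a": {}, "b": {"a"}, "c": {"b"}}
	diamond := map[string][]string{"a": {}, "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
	fmt.Printf("%+v\n", ComputeTopology(chain))
	fmt.Printf("%+v\n", ComputeTopology(diamond))
}
```

Under these definitions the three-transaction chain is both linear and a tree with depth 2 and width 1, while the diamond has depth 2, width 2, and is neither.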
This commit provides thorough test coverage for the transaction graph package, achieving 92.3% statement coverage through a combination of unit tests, integration tests, and benchmark tests.

The test suite is organized into focused files by functionality. graph_test.go covers core graph operations like adding/removing transactions, edge management, and ancestor/descendant queries. iterator_test.go exercises the various traversal strategies and iteration options. package_test.go validates package identification and topology analysis. orphans_test.go verifies orphan detection with and without confirmation predicates. traversal_test.go tests advanced iteration features like filters, direction control, and depth limits.

The tests follow a consistent pattern of creating small transaction graphs with known structure, performing operations, and asserting expected results. Helper functions like createTestTx and newTxGenerator reduce boilerplate while keeping tests readable. The generator approach enables creating realistic transaction chains and DAGs without manual construction.

Mock implementations (mock_analyzer_test.go) provide test doubles for the PackageAnalyzer interface, enabling graph testing without depending on protocol-specific validation logic. This separation allows graph behavior to be verified independently of TRUC rules, ephemeral dust policies, and other validation concerns.

The orphan tests demonstrate the critical distinction between transactions awaiting unconfirmed parents (true orphans) and transactions spending confirmed UTXOs (root transactions). This distinction drives different relay policies: orphans may need special handling while roots can be processed normally.

Traversal tests exercise all iteration options including IncludeStart, Filter, Direction, and MaxDepth. These options combine in various ways, and the tests verify correct behavior for combinations like "backward DFS with filter" or "forward BFS with depth limit." The tests use slices.Collect to materialize iterator results, enabling straightforward assertion on the full result set.

Iterator tests verify early termination by breaking from range loops, ensuring the graph respects backpressure and stops traversal when the consumer is satisfied. This is important for performance: callers should be able to find the first matching transaction without scanning the entire graph.

Package tests validate detection of 1P1C, TRUC, ephemeral, and standard packages. They verify topology computation (depth, width, linear, tree flags) and package validation integration with the PackageAnalyzer interface. The tests cover edge cases like disconnected transaction sets, cycles (which should never occur but must be handled), and ambiguous package classifications.

Benchmark tests (bench_test.go) measure performance for critical operations like transaction addition, ancestor queries, and package identification. These benchmarks establish performance baselines and detect regressions as the code evolves. They use realistic graph sizes and transaction counts to represent actual mempool conditions.
This commit provides two levels of documentation for the transaction graph package: godoc comments for API reference and a comprehensive README for user onboarding.

The doc.go file establishes package-level documentation that appears in godoc output, providing a high-level introduction to the package's purpose and capabilities. It outlines the core features (efficient lookups, edge creation, cluster management, package identification, orphan detection, traversal strategies) and describes the major concepts (graph structure, packages, iteration, orphan detection, thread safety).

The documentation includes practical code examples showing common usage patterns. Rather than exhaustive API coverage, these examples demonstrate the typical workflow: creating a graph, adding transactions, querying ancestors and descendants, identifying packages, and iterating with custom filters. This example-driven approach helps developers understand not just what methods exist but how they fit together in real applications.

The README.md takes a different approach, targeting developers who are evaluating or integrating the package. It starts with motivation, explaining why transaction graphs matter for mempool management and what problems they solve. This context helps readers understand when they need this package and what benefits it provides.

The core concepts section defines terms like "transaction graph," "cluster," and "package" with emphasis on their practical implications. For example, clusters are explained not just as connected components but as the unit of analysis for RBF validation: replacements must improve the entire cluster's fee rate, not individual transactions. This connects abstract concepts to concrete use cases.

The quick start examples are runnable code demonstrating key workflows. The first example shows basic graph construction with automatic edge creation. The second demonstrates cluster iteration for fee analysis. The third covers package identification and validation. The fourth shows advanced iteration with functional options and filters. These examples progress from simple to complex, building understanding incrementally.

The common patterns section captures best practices discovered during implementation. It covers incremental graph building, package-aware relay, ancestor/descendant limit enforcement, and efficient cleanup on block confirmation. These patterns represent battle-tested approaches to recurring problems, saving readers from rediscovering solutions.

The PackageAnalyzer interface receives special attention because it represents the main extension point. The documentation explains the abstraction purpose (separating graph structure from protocol validation) and lists the methods each implementation must provide. This section helps developers who need to add new package types or customize validation logic.

Performance characteristics provide guidance on the computational cost of operations, helping developers make informed architectural decisions. The O(1), O(n), and O(d) complexity annotations set expectations for different graph sizes and query patterns. Mentioning the ~1KB memory overhead per transaction helps with capacity planning for mempool implementations.
Pull Request Test Coverage Report for Build 18118100572 (Coveralls)
kmk142789 approved these changes on Nov 5, 2025
In this PR, we introduce a new internal txgraph package. The goal of this package is to implement relatively performant transaction graph routines (traversal, eviction, package tracking, etc.) to eventually replace the simple series of flat maps we have in the mempool package today.

We don't yet integrate this directly into the main mempool logic. That'll be done in another PR eventually. The current API also isn't to be considered finalized by any means, as the PRs that will use this package to implement eph dust and TRUC aren't finished yet. We may end up deleting many of the methods here as they aren't used, but I want to provide a concrete base to build off of.

btcd isn't used for mining much these days, so we aren't really concerned with identifying or generating "optimal" packages w.r.t. fee rates. We just want to be able to create, track, and relay packages.

Most of the diff is tests, as I was shooting for 90%+ test coverage. The property-based tests in particular were useful, as I was able to identify and fix several bugs due to them.

This PR is related to, but not dependent on: #2432 and #2433.

The next PR will use these 3 related PRs to progressively implement: TRUC, eph dust, and 1p1c.