@Roasbeef

In this PR, we introduce a new internal txgraph package. The goal of this package is to implement relatively performant transaction graph routines (traversal, eviction, package tracking, etc) to eventually replace the simple series of flat maps we have in the mempool package today.

We don't yet integrate this directly into the main mempool logic. That'll be done in another PR eventually. The current API also isn't to be considered finalized by any means, as the PRs that will use this package to implement eph dust and TRUC aren't finished yet. We may end up deleting many of the methods here if they go unused, but I want to provide a concrete base to build off of.

btcd isn't used for mining much these days, so we aren't really concerned with identifying or generating "optimal" packages w.r.t. fee rates. We just want to be able to create, track, and relay packages.

Most of the diff is tests, as I was shooting for 90%+ test coverage. The property-based tests in particular were useful, as they let me identify and fix several bugs.

This PR is related to, but not dependent on, #2432 and #2433.

The next PR will use these three related PRs to progressively implement TRUC, eph dust, and 1P1C.

This commit establishes the foundational type system and interface contracts
for the transaction graph package. All types are defined before they are
referenced, ensuring clean compilation order and eliminating circular
dependencies.

The type system models the core concepts of mempool transaction management.
TxDesc provides transaction metadata without depending on the broader mempool
package, enabling the graph to be a standalone component. The PackageType
enumeration captures the different transaction grouping patterns that require
distinct validation logic—1P1C for simple CPFP, TRUC for BIP 431 topology
restrictions, ephemeral for dust outputs that must be spent, and standard for
general connected groups.

TxGraphNode is the central type representing transactions within the graph
structure. It uses maps for parent and child relationships to enable O(1)
lookups during traversal and conflict detection. The cachedMetrics field
stores expensive-to-compute ancestor and descendant counts, avoiding repeated
graph walks during policy enforcement. The Metadata section tracks
feature-specific flags like TRUC and ephemeral dust status, associating
transactions with their packages and clusters.

The cluster and package types model higher-level graph structures. TxCluster
represents connected components, which are critical for RBF incentive
compatibility checks—replacements must improve the fee rate of the entire
cluster, not just individual transactions. The cached FeerateDiagram enables
efficient comparison without recomputing cumulative fee schedules on every
validation.

TxPackage captures validated transaction groups with computed topology
properties. The PackageTopology fields distinguish linear chains from complex
DAGs, which matters for validation—TRUC packages must be trees while ephemeral
packages can have arbitrary structure. The cached validation result avoids
redundant checks during relay processing.

The Graph interface defines the complete contract for graph operations,
grouping methods by category—node operations, relationship management, queries,
package analysis, iteration, and metrics. Each method includes detailed
documentation explaining not just what it does but why it exists and how it
fits into mempool policy enforcement.

Iterator configuration uses the functional options pattern with types like
IteratorOption and TraversalOrder, enabling flexible graph traversal without
complex parameter lists. The GraphQuery interface provides advanced operations
like cycle detection and feerate distribution analysis for debugging and
monitoring.

The PackageAnalyzer interface abstracts protocol-specific validation rules,
enabling testing with mock implementations and future protocol upgrades without
modifying the core graph. It separates concerns between graph structure
management (the Graph type) and transaction-specific validation logic
(PackageAnalyzer implementations).

This commit provides the concrete implementation of the Graph interface,
managing the in-memory transaction graph with thread-safe operations for
adding, removing, and querying transactions and their relationships.

The TxGraph struct maintains the primary node storage as a hash map for O(1)
lookups. When transactions are added via AddTransaction, the implementation
automatically creates edges to any parent transactions already present in the
graph by inspecting the transaction inputs. This incremental construction
pattern enables efficient graph building as transactions arrive from the P2P
network.
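To make the incremental edge creation concrete, here is a minimal sketch of the idea. It is not the code in this PR: `outPoint`, `tx`, `node`, and `graph` are simplified, hypothetical stand-ins (string IDs instead of real hashes), and only parents already present are linked, as described above.

```go
package main

import "fmt"

// outPoint and tx are simplified, hypothetical stand-ins for the real wire
// types; string IDs replace transaction hashes.
type outPoint struct {
	txid  string
	index uint32
}

type tx struct {
	id     string
	inputs []outPoint
}

// node keeps parent and child links in maps keyed by transaction ID so
// membership checks are O(1).
type node struct {
	tx       tx
	parents  map[string]*node
	children map[string]*node
}

type graph struct {
	nodes map[string]*node
}

func newGraph() *graph {
	return &graph{nodes: make(map[string]*node)}
}

// addTransaction inserts t and links it to every parent already in the
// graph. Inputs whose funding transaction is absent (confirmed or simply
// unknown) create no edge. It reports false if t was already present.
func (g *graph) addTransaction(t tx) bool {
	if _, ok := g.nodes[t.id]; ok {
		return false
	}
	n := &node{
		tx:       t,
		parents:  make(map[string]*node),
		children: make(map[string]*node),
	}
	for _, in := range t.inputs {
		if p, ok := g.nodes[in.txid]; ok {
			n.parents[p.tx.id] = p
			p.children[t.id] = n
		}
	}
	g.nodes[t.id] = n
	return true
}

func main() {
	g := newGraph()
	g.addTransaction(tx{id: "a"})
	g.addTransaction(tx{id: "b", inputs: []outPoint{{"a", 0}, {"confirmed", 1}}})
	fmt.Println(len(g.nodes["b"].parents), len(g.nodes["a"].children))
}
```

Note that this sketch does not back-fill edges when a parent arrives after its child; whether the real package does is not stated here.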

Transaction removal supports two modes to handle different eviction scenarios.
RemoveTransaction performs cascade deletion, recursively removing all
descendants to maintain graph consistency—this is essential when blocks confirm
transactions, since children spending unconfirmed outputs must also be removed.
RemoveTransactionNoCascade provides fine-grained control for cases where the
caller manages descendants explicitly, avoiding redundant traversals during
bulk evictions.
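The difference between the two removal modes can be sketched as follows. The types and the `link` helper are hypothetical simplifications, and the cascade uses an explicit stack rather than recursion; the actual implementation may differ.

```go
package main

import "fmt"

type node struct {
	id       string
	parents  map[string]*node
	children map[string]*node
}

type graph struct{ nodes map[string]*node }

func newGraph() *graph { return &graph{nodes: make(map[string]*node)} }

// get returns the node for id, creating it on first use (test helper).
func (g *graph) get(id string) *node {
	if n, ok := g.nodes[id]; ok {
		return n
	}
	n := &node{id: id, parents: map[string]*node{}, children: map[string]*node{}}
	g.nodes[id] = n
	return n
}

// link wires a parent -> child edge (hypothetical helper).
func (g *graph) link(parent, child string) {
	p, c := g.get(parent), g.get(child)
	p.children[child] = c
	c.parents[parent] = p
}

// removeNoCascade unlinks and deletes only the named node. Its children
// stay in the graph and simply lose one parent edge.
func (g *graph) removeNoCascade(id string) {
	n, ok := g.nodes[id]
	if !ok {
		return
	}
	for _, p := range n.parents {
		delete(p.children, id)
	}
	for _, c := range n.children {
		delete(c.parents, id)
	}
	delete(g.nodes, id)
}

// removeCascade deletes the node and all of its descendants, returning how
// many nodes were removed.
func (g *graph) removeCascade(id string) int {
	start, ok := g.nodes[id]
	if !ok {
		return 0
	}
	seen := map[string]bool{id: true}
	stack := []*node{start}
	removed := 0
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for cid, c := range n.children {
			if !seen[cid] {
				seen[cid] = true
				stack = append(stack, c)
			}
		}
		g.removeNoCascade(n.id)
		removed++
	}
	return removed
}

func main() {
	g := newGraph()
	g.link("a", "b")
	g.link("b", "c")
	fmt.Println(g.removeCascade("a"), len(g.nodes))
}
```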

The ancestor and descendant query methods implement breadth-first traversal
with depth limits, enabling enforcement of mempool ancestor/descendant count
and size policies. GetAncestors walks backward through parent edges while
GetDescendants walks forward through children, both accumulating results up to
the specified maximum depth. These queries are fundamental to mempool policy
checks that prevent transactions with excessive dependency chains.
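As a rough illustration of the depth-limited walk, the sketch below runs a breadth-first search over parent edges. The adjacency-map representation and the convention that a non-positive `maxDepth` means "no limit" are assumptions made for the example, not the package's API.

```go
package main

import "fmt"

// ancestors walks parent edges breadth-first from start and returns every
// ancestor found within maxDepth hops. A maxDepth <= 0 means unlimited
// (an assumption of this sketch).
func ancestors(parents map[string][]string, start string, maxDepth int) []string {
	type item struct {
		id    string
		depth int
	}
	seen := map[string]bool{start: true}
	queue := []item{{start, 0}}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		// Do not expand nodes that already sit at the depth limit.
		if maxDepth > 0 && cur.depth >= maxDepth {
			continue
		}
		for _, p := range parents[cur.id] {
			if seen[p] {
				continue
			}
			seen[p] = true
			out = append(out, p)
			queue = append(queue, item{p, cur.depth + 1})
		}
	}
	return out
}

func main() {
	// Diamond: a is the parent of b and c, which are both parents of d.
	parents := map[string][]string{
		"d": {"b", "c"},
		"b": {"a"},
		"c": {"a"},
	}
	fmt.Println(ancestors(parents, "d", 0))
	fmt.Println(ancestors(parents, "d", 1))
}
```

A descendant query is the same walk over child edges, and a policy check then only needs to compare the length of the result against the configured limit.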

Cluster retrieval (GetCluster) identifies the connected component containing a
given transaction, which is critical for RBF validation. When evaluating
replacement transactions, we must consider the fee impact on the entire cluster
rather than individual transactions, preventing attacks where replacements
improve one transaction while degrading the overall cluster fee rate.
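A connected component is found by following edges in both directions. The sketch below assumes a flat edge list and string IDs, which is not how the package stores its graph; it only shows why siblings land in the same cluster even though neither is an ancestor of the other.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a hypothetical parent -> child spend relationship.
type edge struct{ parent, child string }

// cluster returns the sorted IDs of every transaction connected to start,
// treating spend edges as undirected.
func cluster(edges []edge, start string) []string {
	adj := make(map[string][]string)
	for _, e := range edges {
		adj[e.parent] = append(adj[e.parent], e.child)
		adj[e.child] = append(adj[e.child], e.parent)
	}
	seen := map[string]bool{start: true}
	stack := []string{start}
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, nb := range adj[cur] {
			if !seen[nb] {
				seen[nb] = true
				stack = append(stack, nb)
			}
		}
	}
	out := make([]string, 0, len(seen))
	for id := range seen {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	edges := []edge{{"a", "b"}, {"a", "c"}, {"x", "y"}}
	// c's cluster includes its sibling b, reached through the shared parent a.
	fmt.Println(cluster(edges, "c"))
}
```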

The orphan detection methods (GetOrphans and IterateOrphans) distinguish
between true orphans awaiting unconfirmed parents and root transactions
spending confirmed UTXOs. The InputConfirmedPredicate parameter enables the
caller to provide chain state information, avoiding orphan classification for
transactions with all confirmed inputs. This distinction matters for relay
policies that handle orphans differently from normal transaction acceptance.

This commit adds comprehensive iteration support using Go 1.23 iter.Seq
iterators, enabling lazy evaluation of graph traversals for memory-efficient
processing of large transaction sets.

The Iterate method serves as the primary entry point, accepting functional
options that configure traversal order, direction, depth limits, filters, and
starting nodes. This design avoids parameter explosion while maintaining type
safety and discoverability—each WithXXX function modifies a specific aspect of
the iteration behavior.
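The functional options pattern looks roughly like the sketch below. `IteratorOption` and `TraversalOrder` are names from this PR, but the config struct, the constants, and the `WithOrder`, `WithMaxDepth`, and `WithIncludeStart` helpers are hypothetical and only illustrate the shape of the API.

```go
package main

import "fmt"

// TraversalOrder selects the walk strategy.
type TraversalOrder int

const (
	OrderDFS TraversalOrder = iota
	OrderBFS
)

// iterConfig collects every tunable in one place (hypothetical layout).
type iterConfig struct {
	order        TraversalOrder
	maxDepth     int // 0 means unlimited in this sketch
	includeStart bool
}

// IteratorOption mutates one aspect of the configuration.
type IteratorOption func(*iterConfig)

func WithOrder(o TraversalOrder) IteratorOption {
	return func(c *iterConfig) { c.order = o }
}

func WithMaxDepth(d int) IteratorOption {
	return func(c *iterConfig) { c.maxDepth = d }
}

func WithIncludeStart(include bool) IteratorOption {
	return func(c *iterConfig) { c.includeStart = include }
}

// newIterConfig starts from defaults and applies options in order, so a
// later option overrides an earlier one.
func newIterConfig(opts ...IteratorOption) iterConfig {
	cfg := iterConfig{order: OrderDFS}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newIterConfig(WithOrder(OrderBFS), WithMaxDepth(3))
	fmt.Printf("%+v\n", cfg)
}
```

Callers that want the defaults pass nothing, and adding a new knob later doesn't break any existing call site.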

Multiple traversal strategies are implemented to serve different use cases.
Depth-first search (DFS) explores deep into dependency chains before
backtracking, useful for finding long ancestor sequences. Breadth-first search
(BFS) visits nodes level-by-level, ensuring minimum-depth paths are found first.
Topological order guarantees parents appear before children, essential for
block template construction where dependencies must be satisfied. Reverse
topological order enables bottom-up analysis, processing leaves before roots.
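One standard way to produce a parents-before-children ordering is Kahn's algorithm, sketched below. I'm not claiming this is the algorithm the package uses; the map-based input and the sorted tie-breaking (added so the output is deterministic) are assumptions of the example.

```go
package main

import (
	"fmt"
	"sort"
)

// topoOrder returns the keys of parents ordered so that every transaction
// appears after all of its in-set parents. Parents that are not keys of the
// map (for example confirmed transactions) are ignored. The boolean is
// false if a cycle prevented a complete ordering.
func topoOrder(parents map[string][]string) ([]string, bool) {
	indeg := make(map[string]int, len(parents))
	children := make(map[string][]string)
	for id, ps := range parents {
		n := 0
		for _, p := range ps {
			if _, ok := parents[p]; !ok {
				continue
			}
			n++
			children[p] = append(children[p], id)
		}
		indeg[id] = n
	}

	var ready []string
	for id, d := range indeg {
		if d == 0 {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready)

	out := make([]string, 0, len(parents))
	for len(ready) > 0 {
		cur := ready[0]
		ready = ready[1:]
		out = append(out, cur)
		next := children[cur]
		sort.Strings(next)
		for _, c := range next {
			indeg[c]--
			if indeg[c] == 0 {
				ready = append(ready, c)
			}
		}
	}
	return out, len(out) == len(parents)
}

func main() {
	parents := map[string][]string{
		"a": nil,
		"b": {"a"},
		"c": {"a"},
		"d": {"b", "c"},
	}
	fmt.Println(topoOrder(parents))
}
```

Reversing the returned slice gives the leaves-first order used for bottom-up analysis.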

The fee rate traversal (TraversalFeeRate) sorts transactions by fee rate from
highest to lowest, supporting greedy block template building and fee-based
eviction policies. This ordering requires materializing the full result set
rather than streaming, but provides the natural ordering for mining and relay
decisions.
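The fee rate ordering can be sketched with a plain sort. The `txInfo` type and `byFeeRate` function are hypothetical; the comparison cross-multiplies fee and size so that no floating point division is needed, which assumes sizes are positive and the products fit in an int64.

```go
package main

import (
	"fmt"
	"sort"
)

// txInfo is a hypothetical summary of a transaction's fee and virtual size.
type txInfo struct {
	id    string
	fee   int64 // satoshis
	vsize int64 // virtual bytes, assumed > 0
}

// byFeeRate returns a copy of txs sorted from highest to lowest fee rate.
// fee_i/vsize_i > fee_j/vsize_j is evaluated as fee_i*vsize_j > fee_j*vsize_i.
func byFeeRate(txs []txInfo) []txInfo {
	out := append([]txInfo(nil), txs...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].fee*out[j].vsize > out[j].fee*out[i].vsize
	})
	return out
}

func main() {
	txs := []txInfo{
		{"a", 1000, 200}, // 5 sat/vB
		{"b", 3000, 300}, // 10 sat/vB
		{"c", 500, 250},  // 2 sat/vB
	}
	for _, t := range byFeeRate(txs) {
		fmt.Println(t.id)
	}
}
```

The copy-then-sort step is the "materializing the full result set" cost mentioned above: nothing can be yielded until every candidate has been seen.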

Directional control allows backward traversal (following parent edges),
forward traversal (following child edges), or bidirectional traversal (both).
This flexibility supports both ancestor queries (backward) and descendant
queries (forward) without duplicating logic. The IncludeStart option determines
whether the starting node itself appears in results, enabling either
"ancestors of X" or "X and its ancestors" semantics.

Filter predicates provide fine-grained control over which nodes are yielded,
enabling queries like "high-fee transactions" or "TRUC transactions only"
without custom iteration logic. Filters integrate seamlessly with other options,
applying after traversal order but before yielding to the consumer.

The IteratePairs method yields parent-child edges rather than nodes, enabling
analysis of transaction relationships and fund flows. This is useful for
conflict detection (finding transactions spending the same outputs) and for
tracking value movement through the graph.

Orphan iteration (IterateOrphans) identifies transactions with no parents in
the graph. When combined with the InputConfirmedPredicate, it distinguishes
true orphans (waiting for unconfirmed parents not yet in the mempool) from root
transactions (spending confirmed UTXOs). This distinction enables targeted
orphan handling without scanning the entire graph.
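The orphan/root distinction can be sketched as below. `InputConfirmedPredicate` is a name from this PR, but its signature is not shown here, so the `inputConfirmed` type is a guess, as are the other simplified types. With a nil predicate the sketch reports every parentless transaction that has at least one input.

```go
package main

import (
	"fmt"
	"sort"
)

type outPoint struct {
	txid  string
	index uint32
}

type tx struct {
	id     string
	inputs []outPoint
}

// inputConfirmed reports whether an outpoint is a confirmed UTXO. This
// signature is an assumption made for the example.
type inputConfirmed func(outPoint) bool

// orphans returns the sorted IDs of transactions that have no parent in the
// pool and at least one input that the predicate does not vouch for.
func orphans(pool map[string]tx, isConfirmed inputConfirmed) []string {
	var out []string
	for id, t := range pool {
		hasParent, missing := false, false
		for _, in := range t.inputs {
			if _, ok := pool[in.txid]; ok {
				hasParent = true
				continue
			}
			if isConfirmed == nil || !isConfirmed(in) {
				missing = true
			}
		}
		if !hasParent && missing {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pool := map[string]tx{
		"root":   {id: "root", inputs: []outPoint{{"utxo1", 0}}},
		"child":  {id: "child", inputs: []outPoint{{"root", 0}}},
		"orphan": {id: "orphan", inputs: []outPoint{{"missing", 0}}},
	}
	confirmed := func(op outPoint) bool { return op.txid == "utxo1" }
	fmt.Println(orphans(pool, confirmed))
	fmt.Println(orphans(pool, nil))
}
```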

All iteration uses Go's range-over-function feature, allowing natural for-range
syntax while maintaining lazy evaluation. The iterator can be stopped early by
returning from the loop, and the underlying implementation respects this by
checking the yield return value. This provides backpressure, avoiding work when
the consumer is satisfied with partial results.

This commit implements transaction package detection and topology analysis,
enabling package-aware relay policies and mining optimizations for Bitcoin
transaction groups.

The IdentifyPackages method scans the graph to detect transaction groups that
form coherent packages. It prioritizes more specific package types (TRUC,
ephemeral) before falling back to general categories (1P1C, standard), ensuring
the most restrictive validation rules are applied. This ordering matters
because a transaction group might technically match multiple patterns—for
example, a TRUC package is also a 1P1C package, but TRUC validation is more
stringent and should take precedence.

Each package type has a dedicated try method implementing type-specific
detection logic. try1P1CPackage identifies the simple one-parent-one-child
pattern used for basic CPFP (Child Pays For Parent). tryTRUCPackage detects BIP
431 version 3 transactions with topology restrictions designed to prevent
transaction pinning attacks. tryEphemeralPackage finds groups containing dust
outputs that must be spent within the same package. tryStandardPackage catches
any other connected transaction groups.

Package root detection uses a conservative heuristic—transactions with no
unconfirmed parents in the graph are potential roots. This works because
package relay typically starts with the lowest-fee transaction (the parent) and
adds higher-fee children. The root identification could be refined with
additional chain state, but the current approach handles the common case of
packages built incrementally as transactions arrive.

Topology computation (calculateTopology) analyzes package structure to extract
metrics like maximum depth, maximum width, and tree properties. These metrics
determine which validation rules apply—TRUC packages must be trees (no
diamond patterns) and have limited depth, while ephemeral packages can have
more complex structures as long as all dust is spent. The IsLinear flag
enables optimizations for simple chains, the most common package structure.
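To show what these metrics mean on a small example, here is a sketch of one way to compute them. The definitions are my assumptions for illustration: depth counts edges from a root (roots are depth 0), width is the largest number of transactions at one depth, "tree" means no transaction has more than one in-package parent, and "linear" additionally means no transaction has more than one child. Only the `IsLinear` name comes from the text above; the input is assumed acyclic with de-duplicated parent lists.

```go
package main

import "fmt"

// topology holds the computed shape metrics (field names other than
// IsLinear are hypothetical).
type topology struct {
	MaxDepth int
	MaxWidth int
	IsLinear bool
	IsTree   bool
}

// calcTopology analyzes a package given as a map from each member to its
// parents. Parents outside the package are ignored.
func calcTopology(parents map[string][]string) topology {
	depth := make(map[string]int)
	var depthOf func(id string) int
	depthOf = func(id string) int {
		if d, ok := depth[id]; ok {
			return d
		}
		d := 0
		for _, p := range parents[id] {
			if _, in := parents[p]; !in {
				continue
			}
			if pd := depthOf(p) + 1; pd > d {
				d = pd
			}
		}
		depth[id] = d
		return d
	}

	t := topology{IsLinear: true, IsTree: true}
	width := make(map[int]int)
	childCount := make(map[string]int)
	for id, ps := range parents {
		inPkg := 0
		for _, p := range ps {
			if _, ok := parents[p]; ok {
				inPkg++
				childCount[p]++
			}
		}
		if inPkg > 1 {
			// Two in-package parents form a diamond or merge.
			t.IsTree, t.IsLinear = false, false
		}
		d := depthOf(id)
		width[d]++
		if d > t.MaxDepth {
			t.MaxDepth = d
		}
		if width[d] > t.MaxWidth {
			t.MaxWidth = width[d]
		}
	}
	for _, n := range childCount {
		if n > 1 {
			t.IsLinear = false
		}
	}
	return t
}

func main() {
	chain := map[string][]string{"a": nil, "b": {"a"}, "c": {"b"}}
	diamond := map[string][]string{"a": nil, "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
	fmt.Printf("%+v\n%+v\n", calcTopology(chain), calcTopology(diamond))
}
```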

CreatePackage allows explicit package construction from a specified set of
transactions, useful when the caller knows which transactions form a package
(for example, during package relay message handling). This complements
IdentifyPackages which performs automatic detection through graph scanning.

Package validation integrates with the PackageAnalyzer interface, delegating
type-specific rules to the analyzer implementation. This separation enables
testing with mock analyzers and supports future protocol upgrades without
modifying the core package detection logic. The graph code handles structural
analysis (topology, connectivity) while the analyzer handles protocol rules
(TRUC topology restrictions, ephemeral dust requirements).
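The split between structural checks and protocol rules can be sketched like this. The real `PackageAnalyzer` method set isn't reproduced here, so the single-method `packageAnalyzer` interface, the `pkg` type, and the mock are all hypothetical; the point is only that the graph side can be tested against a stub.

```go
package main

import (
	"errors"
	"fmt"
)

// pkg is a hypothetical, minimal package description.
type pkg struct {
	kind  string
	txids []string
}

// packageAnalyzer is a stand-in for the PackageAnalyzer interface with a
// single assumed method.
type packageAnalyzer interface {
	ValidatePackage(p pkg) error
}

// mockAnalyzer records calls and returns a canned result.
type mockAnalyzer struct {
	calls int
	err   error
}

func (m *mockAnalyzer) ValidatePackage(p pkg) error {
	m.calls++
	return m.err
}

var errEmptyPackage = errors.New("empty package")

// validate performs the structural check itself and delegates protocol
// rules to the analyzer.
func validate(p pkg, a packageAnalyzer) error {
	if len(p.txids) == 0 {
		return errEmptyPackage
	}
	return a.ValidatePackage(p)
}

func main() {
	mock := &mockAnalyzer{}
	err := validate(pkg{kind: "1p1c", txids: []string{"parent", "child"}}, mock)
	fmt.Println(err, mock.calls)
}
```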

The implementation marks processed transactions to avoid redundant package
detection, improving efficiency when scanning large graphs. This tracking is
local to each IdentifyPackages call, not persistent state, so the graph can be
queried multiple times with different results as transactions are added or
removed.

Package IDs incorporate both the root transaction hash and package type,
enabling the same root to potentially belong to multiple package
interpretations. In practice this is rare, but the design accommodates it
rather than forcing arbitrary choices about package classification.

This commit provides thorough test coverage for the transaction graph package,
achieving 92.3% statement coverage through a combination of unit tests,
integration tests, and benchmark tests.

The test suite is organized into focused files by functionality. graph_test.go
covers core graph operations like adding/removing transactions, edge
management, and ancestor/descendant queries. iterator_test.go exercises the
various traversal strategies and iteration options. package_test.go validates
package identification and topology analysis. orphans_test.go verifies orphan
detection with and without confirmation predicates. traversal_test.go tests
advanced iteration features like filters, direction control, and depth limits.

The tests follow a consistent pattern of creating small transaction graphs with
known structure, performing operations, and asserting expected results. Helper
functions like createTestTx and newTxGenerator reduce boilerplate while keeping
tests readable. The generator approach enables creating realistic transaction
chains and DAGs without manual construction.

Mock implementations (mock_analyzer_test.go) provide test doubles for the
PackageAnalyzer interface, enabling graph testing without depending on
protocol-specific validation logic. This separation allows graph behavior to be
verified independently of TRUC rules, ephemeral dust policies, and other
validation concerns.

The orphan tests demonstrate the critical distinction between transactions
awaiting unconfirmed parents (true orphans) and transactions spending confirmed
UTXOs (root transactions). This distinction drives different relay policies—
orphans may need special handling while roots can be processed normally.

Traversal tests exercise all iteration options including IncludeStart, Filter,
Direction, and MaxDepth. These options combine in various ways, and the tests
verify correct behavior for combinations like "backward DFS with filter" or
"forward BFS with depth limit." The tests use slices.Collect to materialize
iterator results, enabling straightforward assertion on the full result set.

Iterator tests verify early termination by breaking from range loops, ensuring
the graph respects backpressure and stops traversal when the consumer is
satisfied. This is important for performance—callers should be able to find the
first matching transaction without scanning the entire graph.

Package tests validate detection of 1P1C, TRUC, ephemeral, and standard
packages. They verify topology computation (depth, width, linear, tree flags)
and package validation integration with the PackageAnalyzer interface. The
tests cover edge cases like disconnected transaction sets, cycles (which should
never occur but must be handled), and ambiguous package classifications.

Benchmark tests (bench_test.go) measure performance for critical operations
like transaction addition, ancestor queries, and package identification. These
benchmarks establish performance baselines and detect regressions as the code
evolves. They use realistic graph sizes and transaction counts to represent
actual mempool conditions.

This commit provides two levels of documentation for the transaction graph
package—godoc comments for API reference and a comprehensive README for user
onboarding.

The doc.go file establishes package-level documentation that appears in godoc
output, providing a high-level introduction to the package's purpose and
capabilities. It outlines the core features (efficient lookups, edge creation,
cluster management, package identification, orphan detection, traversal
strategies) and describes the major concepts (graph structure, packages,
iteration, orphan detection, thread safety).

The documentation includes practical code examples showing common usage
patterns. Rather than exhaustive API coverage, these examples demonstrate the
typical workflow—creating a graph, adding transactions, querying ancestors and
descendants, identifying packages, and iterating with custom filters. This
example-driven approach helps developers understand not just what methods exist
but how they fit together in real applications.

The README.md takes a different approach, targeting developers who are
evaluating or integrating the package. It starts with motivation—explaining
why transaction graphs matter for mempool management and what problems they
solve. This context helps readers understand when they need this package and
what benefits it provides.

The core concepts section defines terms like "transaction graph," "cluster,"
and "package" with emphasis on their practical implications. For example,
clusters are explained not just as connected components but as the unit of
analysis for RBF validation—replacements must improve the entire cluster's fee
rate, not individual transactions. This connects abstract concepts to concrete
use cases.

The quick start examples are runnable code demonstrating key workflows. The
first example shows basic graph construction with automatic edge creation. The
second demonstrates cluster iteration for fee analysis. The third covers
package identification and validation. The fourth shows advanced iteration with
functional options and filters. These examples progress from simple to complex,
building understanding incrementally.

The common patterns section captures best practices discovered during
implementation. It covers incremental graph building, package-aware relay,
ancestor/descendant limit enforcement, and efficient cleanup on block
confirmation. These patterns represent battle-tested approaches to recurring
problems, saving readers from rediscovering solutions.

The PackageAnalyzer interface receives special attention because it represents
the main extension point. The documentation explains the abstraction purpose
(separating graph structure from protocol validation) and lists the methods
each implementation must provide. This section helps developers who need to add
new package types or customize validation logic.

Performance characteristics provide guidance on the computational cost of
operations, helping developers make informed architectural decisions. The O(1),
O(n), and O(d) complexity annotations set expectations for different graph
sizes and query patterns. Mentioning the ~1KB memory overhead per transaction
helps with capacity planning for mempool implementations.

@coveralls

coveralls commented Sep 30, 2025

Pull Request Test Coverage Report for Build 18118100572

Details

  • 1514 of 1638 (92.43%) changed or added relevant lines in 5 files are covered.
  • 38 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+1.0%) to 55.844%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| mempool/txgraph/package.go | 441 | 473 | 93.23% |
| mempool/txgraph/iterator.go | 390 | 435 | 89.66% |
| mempool/txgraph/graph.go | 500 | 547 | 91.41% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| connmgr/connmanager.go | 2 | 83.9% |
| btcutil/gcs/gcs.go | 4 | 80.95% |
| rpcclient/infrastructure.go | 32 | 39.79% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 17994535365: | 1.0% |
| Covered Lines: | 32604 |
| Relevant Lines: | 58384 |

💛 - Coveralls

@Roasbeef Roasbeef marked this pull request as draft September 30, 2025 04:23