
feat(network): define NetworkTransport trait + TcpTransport (7/n)#15575

Merged
shreyan-gupta merged 3 commits into master from shreyan-gupta/network/deentangle/07
Apr 28, 2026

Conversation

shreyan-gupta (Contributor) commented Apr 15, 2026

  • NetworkTransport trait — 5-method dyn-safe trait:
    • send_message(tier, peer_id, msg), broadcast_message(tier, msg)
    • connect_to_peer(clock, peer_info, tier) -> ConnectHandle (async, default no-op)
    • disconnect_peer(peer_id, ban_reason) (default no-op), shutdown() (default no-op)
  • transport_info() is added in PR 9
  • TcpTransport — production impl wrapping existing Pool-based code
  • ConnectHandle — implements Future<Output = Result<(), ConnectError>>
  • PeerManagerActor now holds Arc<dyn NetworkTransport> alongside Arc<NetworkState>
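As a rough sketch of the trait shape described above (Tier, PeerId, and PeerMessage here are simplified stand-ins for the real chain/network types, and the async connect_to_peer/ConnectHandle pair is omitted to keep the example synchronous):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified stand-ins for the real chain/network types; illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tier { T1, T2, T3 }
pub type PeerId = String;
pub struct PeerMessage(pub String);

// Dyn-safe trait sketch: sends are required, lifecycle methods default
// to no-ops, mirroring the description in the PR summary.
pub trait NetworkTransport: Send + Sync {
    fn send_message(&self, tier: Tier, peer_id: &PeerId, msg: Arc<PeerMessage>);
    fn broadcast_message(&self, tier: Tier, msg: Arc<PeerMessage>);
    fn disconnect_peer(&self, _peer_id: &PeerId) {} // default no-op
    fn shutdown(&self) {}                           // default no-op
}

// Trivial counting impl standing in for TcpTransport, to show that the
// trait is object-safe and usable behind Arc<dyn NetworkTransport>.
#[derive(Default)]
pub struct CountingTransport {
    pub sent: AtomicUsize,
    pub broadcast: AtomicUsize,
}

impl NetworkTransport for CountingTransport {
    fn send_message(&self, _tier: Tier, _peer_id: &PeerId, _msg: Arc<PeerMessage>) {
        self.sent.fetch_add(1, Ordering::Relaxed);
    }
    fn broadcast_message(&self, _tier: Tier, _msg: Arc<PeerMessage>) {
        self.broadcast.fetch_add(1, Ordering::Relaxed);
    }
}
```

With a shape like this, PeerManagerActor can hold Arc<dyn NetworkTransport> and swap implementations (TCP now, a test transport later) without touching call sites.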

@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 7c1eaef to f56d8ce Compare April 15, 2026 01:18

codecov Bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 67.54386% with 74 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.46%. Comparing base (a9a453b) to head (864d458).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...ain/network/src/peer_manager/peer_manager_actor.rs 37.73% 33 Missing ⚠️
...hain/network/src/peer_manager/network_transport.rs 4.76% 20 Missing ⚠️
...hain/network/src/peer_manager/network_state/mod.rs 84.00% 11 Missing and 1 partial ⚠️
chain/network/src/peer_manager/tcp_transport.rs 57.89% 8 Missing ⚠️
.../network/src/peer_manager/network_state/routing.rs 94.73% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #15575      +/-   ##
==========================================
- Coverage   69.48%   69.46%   -0.03%     
==========================================
  Files         940      942       +2     
  Lines      214797   214963     +166     
  Branches   214797   214963     +166     
==========================================
+ Hits       149262   149322      +60     
- Misses      59610    59720     +110     
+ Partials     5925     5921       -4     
Flag Coverage Δ
pytests-nightly 1.14% <0.00%> (-0.01%) ⬇️
unittests 68.91% <67.54%> (-0.02%) ⬇️
unittests-nightly 68.99% <67.10%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 067d970 to 5936035 Compare April 15, 2026 03:50
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 52a0563 to 1a25eb9 Compare April 15, 2026 03:55
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 5936035 to 6aba775 Compare April 15, 2026 04:03
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch 2 times, most recently from 83791ae to c138ce8 Compare April 15, 2026 04:24
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 6aba775 to 2cca9f3 Compare April 15, 2026 04:24
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch 2 times, most recently from 5ca1a60 to a622b9d Compare April 15, 2026 19:22
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 2cca9f3 to 3012d05 Compare April 15, 2026 19:22
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from a622b9d to 86d7eb3 Compare April 15, 2026 22:35
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 3012d05 to db586cd Compare April 15, 2026 22:39
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch 2 times, most recently from 24d7e74 to f58c995 Compare April 15, 2026 23:04
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from db586cd to d4bcea4 Compare April 15, 2026 23:04
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from f58c995 to be1d301 Compare April 16, 2026 02:37
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from d4bcea4 to 03cb9f6 Compare April 16, 2026 02:37
@shreyan-gupta shreyan-gupta changed the title feat(network): [7/n] define NetworkTransport trait + TcpTransport wrapper feat(network): 7/n define NetworkTransport trait + TcpTransport + thread transport Apr 16, 2026
@shreyan-gupta shreyan-gupta changed the title feat(network): 7/n define NetworkTransport trait + TcpTransport + thread transport feat(network): define NetworkTransport trait + TcpTransport (7/n) Apr 16, 2026
@shreyan-gupta shreyan-gupta marked this pull request as ready for review April 16, 2026 04:13
@shreyan-gupta shreyan-gupta requested a review from a team as a code owner April 16, 2026 04:13
@shreyan-gupta shreyan-gupta requested review from pugachAG, saketh-are and wacban and removed request for a team April 16, 2026 04:13
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 942b085 to 40ce58d Compare April 27, 2026 21:22
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 9910399 to 745bd65 Compare April 27, 2026 22:04
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 40ce58d to a5c7cb8 Compare April 27, 2026 22:04
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 745bd65 to eab51e8 Compare April 27, 2026 22:16
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch 2 times, most recently from a83854c to b774ffb Compare April 27, 2026 22:33
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/06 branch from 83cb624 to 5944472 Compare April 27, 2026 22:53
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from b774ffb to 4a69807 Compare April 27, 2026 22:53
Base automatically changed from shreyan-gupta/network/deentangle/06 to master April 27, 2026 23:23
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 4a69807 to 9869ba3 Compare April 27, 2026 23:29
}
}

fn broadcast_message(&self, tier: tcp::Tier, msg: Arc<PeerMessage>) {
Contributor
Given that only T2 broadcast is supported, I suggest we just remove the tier argument.

clock: &time::Clock,
edge: Edge,
info: PeerConnectionInfo,
transport: Arc<dyn NetworkTransport>,
Contributor
Does it make sense to introduce transport as a field in NetworkState instead of passing it to its functions? It seems like it fits better here than in PeerActor.

Contributor Author
I had tried that, but I think it's not possible due to circular dependencies and ownership issues. I can take a deeper look.

Contributor Author
The transport that NetworkState methods receive comes from PeerManagerActor. PMA constructs it via TcpTransport::new(state.clone()) — so NetworkState is built first, then TcpTransport wraps an Arc of it (it needs Arc<NetworkState> to spawn PeerActors). PMA holds both and passes transport into the methods that need it.

Storing transport on NetworkState would close a ref cycle (NetworkState → transport → state) and leak both. We could break the cycle with Weak<dyn NetworkTransport> on NetworkState (via Arc::new_cyclic), but every call would then need self.transport.upgrade().expect(...).<method>() — extra boilerplate for an upgrade that can only fail during teardown.

I'd leave transport as a parameter, but happy to revisit if you have a structuring in mind that I'm not seeing.

🤖 Response from Claude Code on behalf of Shreyan
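For illustration only, a minimal sketch of the Weak-based alternative discussed here, using hypothetical stand-in types rather than the real nearcore structs: the transport is built via Arc::new_cyclic, the state keeps only a Weak back-edge, and the upgrade can fail only once the transport has been dropped.

```rust
use std::sync::{Arc, Weak};

// Hypothetical minimal shapes, not the actual nearcore types.
trait Transport {
    fn name(&self) -> &'static str;
}

struct State {
    // Weak back-edge to the transport: no Arc cycle, so no leak.
    transport: Weak<dyn Transport>,
}

impl State {
    fn transport(&self) -> Arc<dyn Transport> {
        // This upgrade can only fail during teardown.
        self.transport.upgrade().expect("transport already dropped")
    }
}

struct TcpTransport {
    state: Arc<State>,
}

impl Transport for TcpTransport {
    fn name(&self) -> &'static str { "tcp" }
}

// Arc::new_cyclic hands us a Weak<TcpTransport> before the Arc exists,
// letting State capture the back-edge at construction time.
fn build() -> Arc<TcpTransport> {
    Arc::new_cyclic(|weak: &Weak<TcpTransport>| {
        let weak_dyn: Weak<dyn Transport> = weak.clone();
        TcpTransport { state: Arc::new(State { transport: weak_dyn }) }
    })
}
```

The trade-off matches the comment above: no leak, but every use of the transport from State pays the upgrade-and-expect boilerplate.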

Contributor
That makes sense, but in this case I find it weird that NetworkState both needs a NetworkTransport instance and also itself implements it.

Contributor Author
Let me get back to this problem after all the refactoring. I had some ideas about introducing an interface for the upward communication from the transport layer to the network graph layer (network state), and this could fall out as part of that.

But that said, better to unblock as is for now.

Copilot AI review requested due to automatic review settings April 28, 2026 14:55
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 9869ba3 to 27c372d Compare April 28, 2026 14:55
Contributor

Copilot AI left a comment
Pull request overview

Introduces a transport abstraction layer in chain/network by defining a dyn-safe NetworkTransport trait and wiring message delivery paths to use it, with an initial production implementation (TcpTransport) wrapping the existing pool-based networking.

Changes:

  • Add NetworkTransport trait and ConnectHandle future type.
  • Implement TcpTransport as the production transport using existing NetworkState tier pools.
  • Thread Arc<dyn NetworkTransport> through PeerManagerActor, PeerActor, and NetworkState message/broadcast APIs; update affected tests/helpers.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
chain/network/src/peer_manager/tests/tier1.rs Update tier1 tests to pass a NetworkTransport when sending messages.
chain/network/src/peer_manager/testonly.rs Update test-only ping helper to pass a NetworkTransport.
chain/network/src/peer_manager/tcp_transport.rs Add production TcpTransport implementing NetworkTransport.
chain/network/src/peer_manager/peer_manager_actor.rs Store Arc<dyn NetworkTransport> in PMA and use it for shutdown + routing sends.
chain/network/src/peer_manager/network_transport.rs Define NetworkTransport, ConnectHandle, and ConnectError.
chain/network/src/peer_manager/network_state/routing.rs Route routing-table broadcasts and account announcements via NetworkTransport.
chain/network/src/peer_manager/network_state/mod.rs Update message-send APIs to take &dyn NetworkTransport / Arc<dyn ...>; pass transport through connect/disconnect flows.
chain/network/src/peer_manager/mod.rs Export new network_transport and tcp_transport modules.
chain/network/src/peer/testonly.rs Update peer test harness to provide a NetworkTransport to PeerActor::spawn.
chain/network/src/peer/peer_actor.rs Add transport field to PeerActor and thread transport through edge/routing/account flows.


Comment on lines 1520 to +1527
  let new_edge =
      edge.remove_edge(this.config.node_id(), &this.config.node_key);
- this.add_edges(&clock, EdgesWithSource::Local(vec![new_edge.clone()]))
-     .await
-     .unwrap()
+ let transport = TcpTransport::new(this.clone());
+ this.add_edges(
+     &clock,
+     EdgesWithSource::Local(vec![new_edge.clone()]),
+     transport,
+ )
Copilot AI Apr 28, 2026

fix_local_edges creates a new TcpTransport inside NetworkState to satisfy the add_edges(..., transport) API. This couples NetworkState to the production transport and will break the goal of having NetworkState be transport-agnostic (e.g. for a future TestLoop transport). Consider changing fix_local_edges to accept/capture an Arc<dyn NetworkTransport> from the caller (PeerManagerActor) instead of constructing TcpTransport here, and remove the TcpTransport dependency from network_state.

Copilot uses AI. Check for mistakes.
Comment on lines +368 to 369
let transport: Arc<dyn NetworkTransport> = TcpTransport::new(state.clone());
async move {
Copilot AI Apr 28, 2026

A TcpTransport instance is created for the listener loop here, but another TcpTransport::new(state.clone()) is created later when constructing PeerManagerActor. Since the PR description notes that Pools will move into TcpTransport in a follow-up, having multiple transport instances risks splitting/duplicating transport-owned state. Prefer creating a single Arc<dyn NetworkTransport> once and cloning it into both the listener loop and the actor struct.

Comment on lines +233 to +235
let transport = TcpTransport::new(network_state.clone());
let (addr, handshake_signal) =
Self::spawn(clock, actor_system, stream, network_state, transport)?;
Copilot AI Apr 28, 2026

spawn_and_handshake constructs a fresh TcpTransport internally, which hard-codes the production transport and also increases the number of TcpTransport instances created per connection attempt. If TcpTransport becomes stateful (as planned when Pools move into it), this pattern can cause subtle divergence. Consider taking Arc<dyn NetworkTransport> as a parameter (or otherwise sourcing it from the caller) so this helper can work with non-TCP transports and reuse the shared transport instance.

/// Transitional: wraps NetworkState's Pools. In PR 9, Pools move to
/// TcpTransport directly.
pub(crate) struct TcpTransport {
pub state: Arc<NetworkState>,
Copilot AI Apr 28, 2026

TcpTransport exposes its state field as pub, which makes it easy for other modules to bypass the NetworkTransport abstraction and reach directly into NetworkState/Pools. Since this type is meant to encapsulate transport behavior, consider making the field private (and exposing only the trait methods), to keep the abstraction boundary intact.

Suggested change:
- pub state: Arc<NetworkState>,
+ state: Arc<NetworkState>,

@shreyan-gupta shreyan-gupta added this pull request to the merge queue Apr 28, 2026
shreyan-gupta and others added 3 commits April 28, 2026 15:46
…hread transport

Define the 6-method NetworkTransport trait (send_message,
broadcast_message, connect_to_peer, disconnect_peer, shutdown,
get_connection_info) and create TcpTransport as a transitional
wrapper delegating to NetworkState's Pools.

Thread transport through all NetworkState methods that send or
broadcast. PMA holds Arc<dyn NetworkTransport>, PeerActor holds
Arc<dyn NetworkTransport>. stop_actor uses
transport.broadcast_message(T2, Disconnect) + transport.shutdown().

Methods updated: send_message_to_peer, send_message_to_account,
send_pong, send_ping, broadcast_routing_table_update, add_edges,
add_accounts, finalize_edge, on_peer_connected,
on_peer_disconnected, register, unregister, fix_local_edges.

All delivery now goes through the trait. Pools remain on
NetworkState (move to TcpTransport in PR 9).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- broadcast_message: replace the all-three-tier match with explicit
  unreachable!() for T1/T3. The previous debug_assert + match masked
  the fact that broadcast is only valid on T2; the unreachable arms
  document and enforce the trait contract directly.
- disconnect_peer: drop the early `return` after the first matching
  tier. A peer can be connected on multiple tiers simultaneously
  (e.g. T1 + T2 for a validator); stop the connection on every tier
  we find it on, otherwise a partial disconnect leaves a stale entry.

Addresses VanBarbascu's review on #15575.
Only T2 has broadcast semantics — T1 (direct validator routing) and
T3 (direct state sync) do not. The tier argument was effectively a
constant at every call site and the unreachable! arms documented a
capability we do not have.

Drop the parameter from the trait, simplify TcpTransport's impl, and
update both call sites (peer_manager_actor stop_actor and
network_state::broadcast_routing_table_update).
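A hedged sketch of the two behaviors these commit messages describe, using a hypothetical simplified Pools type (the real Pools stay on NetworkState until PR 9): disconnect must not return early because a peer can be connected on several tiers at once, and broadcast always targets T2.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical simplified pool layout, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Tier { T1, T2, T3 }
type PeerId = String;

#[derive(Default)]
struct Pools {
    // tier -> currently connected peer ids
    connections: Mutex<HashMap<Tier, Vec<PeerId>>>,
}

impl Pools {
    // No early return: a peer can be connected on multiple tiers at
    // once (e.g. T1 + T2 for a validator), so remove it from every
    // tier it appears on and report how many connections were stopped.
    fn disconnect_peer(&self, peer_id: &PeerId) -> usize {
        let mut conns = self.connections.lock().unwrap();
        let mut stopped = 0;
        for peers in conns.values_mut() {
            let before = peers.len();
            peers.retain(|p| p != peer_id);
            stopped += before - peers.len();
        }
        stopped
    }

    // Broadcast has no tier parameter: only T2 has broadcast
    // semantics, so the target set is always the T2 pool.
    fn broadcast_targets(&self) -> Vec<PeerId> {
        self.connections
            .lock()
            .unwrap()
            .get(&Tier::T2)
            .cloned()
            .unwrap_or_default()
    }
}
```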
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 28, 2026
@shreyan-gupta shreyan-gupta added this pull request to the merge queue Apr 28, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 28, 2026
@shreyan-gupta shreyan-gupta force-pushed the shreyan-gupta/network/deentangle/07 branch from 27c372d to 864d458 Compare April 28, 2026 20:03
@shreyan-gupta shreyan-gupta added this pull request to the merge queue Apr 28, 2026
Merged via the queue into master with commit 6ff3b8b Apr 28, 2026
30 of 32 checks passed
@shreyan-gupta shreyan-gupta deleted the shreyan-gupta/network/deentangle/07 branch April 28, 2026 21:11
pull Bot pushed a commit to Superoldman96/nearcore that referenced this pull request Apr 28, 2026
## Background

`pytest sanity/rpc_view_history.py` has been failing intermittently in
the merge queue. Initial diagnosis (see PR description history)
suggested a master-state regression, but a closer look at the test0 logs
from a failing run showed the test was racing on doomslug's first-block
timing.

## What's actually wrong

The test queries account state at `block=1` to verify historical RPC
works. In a 3-validator dev cluster with
`min_block_production_delay=1s`, doomslug can skip the very first
height — endorsements for block 1 don't arrive in time, the cluster goes
straight from height 0 to height 2, and the `BlockHeight` column has no
entry for height 1.

Sequence from the failing test0 log:
```
19:49:20.865 producing optimistic block validator=test0 height=1 prev_height=0
19:49:21.745 collect_block_approval target_height=1 (test1 endorsement)
19:49:22.888 considering blocks for production start_height=1 end_height=2
19:49:22.889 doomslug: not ready to produce block target_height=1
19:49:22.892 produce_block{next_height=2}              ← jumped to 2
```

When the test then queries `block=1`, `get_block_hash_by_height(1)`
returns `DBNotFoundErr` → `UnknownBlock` → `handle_unknown_block`
returns HTTP 422.

This is a flake in the test, not a regression. Same flake hit:
- [run 4557](https://nayduck.nearone.org/#/run/4557) (PR near#15478, master
`6c6c99a`) — non-nightly variant
- [run 4561](https://nayduck.nearone.org/#/run/4561), [run
4562](https://nayduck.nearone.org/#/run/4562) (PR near#15575, master
`a9a453b`) — nightly variant

A successful retry of PR near#15478 ([run
4560](https://nayduck.nearone.org/#/run/4560)) passed the same test on
the same SHA `a9a453b`, confirming the flake.

## Fix

Capture the block height after `wait_for_blocks(target=3)` via
`get_latest_block().height` — that block is already on the canonical
chain, so it's safe to query. Replace the hardcoded `block=1` with
this captured height.

## Verification

[NayDuck run near#4565](https://nayduck.nearone.org/#/run/4565) — 15× each
of the stable and nightly variants (30 total) on this branch.