Skip to content

Commit b2dff9c

Browse files
Copilotevan-forbes
andauthored
fix: remove peers from reactors when stopAndRemovePeer called with nil reason (#2688)
## Description When `stopAndRemovePeer` is called with `reason=nil` (during `OnStop()` or `StopPeerGracefully()`), peers were removed from the switch but not from individual reactors, leaving stale references. **Changes:** - Modified `stopAndRemovePeer` to default `reason` to `"stopping"` when nil - Removed conditional check that skipped `removePeerFromAllReactors` when `reason=nil` - Added test verifying peers are removed from reactor peer sets even with nil reason ```go func (sw *Switch) stopAndRemovePeer(peer Peer, reason interface{}) { // ... if reason == nil { reason = "stopping" } schema.WritePeerUpdate(sw.traceClient, string(peer.ID()), schema.PeerDisconnect, fmt.Sprintf("%v", reason)) sw.removePeerFromAllReactors(peer, reason) // Now always called // ... } ``` > [!WARNING] > > <details> > <summary>Firewall rules blocked me from connecting to one or more addresses (expand for details)</summary> > > #### I tried to connect to the following addresses, but was blocked by firewall rules: > > - `ya.ru` > - Triggering command: `/tmp/go-build3042114909/b001/p2p.test /tmp/go-build3042114909/b001/p2p.test -test.testlogfile=/tmp/go-build3042114909/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true` (dns block) > - Triggering command: `/tmp/go-build3136998786/b001/p2p.test /tmp/go-build3136998786/b001/p2p.test -test.testlogfile=/tmp/go-build3136998786/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true` (dns block) > - Triggering command: `/tmp/go-build491487171/b001/p2p.test /tmp/go-build491487171/b001/p2p.test -test.testlogfile=/tmp/go-build491487171/b001/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true` (dns block) > > If you need me to access, download, or install something from one of these locations, you can either: > > - Configure [Actions setup steps](https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled > - Add the appropriate URLs or hosts to the custom allowlist in this repository's [Copilot coding agent settings](https://github.com/celestiaorg/celestia-core/settings/copilot/coding_agent) (admins only) > > </details> <!-- START COPILOT CODING AGENT SUFFIX --> <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <issue_title>Zombie peers in the reactors</issue_title> > <issue_description>I noticed that when cycling through many peers, we would try to add a peer to a reactor however the peer was already in the reactor. > > This means that inherently there is an issue with the peer being removed from the switch but not the reactor. I haven't dug into it thoroughly, however it might be that when removing peers with reason == nil (I don't know where nil is being given as a reason) we remove the peers from the switch but skip the reactors. > > We could instead simply add a noop reason for now, and long term get rid of reasons.</issue_description> > > <agent_instructions>basically change the code to this > > func (sw *Switch) stopAndRemovePeer(peer Peer, reason interface{}) { > sw.transport.Cleanup(peer) > if err := peer.Stop(); err != nil { > sw.Logger.Error("error while stopping peer", "error", err) // TODO: should return error to be handled accordingly > } > if reason == nil { > reason = "stopping" > } > schema.WritePeerUpdate(sw.traceClient, string(peer.ID()), schema.PeerDisconnect, fmt.Sprintf("%v", reason)) > sw.removePeerFromAllReactors(peer, reason)</agent_instructions> > > ## Comments on the Issue (you are @copilot in this section) > > <comments> > </comments> > </details> - Fixes #2687 <!-- START COPILOT CODING AGENT TIPS --> --- 💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs. --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: evan-forbes <[email protected]> Co-authored-by: evan-forbes <[email protected]>
1 parent b5d759d commit b2dff9c

File tree

2 files changed

+57
-3
lines changed

2 files changed

+57
-3
lines changed

p2p/switch.go

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -419,10 +419,11 @@ func (sw *Switch) stopAndRemovePeer(peer Peer, reason interface{}) {
419419
if err := peer.Stop(); err != nil {
420420
sw.Logger.Error("error while stopping peer", "error", err) // TODO: should return error to be handled accordingly
421421
}
422-
schema.WritePeerUpdate(sw.traceClient, string(peer.ID()), schema.PeerDisconnect, fmt.Sprintf("%v", reason))
423-
if reason != nil {
424-
sw.removePeerFromAllReactors(peer, reason)
422+
if reason == nil {
423+
reason = "stopping"
425424
}
425+
schema.WritePeerUpdate(sw.traceClient, string(peer.ID()), schema.PeerDisconnect, fmt.Sprintf("%v", reason))
426+
sw.removePeerFromAllReactors(peer, reason)
426427

427428
// Removing a peer should go last to avoid a situation where a peer
428429
// reconnect to our node and the switch calls InitPeer before

p2p/switch_test.go

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -841,6 +841,59 @@ func TestSwitchInitPeerIsNotCalledBeforeRemovePeer(t *testing.T) {
841841
assert.False(t, reactor.InitCalledBeforeRemoveFinished())
842842
}
843843

844+
// TestPeerRemovedFromReactorOnStopWithNilReason verifies that peers are properly
845+
// removed from reactors even when stopAndRemovePeer is called with reason=nil
846+
func TestPeerRemovedFromReactorOnStopWithNilReason(t *testing.T) {
847+
// make reactor
848+
reactor := NewTestReactor("foo", []*conn.ChannelDescriptor{
849+
{ID: byte(0x00), Priority: 10},
850+
{ID: byte(0x01), Priority: 10},
851+
}, false)
852+
853+
// make switch
854+
sw := MakeSwitch(cfg, 1, func(i int, sw *Switch) *Switch {
855+
sw.AddReactor("foo", reactor)
856+
return sw
857+
})
858+
err := sw.Start()
859+
require.NoError(t, err)
860+
defer func() {
861+
if err := sw.Stop(); err != nil {
862+
t.Error(err)
863+
}
864+
}()
865+
866+
// add peer
867+
rp := &remotePeer{PrivKey: ed25519.GenPrivKey(), Config: cfg}
868+
rp.Start()
869+
defer rp.Stop()
870+
_, err = rp.Dial(sw.NetAddress())
871+
require.NoError(t, err)
872+
873+
// wait till the switch adds rp to the peer set
874+
var peer Peer
875+
for {
876+
time.Sleep(20 * time.Millisecond)
877+
if peer = sw.Peers().Get(rp.ID()); peer != nil {
878+
break
879+
}
880+
}
881+
882+
// verify peer is in reactor's peer set
883+
reactorPeerSet := sw.peerSetForReactor(reactor)
884+
require.NotNil(t, reactorPeerSet)
885+
require.True(t, reactorPeerSet.Has(peer.ID()), "peer should be in reactor peer set")
886+
887+
// stop and remove peer with nil reason (simulating OnStop or StopPeerGracefully)
888+
sw.stopAndRemovePeer(peer, nil)
889+
890+
// verify peer is removed from reactor's peer set
891+
assert.False(t, reactorPeerSet.Has(peer.ID()), "peer should be removed from reactor peer set even with nil reason")
892+
893+
// verify peer is removed from switch's peer set
894+
assert.False(t, sw.Peers().Has(peer.ID()), "peer should be removed from switch peer set")
895+
}
896+
844897
func BenchmarkSwitchBroadcast(b *testing.B) {
845898
s1, s2 := MakeSwitchPair(func(i int, sw *Switch) *Switch {
846899
// Make bar reactors of bar channels each

0 commit comments

Comments
 (0)