[KLC-2416] p2p: disable BasicConnMgr peer cap on seed nodes#55
…es [KLC-2416]
Add IsSeedNode flag to ArgsNetworkMessenger and use it to wire
libp2p.ConnectionManager(connmgr.NullConnMgr{}) only on seed-node
builds. Removes the 160/192 BasicConnMgr peer cap that was throttling
seed peer counts; ResourceManager + yamux keepalives remain in place
as the upper bound and stale-connection cleanup respectively.
Set IsSeedNode: true in cmd/seednode/main.go. Other callers
(factory/networkComponents.go, integration tests) are unchanged.
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds P2P ResourceManager configuration and a seed-node flag that makes the network messenger create libp2p hosts with connmgr.NullConnMgr; extracts address-factory logic, implements selectable ResourceManager strategies (default, null, libp2p-default, scaled), wires seednode CLI, and adds tests and config docs.
Changes
Seednode connection + resource manager
Estimated code review effort: 3 (Moderate) | ~20 minutes
Pre-merge checks: 5 passed | 3 failed (3 warnings)
Pull request overview
This PR adjusts libp2p connection-manager behavior for the seednode binary so seeds are no longer constrained by libp2p’s default BasicConnMgr(160,192) peer trimming, allowing seeds to accept more peers as resources permit.
Changes:
- Add an explicit `IsSeedNode` flag to `ArgsNetworkMessenger` and, when set, wire `libp2p.ConnectionManager(connmgr.NullConnMgr{})`.
- Set `IsSeedNode: true` in `cmd/seednode/main.go` so only the seednode binary opts into the behavior.
- Add unit tests to assert the connection manager selection for seed vs. non-seed, plus a test-only host accessor.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| network/p2p/libp2p/netMessenger.go | Adds IsSeedNode arg and conditionally installs NullConnMgr to disable default peer trimming. |
| cmd/seednode/main.go | Sets IsSeedNode: true for the seednode binary. |
| network/p2p/libp2p/netMessenger_test.go | Adds tests verifying NullConnMgr is used only when IsSeedNode is enabled. |
| network/p2p/libp2p/export_test.go | Adds a test-only Host() accessor to inspect the underlying host’s ConnManager(). |
…2416]
The original NullConnMgr fix lifted BasicConnMgr's 160/192 watermarks but
production testing on a 4 GiB seed host showed peer count remained pinned
at ~187 with new dials failing during the security handshake ("failed to
negotiate security protocol: EOF"). Root cause: libp2p's
DefaultResourceManager auto-scales from host memory and on a 4 GiB host
caps System.Conns at exactly 187 (128 + 128*(MiB/1024)). ConnMgr trims
existing connections; ResourceManager rejects new ones, so both layers
matter.
Add a P2pConfig.ResourceManager YAML section with four strategies:
"" same as "default" — libp2p auto-scale based on actual host RAM
"default" libp2p auto-scaled DefaultResourceManager
"null" disable libp2p resource limits (bounded only by OS)
"scaled" auto-scale as if host had scaledMemoryMiB MiB
(lifts caps on small hosts without going fully unbounded)
Wire libp2p.ResourceManager() in NewNetworkMessenger via the new
buildResourceManagerOption helper. The helper also returns an io.Closer
for the "scaled" path so the freshly-constructed rcmgr is cleaned up if
libp2p.New fails before the host can take ownership.
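
The strategy selection and the closer hand-off described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in netMessenger.go: `ResourceManagerConfig`'s Go field names, `buildOption`, `newHost`, and `fakeRcmgr` are hypothetical stand-ins, and the real helper returns a libp2p option rather than a string label.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ResourceManagerConfig mirrors the two settings named in the commit message.
// The Go field names are assumptions made for this sketch.
type ResourceManagerConfig struct {
	Strategy        string
	ScaledMemoryMiB int
}

// fakeRcmgr stands in for a freshly constructed libp2p resource manager.
type fakeRcmgr struct{ closed bool }

func (f *fakeRcmgr) Close() error { f.closed = true; return nil }

// buildOption returns a label for the chosen strategy and, only for "scaled",
// a closer that the caller must release if host construction fails later.
func buildOption(cfg ResourceManagerConfig) (string, io.Closer, error) {
	switch cfg.Strategy {
	case "", "default":
		return "default", nil, nil
	case "null":
		return "null", nil, nil
	case "scaled":
		if cfg.ScaledMemoryMiB <= 0 {
			return "", nil, errors.New("scaled strategy requires scaledMemoryMiB > 0")
		}
		return "scaled", &fakeRcmgr{}, nil
	default:
		return "", nil, fmt.Errorf("unknown resource manager strategy %q", cfg.Strategy)
	}
}

// newHost simulates the caller: it closes the resource manager when the
// (simulated) host constructor fails before taking ownership of it.
func newHost(cfg ResourceManagerConfig, failHost bool) (string, error) {
	kind, closer, err := buildOption(cfg)
	if err != nil {
		return "", err
	}
	if failHost {
		if closer != nil {
			_ = closer.Close()
		}
		return "", errors.New("host construction failed")
	}
	return kind, nil
}

func main() {
	kind, err := newHost(ResourceManagerConfig{Strategy: "scaled", ScaledMemoryMiB: 8192}, false)
	fmt.Println(kind, err)
	_, err = newHost(ResourceManagerConfig{Strategy: "bogus"}, false)
	fmt.Println(err)
}
```

Returning the closer separately keeps ownership explicit: only the "scaled" path allocates something the caller may have to release.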
config/seednode/config.yaml ships with strategy: "null" so seeds get the
fix from config rather than binary-side logic. config/node/config.yaml
documents the section but leaves it commented (regular nodes keep libp2p
default unless an operator opts in).
Bridge five libp2p subsystem loggers (rcmgr, swarm2, connmgr, net/identify,
autonat) into klever-go's external/<name> namespace so operators can
diagnose connection-acceptance issues via --log-level external/rcmgr:TRACE.
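
As a rough illustration of how such a log-level spec could be translated for the libp2p side, the sketch below parses the `--log-level` string and picks out the `external/<name>` entries. The function name `libp2pLevels` is hypothetical and this is not klever-go's parser. The TRACE to DEBUG mapping is taken from the PR description; passing other levels through unchanged is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const externalPrefix = "external/"

// libp2pLevels extracts the external/<name> entries from a klever-go style
// log-level spec and returns the level to apply to each libp2p subsystem.
// Only the TRACE -> DEBUG mapping comes from the PR text; passing other
// levels through unchanged is an assumption of this sketch.
func libp2pLevels(spec string) map[string]string {
	out := make(map[string]string)
	for _, entry := range strings.Split(spec, ",") {
		entry = strings.TrimSpace(entry)
		// Split on the last colon to separate the logger name from its level.
		idx := strings.LastIndex(entry, ":")
		if idx < 0 {
			continue
		}
		name, level := entry[:idx], strings.ToUpper(entry[idx+1:])
		if !strings.HasPrefix(name, externalPrefix) {
			continue
		}
		if level == "TRACE" {
			level = "DEBUG"
		}
		out[strings.TrimPrefix(name, externalPrefix)] = level
	}
	return out
}

func main() {
	levels := libp2pLevels("*:INFO,external/rcmgr:TRACE,external/net/identify:INFO")
	fmt.Println(levels["rcmgr"], levels["net/identify"])
}
```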
Add 8 wiring tests covering all four strategies plus error paths
(invalid strategy; scaled with zero memory). Existing ConnMgr tests
unchanged.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@network/p2p/libp2p/netMessenger.go`:
- Around line 1270-1275: The getFDLimit function currently casts rLimit.Cur
(uint64) straight to int which can overflow when RLIMIT_NOFILE == RLIM_INFINITY;
update getFDLimit to detect unlimited or out-of-range values by checking if
rLimit.Cur == syscall.RLIM_INFINITY or rLimit.Cur > uint64(math.MaxInt) and in
that case return a safe large int (e.g., math.MaxInt) instead of performing the
direct cast; otherwise cast rLimit.Cur to int as before. Reference symbols:
getFDLimit, rLimit, syscall.RLIMIT_NOFILE, syscall.RLIM_INFINITY, and the cast
of rLimit.Cur.
📒 Files selected for processing (5)
- config/node/config.yaml
- config/p2p.go
- config/seednode/config.yaml
- network/p2p/libp2p/netMessenger.go
- network/p2p/libp2p/netMessenger_test.go
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go
📄 CodeRabbit inference engine (Custom checks)
**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.
Files:
- config/p2p.go
- network/p2p/libp2p/netMessenger.go
- network/p2p/libp2p/netMessenger_test.go
network/**
⚙️ CodeRabbit configuration file
network/**: Peer-to-peer networking layer. - Check for proper input validation on all received messages - Verify rate limiting and DoS protection mechanisms - Ensure connection handling is goroutine-safe - Look for potential message amplification attacks - Verify TLS/authentication on peer connections
Files:
- network/p2p/libp2p/netMessenger.go
- network/p2p/libp2p/netMessenger_test.go
**/*_test.go
⚙️ CodeRabbit configuration file
**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)
Files:
network/p2p/libp2p/netMessenger_test.go
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.
Applied to files:
- config/p2p.go
- network/p2p/libp2p/netMessenger.go
- network/p2p/libp2p/netMessenger_test.go
- Reduce NewNetworkMessenger cognitive complexity (16 → below 15) by extracting the broadcast multiaddr setup into buildAddressFactory.
- Fix gosec G115 in getFDLimit by clamping uint64 → int with math.MaxInt.
- Add unit and parity tests for buildAddressFactory.
Summary
Seed nodes were artificially capped at ~186-192 peers and refused new dials after sustained uptime (clients saw `failed to negotiate security protocol: EOF`). Diagnosis on a 4 GiB seed host found two libp2p caps converging at the same number:
- ConnMgr: libp2p's default `BasicConnMgr` with 160/192 watermarks, which trims existing connections
- ResourceManager: `System.Conns = 128 + (128 × MiB/1024)` = exactly 187 on a 4 GiB host, refusing new inbound connections during the security handshake

This PR addresses both layers for seed nodes, adds a configurable RM strategy so operators can tune without rebuilding, and bridges libp2p subsystem loggers for diagnosis.
Changes
Library (`network/p2p/libp2p/netMessenger.go`)
- `IsSeedNode bool` on `ArgsNetworkMessenger`: when true, replaces libp2p's default `BasicConnMgr` with `NullConnMgr`. Only controls ConnMgr; ResourceManager is decoupled.
- `buildResourceManagerOption()` helper driven by `P2pConfig.ResourceManager.Strategy`. Returns an `io.Closer` for the "scaled" path so a freshly-constructed rcmgr is cleaned up if `libp2p.New` fails.
- Bridges five libp2p subsystem loggers (`rcmgr`, `swarm2`, `connmgr`, `net/identify`, `autonat`) into klever-go's `external/<name>` namespace.

Config (`config/p2p.go`)
New `ResourceManagerConfig` with four strategies:

| strategy | Behavior |
|---|---|
| `""` / `"default"` | libp2p auto-scaled `DefaultResourceManager` (sized from actual host RAM) |
| `"null"` | Disables libp2p resource limits (bounded only by the OS) |
| `"scaled"` | Auto-scales as if the host had `scaledMemoryMiB` MiB; useful to lift caps on small hosts without going fully unbounded |

Shipped YAMLs
- `config/seednode/config.yaml`: ships with active `strategy: "null"` (seed-node policy lives in config, not binary)
- `config/node/config.yaml`: section documented as comments only (regular nodes keep libp2p default unless operator opts in)

Seed binary (`cmd/seednode/main.go`)
Unchanged from the first commit (still sets `IsSeedNode: true`). The seed RM policy is now data, not code.
Diagnostic logging
After this PR, operators can enable libp2p internals for connection debugging:

    ./seednode --log-level "*:INFO,external/rcmgr:TRACE,external/swarm2:TRACE"

The `external/<name>:TRACE` klever-go level maps to `DEBUG` on the corresponding libp2p subsystem (existing bridge in `setupExternalP2PLoggers`).
Why explicit flag vs. inferring from sharder type
Initial implementation gated on `Sharding.Type == NilListSharder`. All three current configs (seed, regular, integration tests) use `NilListSharder`, so that signal does not distinguish seed from regular nodes. Switched to an explicit `IsSeedNode` flag set only by `cmd/seednode/main.go`.
Follow-up tickets
- … `NilListSharder` configured + `BasicConnMgr` silently active). Separate decision needed.
- … `/peers`, `/node/status`, `/node/metrics`). The 404s on `/node/metrics` from external tooling confirm demand.

Test plan
- `go build ./...` clean
- `go vet` and `gofmt` clean
- … `"null"`, `"default"`, `"scaled"`)
- … `"scaled"` with 0 memory; invalid strategy)
- … `strategy: "null"`
- `external/rcmgr:TRACE` produces expected libp2p logs on demand

Networking Impact
This PR only affects networking: it disables libp2p's BasicConnMgr peer cap for seed nodes (via a new IsSeedNode flag) and adds configurable libp2p ResourceManager strategies and logging bridges. Seed nodes can hold many more peers (seed config defaults to ResourceManager strategy "null"), which increases memory and file-descriptor pressure; keepalives and libp2p transport behavior (e.g., yamux eviction of stale connections) remain unchanged. The change does not modify consensus, transaction processing, state management, KVM, or any core data integrity logic.