Skip to content

[KLC-2416] p2p: disable BasicConnMgr peer cap on seed nodes#55

Merged
fbsobreira merged 3 commits into
developfrom
chore/KLC-2416-seed-null-connmgr
May 20, 2026
Merged

[KLC-2416] p2p: disable BasicConnMgr peer cap on seed nodes#55
fbsobreira merged 3 commits into
developfrom
chore/KLC-2416-seed-null-connmgr

Conversation

@fbsobreira
Copy link
Copy Markdown
Member

@fbsobreira fbsobreira commented May 20, 2026

Summary

Seed nodes were artificially capped at ~186-192 peers and refused new dials after sustained uptime (clients saw failed to negotiate security protocol: EOF). Diagnosis on a 4 GiB seed host found two libp2p caps converging at the same number:

  1. BasicConnMgr default watermarks (160/192) — trimmed existing connections
  2. DefaultResourceManager scope limit — System.Conns = 128 + (128 × MiB/1024) = exactly 187 on a 4 GiB host, refusing new inbound during the security handshake

This PR addresses both layers for seed nodes, adds a configurable RM strategy so operators can tune without rebuilding, and bridges libp2p subsystem loggers for diagnosis.

Changes

Library (network/p2p/libp2p/netMessenger.go)

  • New IsSeedNode bool on ArgsNetworkMessenger — when true, replaces libp2p's default BasicConnMgr with NullConnMgr. Only controls ConnMgr; ResourceManager is decoupled.
  • New buildResourceManagerOption() helper driven by P2pConfig.ResourceManager.Strategy. Returns an io.Closer for the "scaled" path so a freshly-constructed rcmgr is cleaned up if libp2p.New fails.
  • Bridged five libp2p subsystem loggers (rcmgr, swarm2, connmgr, net/identify, autonat) into klever-go's external/<name> namespace.

Config (config/p2p.go)

New ResourceManagerConfig with four strategies:

strategy Behavior
"" / "default" libp2p auto-scaled DefaultResourceManager (sized from actual host RAM)
"null" disable libp2p resource limits — bounded only by OS (ulimit, RAM)
"scaled" auto-scale as if host had scaledMemoryMiB MiB — useful to lift caps on small hosts without going fully unbounded

Shipped YAMLs

  • config/seednode/config.yaml — ships with active strategy: "null" (seed-node policy lives in config, not binary)
  • config/node/config.yaml — section documented as comments only (regular nodes keep libp2p default unless operator opts in)

Seed binary (cmd/seednode/main.go)

Unchanged from the first commit (still sets IsSeedNode: true). The seed RM policy is now data, not code.

Diagnostic logging

After this PR, operators can enable libp2p internals for connection debugging:

./seednode --log-level "*:INFO,external/rcmgr:TRACE,external/swarm2:TRACE"

The external/<name>:TRACE klever-go level maps to DEBUG on the corresponding libp2p subsystem (existing bridge in setupExternalP2PLoggers).

Why explicit flag vs. inferring from sharder type

Initial implementation gated on Sharding.Type == NilListSharder. All three current configs (seed, regular, integration tests) use NilListSharder, so that signal does not distinguish seed from regular nodes. Switched to an explicit IsSeedNode flag set only by cmd/seednode/main.go.

Follow-up tickets

  • KLC-2417 — regular nodes also have a config/runtime mismatch (NilListSharder configured + BasicConnMgr silently active). Separate decision needed.
  • KLC-2418 — seednode log noise + missing HTTP endpoints (/peers, /node/status, /node/metrics). The 404s on /node/metrics from external tooling confirm demand.

Test plan

  • go build ./... clean
  • Full libp2p test suite (12 packages) passing
  • go vet and gofmt clean
  • 14 new/updated wiring tests:
    • 2 ConnMgr tests (seed → Null, regular → default)
    • 1 RM test with subtests proving IsSeedNode does NOT affect RM choice
    • 3 RM strategy tests ("null", "default", "scaled")
    • 2 RM error-path tests ("scaled" with 0 memory; invalid strategy)
  • Manual: deployed to 4 GiB seed; peer count climbs past 187 with strategy: "null"
  • Manual: confirm external/rcmgr:TRACE produces expected libp2p logs on demand
  • Manual: confirm regular-node peer behavior is unchanged on a deployment without YAML override

Networking Impact

This PR only affects networking: it disables libp2p BasicConnMgr peer-cap caps for seed nodes (via a new IsSeedNode flag) and adds configurable libp2p ResourceManager strategies and logging bridges. Seed nodes can hold many more peers (seed config defaults to ResourceManager strategy "null"), increasing memory and file-descriptor pressure; keepalives and libp2p transport behavior (e.g., yamux eviction of stale connections) remain unchanged. The change does not modify consensus, transaction processing, state management, KVM, or any core data integrity logic.

Changes Summary

  • network/p2p/libp2p/netMessenger.go

    • Added IsSeedNode bool to ArgsNetworkMessenger. When true, NewNetworkMessenger uses connmgr.NullConnMgr instead of libp2p BasicConnMgr.
    • Added buildResourceManagerOption driven by P2pConfig.ResourceManager.Strategy with supported strategies: default/libp2p autoscaled, "null" (disable RM limits), and "scaled" (bounded RM using scaledMemoryMiB).
    • Ensures any created scaled ResourceManager closer is closed if libp2p.New fails.
    • Adds logging bridge namespaces for libp2p subsystems including rcmgr and connmgr.
    • Extracted address-advertisement factory helper.
  • cmd/seednode/main.go

    • createNode sets IsSeedNode: true (seed binary).
  • Tests

    • network/p2p/libp2p/netMessenger_test.go: tests for ConnMgr wiring (NullConnMgr for seed nodes), ResourceManager strategy paths (default, null, libp2p default, scaled, and error cases), and address-factory behavior.
    • network/p2p/libp2p/export_test.go: test-only accessor and BuildAddressFactory wrapper for testing.
    • Build and libp2p tests pass; added RM wiring/unit tests.
  • Configuration

    • config/p2p.go: adds ResourceManagerConfig and strategy constants; P2PConfig gains ResourceManager field.
    • config/seednode/config.yaml: default resourceManager.strategy: "null".
    • config/node/config.yaml: documents resourceManager tuning in comments for regular nodes.

Stability, Resource & Error Handling Concerns

  • Node stability / data integrity: No changes to consensus, transaction processing, state management, KVM, or core data integrity. No new concurrency model changes to core subsystems.
  • Resource usage: Seed nodes may consume substantially more RAM and file descriptors when BasicConnMgr trimming is disabled—operators must provision host resources or choose a ResourceManager strategy (e.g., "scaled") to bound usage. The default seed config sets strategy "null", which removes libp2p resource limits.
  • Error handling: buildResourceManagerOption returns an io.Closer for scaled RMs and the implementation ensures it is closed if libp2p.New fails, preventing leaks on init error. No other behavioral error-handling or concurrency changes were introduced.

Operational Notes & Follow-ups

  • Manual validation on a 4 GiB seed showed peer counts climb beyond prior ~187 when strategy: "null"; two manual confirmations remain open.
  • Follow-up tickets: KLC-2417 and KLC-2418 for related node/config/runtime and operational cleanup tasks.

Tests & Status

  • go build and libp2p tests: pass (includes new tests).
  • Added unit tests covering address factory and RM strategy behavior.

Review Change Stack

…es [KLC-2416]

Add IsSeedNode flag to ArgsNetworkMessenger and use it to wire
libp2p.ConnectionManager(connmgr.NullConnMgr{}) only on seed-node
builds. Removes the 160/192 BasicConnMgr peer cap that was throttling
seed peer counts; ResourceManager + yamux keepalives remain in place
as the upper bound and stale-connection cleanup respectively.

Set IsSeedNode: true in cmd/seednode/main.go. Other callers
(factory/networkComponents.go, integration tests) are unchanged.
Copilot AI review requested due to automatic review settings May 20, 2026 16:19
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 05252c29-99d6-445e-be76-4f183dc7b9a9

📥 Commits

Reviewing files that changed from the base of the PR and between d8e5f01 and 2ea9c63.

📒 Files selected for processing (3)
  • network/p2p/libp2p/export_test.go
  • network/p2p/libp2p/netMessenger.go
  • network/p2p/libp2p/netMessenger_test.go
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: setup-and-lint / setup-and-lint
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

  • network/p2p/libp2p/export_test.go
  • network/p2p/libp2p/netMessenger_test.go
  • network/p2p/libp2p/netMessenger.go
network/**

⚙️ CodeRabbit configuration file

network/**: Peer-to-peer networking layer. - Check for proper input validation on all received messages - Verify rate limiting and DoS protection mechanisms - Ensure connection handling is goroutine-safe - Look for potential message amplification attacks - Verify TLS/authentication on peer connections

Files:

  • network/p2p/libp2p/export_test.go
  • network/p2p/libp2p/netMessenger_test.go
  • network/p2p/libp2p/netMessenger.go
**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

  • network/p2p/libp2p/export_test.go
  • network/p2p/libp2p/netMessenger_test.go
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

  • network/p2p/libp2p/export_test.go
  • network/p2p/libp2p/netMessenger_test.go
  • network/p2p/libp2p/netMessenger.go
🔇 Additional comments (12)
network/p2p/libp2p/netMessenger.go (7)

3-45: LGTM!


77-92: LGTM!


125-134: LGTM!


158-206: LGTM!


208-223: LGTM!


1243-1272: LGTM!


1275-1287: LGTM!

network/p2p/libp2p/export_test.go (1)

13-13: LGTM!

Also applies to: 28-30, 76-80

network/p2p/libp2p/netMessenger_test.go (4)

211-234: LGTM!


236-254: LGTM!


256-317: LGTM!


319-345: LGTM!


Walkthrough

Adds P2P ResourceManager configuration and a seed-node flag that makes the network messenger create libp2p hosts with connmgr.NullConnMgr; extracts address-factory logic, implements selectable ResourceManager strategies (default, null, libp2p-default, scaled), wires seednode CLI, and adds tests and config docs.

Changes

Seednode connection + resource manager

Layer / File(s) Summary
Configuration types and docs
config/p2p.go, config/node/config.yaml, config/seednode/config.yaml
Adds P2PConfig.ResourceManager (ResourceManagerConfig) and constants for strategies; documents resourceManager tuning in node YAML and sets strategy: "null" for seednode config.
Network messenger: imports, options, and resource manager
network/p2p/libp2p/netMessenger.go
Adds imports (syscall, connmgr, rcmgr), extends ArgsNetworkMessenger with IsSeedNode bool, replaces inline addrs logic with buildAddressFactory, appends libp2p.ConnectionManager(connmgr.NullConnMgr{}) when seed mode is enabled, and implements buildResourceManagerOption and getFDLimit to select or construct ResourceManagers (including scaled rcmgr) with validation and proper closer handling.
Address factory extraction & export
network/p2p/libp2p/netMessenger.go, network/p2p/libp2p/export_test.go
Moves broadcast IP multiaddr construction into buildAddressFactory and exposes it via exported BuildAddressFactory; adds go-multiaddr import alias used by the test wrapper.
Tests and test accessor
network/p2p/libp2p/export_test.go, network/p2p/libp2p/netMessenger_test.go
Adds Host() accessor on networkMessenger; tests assert NullConnMgr selection when IsSeedNode is true and exercise ResourceManager strategy paths (default, null, libp2p-default, scaled, and invalid config), plus BuildAddressFactory cases.
Seednode wiring
cmd/seednode/main.go
createNode now sets IsSeedNode: true when constructing libp2p.ArgsNetworkMessenger, enabling seed-node-specific connection/resource manager behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 3

❌ Failed checks (3 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Concurrency Safety ⚠️ Warning When createMessenger fails after spawning goroutines (via createPubSub), cancelFunc is never invoked, causing goroutine leaks. Goroutines remain indefinitely waiting on ctx.Done(). Call cancelFunc() before returning error in NewNetworkMessenger's createMessenger error path (line 200-203). Add: if cancelFunc != nil { cancelFunc() } before the return statement.
Error Handling ⚠️ Warning Three discarded errors in netMessenger.go: (1) sourceMultiAddr, _ := ma.NewMultiaddr() at line 157, (2) _ = rmCloser.Close() at line 194, (3) _ = logging.SetLogLevel() at line 232. Propagate ma.NewMultiaddr and logging.SetLogLevel errors. For rmCloser.Close() cleanup, use log.LogIfError() or handle gracefully instead of silently discarding.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title follows the required format [KLC-XXXX] type: description with 'p2p' as the type and a clear description of the main change: disabling BasicConnMgr peer cap on seed nodes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
State Consistency ✅ Passed PR contains no blockchain state modifications—only P2P network configuration and resource management changes. Check is inapplicable to infrastructure code.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/KLC-2416-seed-null-connmgr

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts libp2p connection-manager behavior for the seednode binary so seeds are no longer constrained by libp2p’s default BasicConnMgr(160,192) peer trimming, allowing seeds to accept more peers as resources permit.

Changes:

  • Add an explicit IsSeedNode flag to ArgsNetworkMessenger and, when set, wire libp2p.ConnectionManager(connmgr.NullConnMgr{}).
  • Set IsSeedNode: true in cmd/seednode/main.go so only the seednode binary opts into the behavior.
  • Add unit tests to assert the connection manager selection for seed vs. non-seed, plus a test-only host accessor.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
network/p2p/libp2p/netMessenger.go Adds IsSeedNode arg and conditionally installs NullConnMgr to disable default peer trimming.
cmd/seednode/main.go Sets IsSeedNode: true for the seednode binary.
network/p2p/libp2p/netMessenger_test.go Adds tests verifying NullConnMgr is used only when IsSeedNode is enabled.
network/p2p/libp2p/export_test.go Adds a test-only Host() accessor to inspect the underlying host’s ConnManager().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 20, 2026
…2416]

The original NullConnMgr fix lifted BasicConnMgr's 160/192 watermarks but
production testing on a 4 GiB seed host showed peer count remained pinned
at ~187 with new dials failing during the security handshake ("failed to
negotiate security protocol: EOF"). Root cause: libp2p's
DefaultResourceManager auto-scales from host memory and on a 4 GiB host
caps System.Conns at exactly 187 (128 + 128*(MiB/1024)). ConnMgr trims
existing connections; ResourceManager rejects new ones, so both layers
matter.

Add a P2pConfig.ResourceManager YAML section with four strategies:
  ""        same as "default" — libp2p auto-scale based on actual host RAM
  "default" libp2p auto-scaled DefaultResourceManager
  "null"    disable libp2p resource limits (bounded only by OS)
  "scaled"  auto-scale as if host had scaledMemoryMiB MiB
            (lifts caps on small hosts without going fully unbounded)

Wire libp2p.ResourceManager() in NewNetworkMessenger via the new
buildResourceManagerOption helper. The helper also returns an io.Closer
for the "scaled" path so the freshly-constructed rcmgr is cleaned up if
libp2p.New fails before the host can take ownership.

config/seednode/config.yaml ships with strategy: "null" so seeds get the
fix from config rather than binary-side logic. config/node/config.yaml
documents the section but leaves it commented (regular nodes keep libp2p
default unless an operator opts in).

Bridge five libp2p subsystem loggers (rcmgr, swarm2, connmgr, net/identify,
autonat) into klever-go's external/<name> namespace so operators can
diagnose connection-acceptance issues via --log-level external/rcmgr:TRACE.

Add 8 wiring tests covering all four strategies plus error paths
(invalid strategy; scaled with zero memory). Existing ConnMgr tests
unchanged.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@network/p2p/libp2p/netMessenger.go`:
- Around line 1270-1275: The getFDLimit function currently casts rLimit.Cur
(uint64) straight to int which can overflow when RLIMIT_NOFILE == RLIM_INFINITY;
update getFDLimit to detect unlimited or out-of-range values by checking if
rLimit.Cur == syscall.RLIM_INFINITY or rLimit.Cur > uint64(math.MaxInt) and in
that case return a safe large int (e.g., math.MaxInt) instead of performing the
direct cast; otherwise cast rLimit.Cur to int as before. Reference symbols:
getFDLimit, rLimit, syscall.RLIMIT_NOFILE, syscall.RLIM_INFINITY, and the cast
of rLimit.Cur.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3a0ef898-43ca-4087-a4d9-974a3372d0af

📥 Commits

Reviewing files that changed from the base of the PR and between 7dd73bf and d8e5f01.

📒 Files selected for processing (5)
  • config/node/config.yaml
  • config/p2p.go
  • config/seednode/config.yaml
  • network/p2p/libp2p/netMessenger.go
  • network/p2p/libp2p/netMessenger_test.go
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.go

📄 CodeRabbit inference engine (Custom checks)

**/*.go: Verify that any new or modified concurrent code (goroutines, channels, mutexes, sync primitives) is free of race conditions. Check for: proper lock/unlock pairing, no goroutine leaks, correct channel lifecycle management, and proper context cancellation propagation.
Verify that errors are not silently discarded. Check for: unchecked error returns, error wrapping with context, proper error propagation up the call chain, and no bare panic() calls outside of init() functions.

Files:

  • config/p2p.go
  • network/p2p/libp2p/netMessenger.go
  • network/p2p/libp2p/netMessenger_test.go
network/**

⚙️ CodeRabbit configuration file

network/**: Peer-to-peer networking layer. - Check for proper input validation on all received messages - Verify rate limiting and DoS protection mechanisms - Ensure connection handling is goroutine-safe - Look for potential message amplification attacks - Verify TLS/authentication on peer connections

Files:

  • network/p2p/libp2p/netMessenger.go
  • network/p2p/libp2p/netMessenger_test.go
**/*_test.go

⚙️ CodeRabbit configuration file

**/*_test.go: Test files. Review for: - Adequate coverage of edge cases and error paths - Proper use of test helpers and assertions - Race condition coverage (tests should use -race flag patterns) - No hardcoded sleep for synchronization (use channels or sync primitives) - Test isolation (no shared mutable state between tests)

Files:

  • network/p2p/libp2p/netMessenger_test.go
🧠 Learnings (1)
📚 Learning: 2026-04-21T20:12:22.959Z
Learnt from: phcarneirobc
Repo: klever-io/klever-go PR: 38
File: indexer/eventsProcessor.go:188-211
Timestamp: 2026-04-21T20:12:22.959Z
Learning: In Go structs that are JSON-marshaled, if a field is a `bool` and has the `json:"...,omitempty"` tag, then leaving that field at its zero value (`false`) is functionally equivalent (in the resulting JSON) to explicitly setting `Foundation: false`. Reviewers should not flag struct literals that omit such `bool` fields as an inconsistency; they will serialize identically because `omitempty` suppresses `false` values.

Applied to files:

  • config/p2p.go
  • network/p2p/libp2p/netMessenger.go
  • network/p2p/libp2p/netMessenger_test.go

Comment thread network/p2p/libp2p/netMessenger.go
- Reduce NewNetworkMessenger cognitive complexity (16 → below 15) by
  extracting the broadcast multiaddr setup into buildAddressFactory.
- Fix gosec G115 in getFDLimit by clamping uint64 → int with math.MaxInt.
- Add unit and parity tests for buildAddressFactory.
@fbsobreira fbsobreira merged commit a4ba64e into develop May 20, 2026
11 of 12 checks passed
@fbsobreira fbsobreira deleted the chore/KLC-2416-seed-null-connmgr branch May 20, 2026 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants