Terse technical reference of confirmed gotchas with third-party libraries. Each entry documents a specific failure mode, its root cause, and the fix or workaround. For deeper explanations with full context and debugging narratives, see KNOWLEDGE_BASE.md.
The common thread with js-libp2p packages is silent failure — misconfigurations produce no errors, just broken behavior. Start here when debugging connectivity or DHT problems.
Node never connects to bootstrap peer — no error.
If you pass a multiaddr like /ip4/127.0.0.1/tcp/9001/ws without the trailing /p2p/<PeerId>, the bootstrap module silently ignores it. No error, no warning — the node just never connects to the bootstrap peer.
- Workaround: Always include the full multiaddr with PeerId. The relay's HTTP info endpoint serves complete multiaddrs.
- Discovered: epic-002, commit `749ad76`
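Because the failure is silent, one option is to validate bootstrap addresses before they reach libp2p. The following is a minimal sketch under stated assumptions: `hasPeerId` and `assertBootstrapAddrs` are hypothetical helper names, and the check is a plain string test rather than a full multiaddr parse (it does not verify that the PeerId itself is well-formed).

```typescript
// Hypothetical helper: true if the multiaddr string ends in /p2p/<PeerId>.
// This is a plain string check, not a full multiaddr parse.
function hasPeerId(addr: string): boolean {
  const parts = addr.split('/').filter((part) => part.length > 0)
  const last = parts[parts.length - 1]
  const secondLast = parts[parts.length - 2]
  return parts.length >= 2 && secondLast === 'p2p' && last !== undefined && last.length > 0
}

// Fail loudly instead of letting the address be ignored without a warning.
function assertBootstrapAddrs(addrs: string[]): string[] {
  const bad = addrs.filter((addr) => !hasPeerId(addr))
  if (bad.length > 0) {
    throw new Error(`Bootstrap multiaddr missing /p2p/<PeerId>: ${bad.join(', ')}`)
  }
  return addrs
}
```

Running the list of bootstrap addresses through `assertBootstrapAddrs` at startup turns the silent no-connect into an immediate, descriptive error.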
DHT routing table stays empty forever — provide() and findProviders() hang or time out.
The bootstrap peer discovery module dials peers during createLibp2p() startup, while services are still initializing. The identify protocol can complete before kadDHT registers its topology listener with the registrar. The registrar's _onPeerIdentify handler only notifies topologies that are already registered — there is no catch-up mechanism for peers identified before a topology is added. Result: kadDHT never learns about the bootstrap peer's /ipfs/kad/1.0.0 support, the routing table stays at size 0, and all DHT operations fail.
This race was latent in epic-003 (with fewer services, kadDHT usually registered in time). Adding GossipSub in epic-004 made startup slower, and the race now triggers reliably.
- Workaround: Don't use the `bootstrap` module. Instead, call `node.dial(multiaddr(addr))` manually after `createLibp2p()` returns. At that point all services and topologies are registered, so `peer:identify` events are processed correctly.
- Discovered: epic-004
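The manual-dial workaround can be sketched as a small helper. This is an illustration, not the project's actual code: `Dialer` is a hypothetical structural type covering only the `dial` method, and `dialBootstrapPeers` is an assumed name. The generic parameter stands in for whatever address type the node accepts (a parsed multiaddr in practice).

```typescript
// Minimal structural type: only the part of the node this sketch touches.
interface Dialer<A> {
  dial: (addr: A) => Promise<unknown>
}

// Dial each bootstrap address after node creation has finished.
// Failed addresses are returned to the caller instead of being dropped silently.
async function dialBootstrapPeers<A>(node: Dialer<A>, addrs: A[]): Promise<A[]> {
  const failed: A[] = []
  for (const addr of addrs) {
    try {
      await node.dial(addr)
    } catch {
      failed.push(addr)
    }
  }
  return failed
}
```

The helper is meant to run only after `createLibp2p()` has resolved, which is the point at which all topologies are registered.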
Default browser gater silently blocks ws:// and private IPs.
The browser build of libp2p (connection-gater.browser.js) ships a default connection gater that rejects insecure WebSocket (ws://) addresses and private addresses, including loopback and RFC 1918 ranges (127.0.0.1, 192.168.x.x, etc.). In local dev, both apply. The dial fails with a DialDeniedError, but you only see it if you are already looking for it; otherwise the connection just never happens.
- Workaround: Override with `denyDialMultiaddr: () => false` in `createBrowserNode`. Production would use `wss://` with public addresses.
- Discovered: epic-002, commit `fd025f6`
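A minimal sketch of the override, for local development only. The constant name `devConnectionGater` is hypothetical, and the wiring shown in the comment assumes the gater is passed through the `connectionGater` option of `createLibp2p`.

```typescript
// Dev-only gater override: never deny a dial, so ws:// and private
// addresses are allowed. Do not ship this to production.
const devConnectionGater = {
  denyDialMultiaddr: (): boolean => false
}

// Assumed wiring inside createBrowserNode (shown as a comment because the
// rest of the node configuration is project-specific):
//   createLibp2p({ ...otherOptions, connectionGater: devConnectionGater })
```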
Circuit Relay v2 is a dumb pipe: no peer discovery.
The relay transports bytes between peers but does not perform peer discovery, protocol negotiation, or any application-level logic on behalf of connected peers. If you connect two browsers to the same relay, they will not automatically discover each other. A separate discovery mechanism is required (DHT, pubsub, or an external endpoint).
- Workaround: Use DHT content routing (`provide` / `findProviders`) for peer discovery. The relay is only for signaling and NAT traversal.
- Discovered: epic-002, commit `fd025f6`
Messages flow for ~2 minutes then silently stop. No disconnect event, no error.
circuitRelayServer() with no config applies default per-connection limits: defaultDurationLimit = 120,000 ms (2 min), defaultDataLimit = 131,072 bytes (128 KB). When either limit is hit, the relay resets the underlying STOP stream. This is a stream-level closure, not a connection-level one — libp2p does not fire peer:disconnect. GossipSub continues trying to send on a dead stream and messages silently vanish.
The defaults are buried in @libp2p/circuit-relay-v2/dist/src/constants.js and not mentioned in the README or API docs. The 2-minute default is designed for short-lived relay-assisted signaling (establish WebRTC, then disconnect from relay), not for sustained communication.
- Current state: Using the 2-min default. Now that the hangUp-before-WebRTC pattern upgrades all connections to direct WebRTC, the relay is only used for signaling. The 2-min default is sufficient. GossipSub's silent error swallowing is intercepted by `wrapGossipSubErrors()` (see below).
- Discovered: epic-004 debugging, updated epic-008
Errors from stream death, RPC send failures, stream creation — all invisible to application code.
GossipSub uses this.log.error() in 13 places inside .catch() or catch blocks. These errors are caught, logged to libp2p's internal debug channel (localStorage.debug = 'libp2p:gossipsub*'), and discarded. No event is emitted, no callback is called.
- Fix: `wrapGossipSubErrors()` in `create-browser-node.ts` wraps `pubsub.log.error` to intercept all 13 error sites. Extracts the prefix string and dispatches to our callback. Passes through to the original logger unchanged. See the JSDoc for the full prefix reference with source locations.
- Version coupling: Prefix strings are from `@chainsafe/libp2p-gossipsub` v14.x. Must re-verify on upgrade.
- Discovered: epic-008
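The interception technique can be sketched generically. This is not the code in `create-browser-node.ts`: `LoggerLike` and `interceptLogError` are hypothetical names, and the sketch assumes the first argument to `log.error` is the prefix string. A real libp2p logger function may carry extra properties that a plain replacement function does not preserve, so treat this as an outline of the idea.

```typescript
type LogFn = (...args: unknown[]) => void
interface LoggerLike { error: LogFn }

// Replace log.error with a wrapper that reports the message prefix to a
// callback, then forwards the call to the original logger unchanged.
function interceptLogError (
  log: LoggerLike,
  onError: (prefix: string, args: unknown[]) => void
): () => void {
  const original = log.error
  log.error = (...args: unknown[]): void => {
    const prefix = typeof args[0] === 'string' ? args[0] : '<non-string message>'
    try {
      onError(prefix, args)
    } catch {
      // A failing callback must not break the library's own logging path.
    }
    original.apply(log, args)
  }
  // Return an undo function so the patch can be removed again.
  return () => { log.error = original }
}
```

Keying application behavior off prefix strings is what creates the version coupling noted above: if the library rewords a message, the callback still fires but no longer matches.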
node.start() fails on slow networks (cellular, high-latency WAN).
DEFAULT_RESERVATION_COMPLETION_TIMEOUT in @libp2p/circuit-relay-v2/src/constants.ts is 2000ms. The entire relay dial + WebSocket handshake + reservation protocol must complete within this window. On cellular networks with 200ms+ RTT, this is insufficient — the node crashes on startup with UnsupportedListenAddressesError.
- Fix: Pass `reservationCompletionTimeout: 15_000` to `circuitRelayTransport()` and set `transportManager: { faultTolerance: FaultTolerance.NO_FATAL }` so the node starts even if the reservation times out. The explicit relay dial after startup retries the connection.
- Discovered: epic-008, phone-on-cellular test
Peers connect via relay (limited=true) even though WebRTC transport is configured.
When dial(peerId) is called, libp2p resolves the peer's addresses and tries them in order. The plain /p2p-circuit address (relay) connects faster than /p2p-circuit/webrtc (requires SDP/ICE exchange). Once the relay connection succeeds, the peer is "connected" and the WebRTC dial is skipped.
The /webrtc multiaddr path does the right thing: uses the relay only for signaling (SDP/ICE exchange via /webrtc-signaling/0.0.1), then establishes a direct WebRTC data channel. The relay is not in the data path. But you must dial the /webrtc address explicitly — libp2p will not automatically upgrade a relay connection to WebRTC.
- Fix: Filter provider multiaddrs from `findProviders()` for `/webrtc` addresses and dial those directly. Fall back to `dial(peerId)` only if no `/webrtc` address is available or the WebRTC dial fails.
- Discovered: epic-004 debugging
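The filter-then-fallback logic might look like the sketch below. It is an illustration under assumptions: `isWebRtcAddr`, `AddrDialer`, and `dialPreferWebRtc` are hypothetical names, and plain strings stand in for the PeerId and Multiaddr objects a real node would receive.

```typescript
// True if the multiaddr string has a standalone /webrtc component.
// This deliberately does not match /webrtc-direct.
function isWebRtcAddr (addr: string): boolean {
  return addr.split('/').includes('webrtc')
}

// Structural stand-in for the node; strings stand in for PeerId/Multiaddr.
interface AddrDialer {
  dial: (target: string) => Promise<unknown>
}

// Try each /webrtc address first; fall back to a plain PeerId dial only if
// none exists or all of them fail.
async function dialPreferWebRtc (node: AddrDialer, peerId: string, addrs: string[]): Promise<unknown> {
  for (const addr of addrs.filter(isWebRtcAddr)) {
    try {
      return await node.dial(addr)
    } catch {
      // Try the next /webrtc address, then the fallback below.
    }
  }
  return node.dial(peerId)
}
```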
Silent failure — server node runs as client, all DHT queries time out.
Line 139 of kad-dht.js reads this.clientMode = init.clientMode ?? true, contradicting the JSDoc @default false. When the relay starts with the default, it never registers the /ipfs/kad/1.0.0 protocol handler. The browser's topology listener never detects a DHT-capable peer, the routing table stays empty, and provide() / findProviders() time out with TimeoutError: signal timed out — no indication of the root cause.
- Workaround: Explicitly set `clientMode: false` on any node that should act as a DHT server.
- Discovered: epic-003
Queries fail immediately after connection because routing table is empty.
After a browser connects to a DHT server peer, the topology listener needs time to detect the server's /ipfs/kad/1.0.0 protocol support and add it to the routing table. There is no "DHT ready" event or callback. If you call provide() or findProviders() immediately after the peer:connect event, the routing table is empty and queries time out.
A blind setTimeout is unreliable — the required delay depends on how many protocols are registered (each adds to the identify exchange). Adding GossipSub in epic-004 broke a 2-second timer that was working in epic-003.
- Workaround: Listen for the `peer:identify` event, verify the peer's protocols include `/ipfs/kad/1.0.0`, then allow a short delay (500ms) for the topology listener to update the routing table. This is event-driven instead of time-based.
- Discovered: epic-003, updated epic-004
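A sketch of the event-driven wait, assuming the node is an `EventTarget` whose `peer:identify` event carries the identified protocols on `detail.protocols`. The names `supportsKad` and `waitForKadPeer` are hypothetical, and a production version would also want a timeout so the promise cannot wait forever.

```typescript
const KAD_PROTOCOL = '/ipfs/kad/1.0.0'

// Pure predicate so the protocol check is easy to test in isolation.
function supportsKad (protocols: readonly string[]): boolean {
  return protocols.includes(KAD_PROTOCOL)
}

// Resolve once a peer:identify event reports kad support, plus a short
// settle delay for the topology listener to update the routing table.
// Assumption: the event exposes `detail.protocols` as a string array.
function waitForKadPeer (node: EventTarget, settleMs = 500): Promise<void> {
  return new Promise((resolve) => {
    const onIdentify = (evt: Event): void => {
      const detail = (evt as Event & { detail?: { protocols?: string[] } }).detail
      if (!supportsKad(detail?.protocols ?? [])) return
      node.removeEventListener('peer:identify', onIdentify)
      setTimeout(() => { resolve() }, settleMs)
    }
    node.addEventListener('peer:identify', onIdentify)
  })
}
```

Callers would await `waitForKadPeer(node)` before the first `provide()` or `findProviders()` call.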
Operations hang forever if the routing table is empty or the dial target is unreachable.
contentRouting.provide() internally calls findClosestPeers(). With an empty routing table, the DHT query waits indefinitely for responses from non-existent peers. No timeout, no error — just a permanent hang. Similarly, findProviders() can hang if the routing table has no entries. dial() can also hang indefinitely when the target peer is unreachable — observed during WAN testing when a WebRTC dial produced no error and no success.
- Workaround: Always pass `{ signal: AbortSignal.timeout(N) }` to `provide()` and `findProviders()`. For `dial()`, we monkeypatch `node.dial` in `create-browser-node.ts` with a 15s default timeout via `wrapDialWithTimeout` so callers don't need to remember.
- Discovered: epic-004, extended epic-008
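The default-timeout wrapper can be sketched as follows. This is not the project's `wrapDialWithTimeout`; `withDefaultDialTimeout`, `AbortOptions`, and `DialFn` are hypothetical names, and the sketch assumes the wrapped function honors an `AbortSignal` passed in its options.

```typescript
interface AbortOptions { signal?: AbortSignal }
type DialFn<T, R> = (target: T, options?: AbortOptions) => Promise<R>

// Wrap a dial function so every call gets a timeout signal unless the
// caller already supplied its own signal.
function withDefaultDialTimeout<T, R> (dial: DialFn<T, R>, timeoutMs = 15_000): DialFn<T, R> {
  return (target, options = {}) =>
    dial(target, { ...options, signal: options.signal ?? AbortSignal.timeout(timeoutMs) })
}
```

A caller-supplied signal wins over the default, so individual call sites can still choose a shorter or longer deadline.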
GossipSub connects to peers, but no messages are exchanged. streamsOutbound map stays empty. No error is visible.
@chainsafe/libp2p-gossipsub@14.x depends on @libp2p/interface@^2.x. libp2p@3.x uses @libp2p/interface@^3.x, which completely changed the stream interface:
- v2 streams (`@libp2p/utils` v6, `AbstractStream`): have `.sink` (async function consuming an iterable) and `.source` (async iterable). GossipSub's `OutboundStream` constructor uses `it-pipe` to call `pipe(pushable, rawStream)`, which calls `rawStream.sink(pushable)`.
- v3 streams (`@libp2p/utils` v7, `AbstractMessageStream`): removed `.sink` and `.source` entirely. Replaced with `Symbol.asyncIterator` and event-based `.send()` API.
When GossipSub receives a v3 stream from connection.newStream():
1. `OutboundStream` constructor calls `pipe(this.pushable, this.rawStream)`
2. `it-pipe` checks `isDuplex(rawStream)` → `false` (no `.sink` property)
3. `rawPipe` tries to call `rawStream(pushable)` as a function → TypeError
4. `createOutboundStream` catches it internally: `catch (e) { this.log.error('createOutboundStream error', e) }` (silently swallowed)
5. `streamsOutbound.set()` never executes → outbound stream count stays 0
6. No subscription exchange → no mesh formation → no message delivery
Key source locations:
- `@chainsafe/libp2p-gossipsub/dist/src/index.js:482-484`: `new OutboundStream(await connection.newStream(...))`
- `@chainsafe/libp2p-gossipsub/dist/src/stream.js:20`: `pipe(this.pushable, this.rawStream).catch(errCallback)`
- `it-pipe/dist/src/index.js:47-51`: `isDuplex` check: `obj.sink != null && obj.source != null`
- `@libp2p/utils/dist/src/abstract-message-stream.js`: v3 stream, no `.sink` / `.source`
- `@libp2p/pubsub/node_modules/@libp2p/utils/dist/src/abstract-stream.js`: v2 stream, HAS `.sink` / `.source`
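The duplex check at the heart of the failure is small enough to reproduce. The sketch below mirrors the `obj.sink != null && obj.source != null` condition quoted above; `isDuplexLike`, `assertV2Stream`, and the two stand-in objects are hypothetical and only imitate the property shapes of the v2 and v3 streams, not their behavior.

```typescript
// Same condition as the isDuplex check quoted above: a duplex has both
// a .sink and a .source property.
function isDuplexLike (obj: unknown): boolean {
  if (typeof obj !== 'object' || obj === null) return false
  const candidate = obj as { sink?: unknown, source?: unknown }
  return candidate.sink != null && candidate.source != null
}

// Hypothetical stand-ins that only imitate the two property shapes.
const v2LikeStream = {
  sink: async (): Promise<void> => {},
  source: [] as Uint8Array[]
}
const v3LikeStream = {
  send: (_data: Uint8Array): void => {}
}

// Possible startup guard: throw on a stream without .sink/.source instead
// of letting the resulting TypeError be swallowed downstream.
function assertV2Stream (stream: unknown): void {
  if (!isDuplexLike(stream)) {
    throw new Error('Stream has no .sink/.source: possible @libp2p/interface version mismatch')
  }
}
```

Running a guard like `assertV2Stream` once against a freshly opened stream would surface the mismatch as a thrown error rather than an empty `streamsOutbound` map.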
Debugging was difficult because:
- `connection.newStream()` succeeds: the stream IS created at the transport level
- GossipSub's `createOutboundStream` doesn't re-throw; it catches and logs internally
- The only evidence is `streamsOutbound.size === 0` after `createOutboundStream` returns
- Diagnostic monkey-patching `connection.newStream` shows success because the stream creation itself works; the failure is in the `OutboundStream` wrapper constructor that runs after `newStream` returns
- GossipSub's debug logger (`this.log.error`) is not connected to the browser console by default
What we tried before finding the root cause:
- Checked GossipSub's three early returns (`!isStarted()`, `!peers.has(id)`, `streamsOutbound.has(id)`): none triggered
- Checked stream limits in `findOutgoingStreamLimit`: not the issue (defaults to 64)
- Checked `runOnLimitedConnection` configuration: set correctly
- Added diagnostic monkey-patch intercepting `connection.newStream` errors: showed success, which was misleading because the error is in the `OutboundStream` constructor, not in `newStream`
- Waited 4+ minutes with timing tests: not a timing issue
- Checked GossipSub topology registration for `/meshsub/1.1.0` and `/meshsub/1.0.0`: fires correctly
- Noted 4 parallel `createOutboundStream` calls per peer (one per protocol topology): all fail identically
- Fix: Pin all libp2p packages to their latest `@libp2p/interface@^2.x`-compatible versions. Replace `"*"` wildcards. Compatible versions: `libp2p@~2.10`, `@chainsafe/libp2p-noise@~16.1`, `@chainsafe/libp2p-yamux@~7.0`, `@libp2p/bootstrap@~11.0`, `@libp2p/circuit-relay-v2@~3.2`, `@libp2p/crypto@~5.1`, `@libp2p/identify@~3.0`, `@libp2p/kad-dht@~15.1`, `@libp2p/ping@~2.0`, `@libp2p/webrtc@~5.2`, `@libp2p/websockets@~9.2`.
- Discovered: epic-004 debugging, root-caused across multiple sessions
Messages flow through the relay instead of the direct WebRTC data channel. Killing the relay kills messaging even though webrtc limited=false is confirmed.
libp2p's registrar notifies each topology (like GossipSub) exactly once per peer via a topology.filter. Whichever connection arrives first wins. The relay (circuit) connection always establishes before WebRTC (WebRTC requires SDP/ICE exchange via the relay). With runOnLimitedConnection: true, GossipSub accepts the relay connection, the filter marks the peer as "seen", and the later WebRTC connection notification is skipped.
GossipSub's createOutboundStream also has a gate: if (this.streamsOutbound.has(id)) return — even if the topology filter didn't block it, GossipSub wouldn't replace the relay stream with a WebRTC one.
Key source locations:
- `libp2p/dist/src/registrar.js:275-278`: `topology.filter.has(peerId)` gate
- `@chainsafe/libp2p-gossipsub/dist/src/index.js:475-480`: `streamsOutbound.has(id)` gate

- Fix: Remove `runOnLimitedConnection: true` from the GossipSub config. With the default (`false`), the registrar skips limited connections before setting the filter, so WebRTC is the first connection GossipSub sees. Messages then flow directly over WebRTC, surviving relay shutdown.
- Tradeoff: Messages don't flow until WebRTC establishes (a few seconds). If WebRTC fails (symmetric NAT), messages never flow, which is acceptable for P2P architecture.
- Discovered: epic-008
Messages don't flow despite WebRTC connections being established.
When a peer connects via relay (limited=true) before the WebRTC connection, GossipSub never binds to the subsequent WebRTC connection. peer:identify does not fire for the WebRTC connection despite all code paths suggesting it should. The identify service, registrar, and GossipSub topology are all coded correctly — the issue is a subtle race condition or dedup in libp2p internals.
This only manifests when peers connect via relay circuit BEFORE the WebRTC dial (e.g., after a browser refresh when other peers' DHT queries discover the new peer and dial via relay). In the initial flow (no prior relay connection), WebRTC is the first connection and GossipSub binds correctly.
- Workaround: Before dialing WebRTC, `hangUp` existing connections to that peer. This ensures WebRTC is the first (and only) connection, matching the pattern that works reliably.
- Discovered: epic-008, VPN browser refresh test
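The hangUp-before-WebRTC pattern reduces to two ordered calls. The sketch below assumes a node exposing `hangUp` and `dial`; `HangUpDialer` and `dialWebRtcFresh` are hypothetical names, and strings stand in for the PeerId and Multiaddr objects used in practice.

```typescript
// Structural stand-in for the node; strings stand in for PeerId/Multiaddr.
interface HangUpDialer {
  hangUp: (peerId: string) => Promise<void>
  dial: (addr: string) => Promise<unknown>
}

// Close existing connections to the peer first, so the WebRTC connection
// is the first one that topologies such as GossipSub get to see.
async function dialWebRtcFresh (node: HangUpDialer, peerId: string, webRtcAddr: string): Promise<unknown> {
  await node.hangUp(peerId)
  return node.dial(webRtcAddr)
}
```

Awaiting `hangUp` before dialing matters here: starting the WebRTC dial while the relay connection is still open would reintroduce the ordering problem described above.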
TypeScript type errors when used with libp2p@3.x.
@chainsafe/libp2p-gossipsub@14.x depends on @libp2p/interface@^2.0.0, but libp2p@3.x depends on @libp2p/interface@^3.0.0. GossipSub bundles its own copy of the older interface, causing TypeScript to see incompatible types. This is a symptom of the version mismatch above — fix the versions and both the types and the runtime break go away.
- Workaround (if staying on v3): Cast through `unknown`: `gossipsub({...}) as unknown as ReturnType<typeof identify>`. Note: this hides the runtime incompatibility described above.
- Discovered: epic-004