-
Notifications
You must be signed in to change notification settings - Fork 1k
Description
My mainnet validator node (v101.1.2) has experienced bouts of peer connection failures this evening. The logs show lots of connection failures/errors coming from CometBFT p2p module. The node was unable to keep up, missing a high proportion of blocks for some time. Checking the system load showed the namadan node process sustaining 100% use of a single core.
The host system internet is via VPN and cannot receive incoming connection requests. During the incident the host system otherwise retained normal internet connectivity via the VPN.
Checking an explorer for mainnet showed a few other validators missing a high number of blocks at the time.
Restarting the node service and the host's network interface did not appear to help. There are several service restarts in the attached logs, please note the systemd[1]: namadad.service... messages to identify the different service instances of namadad.service.
Full log:
bodski-mainnet-2025-10-05T20_20_00Z.log
Created with: journalctl -u namadad.service --since 2025-10-05T20:20Z --utc --no-hostname -o short-iso-precise
CometBFT logging config: CMT_LOG_LEVEL=*:info (includes p2p and pex entries).
From the logs everything appears normal up execution of block height=3802320:
2025-10-05T20:23:24.302006+00:00 namadan[1421207]: I[2025-10-05|21:23:24.301] Timed out module=consensus dur=3s height=3802320 round=0 step=RoundStepPropose
2025-10-05T20:23:24.330446+00:00 namadan[1421207]: I[2025-10-05|21:23:24.330] executed block module=state height=3802320 num_valid_txs=0 num_invalid_txs=0
The next log entries relating to block height=3802320 are delayed somewhat:
2025-10-05T20:23:42.390481+00:00 namadan[1421207]: I[2025-10-05|21:23:42.390] Ensure peers module=pex numOutPeers=16 numInPeers=0 numDialing=0 numToDial=-6
2025-10-05T20:23:54.713226+00:00 namadan[1421163]: 2025-10-05T20:23:54.713150Z INFO namada_node::shell: Committed block hash: 142562317ab701d0e5c426c0776e814e57f2f5a3db35f3e5c024f4f41ed28e1b, height: 3802320
2025-10-05T20:23:54.753721+00:00 namadan[1421207]: I[2025-10-05|21:23:54.753] committed state module=state height=3802320 num_txs=0 app_hash=142562317AB701D0E5C426C0776E814E57F2F5A3DB35F3E5C024F4F41ED28E1B
2025-10-05T20:23:54.882773+00:00 namadan[1421207]: I[2025-10-05|21:23:54.882] indexed block events module=txindex height=3802320
Following that for height=3802321 it complains about a commit for an unknown block:
2025-10-05T20:23:54.884157+00:00 namadan[1421207]: I[2025-10-05|21:23:54.883] Timed out module=consensus dur=-26.700205485s height=3802321 round=0 step=RoundStepNewHeight
2025-10-05T20:23:55.227169+00:00 namadan[1421207]: I[2025-10-05|21:23:55.227] commit is for a block we do not know about; set ProposalBlock=nil module=consensus height=3802321 commit_round=0 proposal= commit=48536E401179FFC967A0E9C3FD52E682EA20D3055B426ACF4BF7E65B26180CC6
2025-10-05T20:23:55.459375+00:00 namadan[1421207]: I[2025-10-05|21:23:55.459] received complete proposal block module=consensus height=3802321 hash=48536E401179FFC967A0E9C3FD52E682EA20D3055B426ACF4BF7E65B26180CC6
2025-10-05T20:23:55.474917+00:00 namadan[1421207]: I[2025-10-05|21:23:55.474] finalizing commit of block module=consensus height=3802321 hash=48536E401179FFC967A0E9C3FD52E682EA20D3055B426ACF4BF7E65B26180CC6 root=142562317AB701D0E5C426C0776E814E57F2F5A3DB35F3E5C024F4F41ED28E1B num_txs=0
2025-10-05T20:23:55.507403+00:00 namadan[1421163]: 2025-10-05T20:23:55.507279Z INFO namada_node::shell::process_proposal: Received block proposal proposer="64F0B257817C936B5053EECA8125C8A18E979643" height=3802321 hash="48536E401179FFC967A0E9C3FD52E682EA20D3055B426ACF4BF7E65B26180CC6" n_txs=0
2025-10-05T20:23:55.508505+00:00 namadan[1421163]: 2025-10-05T20:23:55.508451Z INFO namada_node::shell::finalize_block: Block height: 3802321, epoch: 1223, is new epoch: false, is masp new epoch: false.
2025-10-05T20:23:57.464793+00:00 namadan[1421163]: 2025-10-05T20:23:57.464718Z INFO namada_node::shell::finalize_block: Applied 0 transactions. Wrappers: 0, successful inner txs: 0, rejected inner txs: 0, errored inner txs: 0, unrun txs: 0, valid txs discarded by failing atomic batch: 0, vp cache size: 2 - 2, tx c>
2025-10-05T20:23:57.464793+00:00 namadan[1421163]: 2025-10-05T20:23:57.464747Z INFO namada_node::shell::finalize_block: txs executed: 0
2025-10-05T20:23:57.505512+00:00 namadan[1421207]: I[2025-10-05|21:23:57.505] executed block module=state height=3802321 num_valid_txs=0 num_invalid_txs=0
And then connection failures appear:
2025-10-05T20:23:57.886845+00:00 namadan[1421207]: I[2025-10-05|21:23:57.886] Timed out module=consensus dur=3s height=3802321 round=0 step=RoundStepPropose
2025-10-05T20:24:12.390030+00:00 namadan[1421207]: I[2025-10-05|21:24:12.389] Ensure peers module=pex numOutPeers=16 numInPeers=0 numDialing=0 numToDial=-6
2025-10-05T20:24:42.389660+00:00 namadan[1421207]: I[2025-10-05|21:24:42.389] Saving AddrBook to file module=p2p book=/home/bod/.local/share/namada/namada.5f5de2dd1b88cba30586420/cometbft/config/addrbook.json size=1002
2025-10-05T20:24:42.390736+00:00 namadan[1421207]: I[2025-10-05|21:24:42.390] Ensure peers module=pex numOutPeers=16 numInPeers=0 numDialing=0 numToDial=-6
2025-10-05T20:24:46.607117+00:00 namadan[1421207]: E[2025-10-05|21:24:46.607] Connection failed @ sendRoutine module=p2p peer=c8f1e8090686493e947c84e26ee124c6b1d855b0@148.113.9.92:26655 conn=MConn{148.113.9.92:26655} err="pong timeout"
2025-10-05T20:24:46.607117+00:00 namadan[1421207]: I[2025-10-05|21:24:46.607] service stop module=p2p peer=c8f1e8090686493e947c84e26ee124c6b1d855b0@148.113.9.92:26655 msg="Stopping MConnection service" impl=MConn{148.113.9.92:26655}
2025-10-05T20:24:46.607651+00:00 namadan[1421207]: E[2025-10-05|21:24:46.607] Stopping peer for error module=p2p peer="Peer{MConn{148.113.9.92:26655} c8f1e8090686493e947c84e26ee124c6b1d855b0 out}" err="pong timeout"
2025-10-05T20:24:46.607651+00:00 namadan[1421207]: I[2025-10-05|21:24:46.607] service stop module=p2p peer=c8f1e8090686493e947c84e26ee124c6b1d855b0@148.113.9.92:26655 msg="Stopping Peer service" impl="Peer{MConn{148.113.9.92:26655} c8f1e8090686493e947c84e26ee124c6b1d855b0 out}"
2025-10-05T20:24:46.607651+00:00 namadan[1421207]: I[2025-10-05|21:24:46.607] Reconnecting to peer module=p2p addr=c8f1e8090686493e947c84e26ee124c6b1d855b0@148.113.9.92:26655
...etc
Something weird I noticed when restarting namadad.service, connection attempts to invalid peer addresses:
2025-10-05T20:39:47.894452+00:00 namadan[1458379]: E[2025-10-05|21:39:47.894] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address 1da6332d286daf03f0d1084d192d162c4876fa2f@192.168.0.1:26656"
2025-10-05T20:39:47.894674+00:00 namadan[1458379]: E[2025-10-05|21:39:47.894] Can't add peer's address to addrbook module=p2p err="Cannot add invalid address c1410f11db5522e176e69100816ea5bbe8c99e36@0.0.0.0:26670: invalid IP"
2025-10-05T20:39:47.894810+00:00 namadan[1458379]: E[2025-10-05|21:39:47.894] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address df12e4cd7db11e8c5a9369f743f65ef513e47a01@192.168.1.1:26656"
2025-10-05T20:39:47.894959+00:00 namadan[1458379]: E[2025-10-05|21:39:47.894] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address c229090919ef31b296700e036a6599d712d4aeba@127.0.0.1:26656"
2025-10-05T20:39:47.895052+00:00 namadan[1458379]: E[2025-10-05|21:39:47.894] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address 6dafbd32f6ff6bf0c53c248c430f509cd8451fda@127.0.0.1:26656"
2025-10-05T20:39:47.895124+00:00 namadan[1458379]: E[2025-10-05|21:39:47.895] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address e9f0b4ab007ecc1d7b3eab2366e49f36b9635532@127.0.0.1:26656"
2025-10-05T20:39:47.895196+00:00 namadan[1458379]: E[2025-10-05|21:39:47.895] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address f72fb17373b6e5666a9a8e8ae89bcfdb5bee36d8@127.0.0.1:26656"
2025-10-05T20:39:47.895196+00:00 namadan[1458379]: E[2025-10-05|21:39:47.895] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address 91539c9b4730eaaa0998f5ca48964208074e705e@127.0.0.1:26656"
2025-10-05T20:39:47.895347+00:00 namadan[1458379]: E[2025-10-05|21:39:47.895] Can't add peer's address to addrbook module=p2p err="Cannot add non-routable address 3eb52b18e1ccfd787d558ff8a1444b39ca57575e@127.0.0.1:26656"
And also entries like this where we're rejecting inbound peer connections from localhost (???):
2025-10-05T20:39:48.723910+00:00 namadan[1458379]: I[2025-10-05|21:39:48.723] Inbound Peer rejected module=p2p err="filtered CONN<127.0.0.1:49830>: duplicate CONN<127.0.0.1:49830>" numPeers=3
Eventually, the node recovered by being left to just run. It's hard to pinpoint where recovery began, being a gradual process and p2p/pex errors seem to be normal at node startup when comet p2p logging is enabled.
There is panic in the logs that occurs in tower_abci::v037::server::Server::listen_tcp during node shutdown. I've seen this before and suspect it's an unrelated bug triggered by unlucky timing.