Network Map not updating after loss of HA routing peer #3886

Open
@drewhemm

Description

Describe the problem

I am running multiple routing peers in HA for a network. When I connect a client peer to NetBird, everything works fine. However, if I run netbird down on the routing peer that the client is using for the routes, the client does not receive any further network map updates. Because no network map update comes through, the client peer does not switch the routes over to the other routing peer.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy two (or more) routing peers and connect them to an instance of NetBird
  2. Configure one or more networks to be accessible via those peers, individually or via a group
  3. Connect a client peer to NetBird. Ensure it has an access policy that causes it to receive the route(s)
  4. Run netbird down on the routing peer that the client is currently using for the route(s)
  5. Run netbird status -d on the client peer. The down routing peer shows a status of 'Connecting', and the routes remain associated with that peer. They do not fail over to the other routing peer, even though that peer has a status of 'Connected'
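
The check in step 5 can be scripted. The following is a minimal sketch, not part of NetBird, that parses the "Peers detail:" section of netbird status -d output in the layout shown under Debug output below. The function names parse_peers and disconnected_peers are hypothetical, and the rule used to recognise a peer header (a line that ends with a colon and contains no spaces) is an assumption based on that sample output only.

```python
def parse_peers(status_text):
    """Parse the 'Peers detail:' section of `netbird status -d` output.

    Returns {peer_fqdn: {"status": str or None, "networks": [str, ...]}}.
    Assumes the layout shown in this report; other versions may differ.
    """
    peers = {}
    current = None
    in_peers = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "Peers detail:":
            in_peers = True
            continue
        if not in_peers or not line:
            continue
        # The peer list ends where the events or the client summary begin.
        if line == "Events:" or line.startswith("OS:"):
            break
        # Assumed heuristic: a peer header is "<fqdn>:" with no spaces.
        if line.endswith(":") and " " not in line:
            current = peers.setdefault(line[:-1], {"status": None, "networks": []})
        elif current is not None and line.startswith("Status:"):
            current["status"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Networks:"):
            value = line.split(":", 1)[1].strip()
            if value in ("", "-"):
                current["networks"] = []
            else:
                current["networks"] = [n.strip() for n in value.split(",")]
    return peers


def disconnected_peers(peers):
    """Return the names of peers whose status is anything other than 'Connected'."""
    return [name for name, info in peers.items() if info["status"] != "Connected"]
```

Feeding it the text captured from netbird status -d on the client gives a quick view of which peers are down and which networks each remaining peer is serving.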

At this point, the only workarounds are to restart the client peer, or to take action in the management UI that triggers the sending of a network map update, such as toggling the routing peer or peer group off and on.

Expected behavior

The networks/routes should fail over to the remaining routing peer(s) within a few seconds, without requiring any manual action.

Are you using NetBird Cloud?

Self-hosted

NetBird version

0.45.1

Is any other VPN software installed?

No

Debug output

To help us resolve the problem, please attach the following anonymized status output

netbird status -dA

Peers detail:
 rpi4b-window.netbird.selfhosted:
  NetBird IP: 100.71.117.141
  Public key: wAKpG4Ol+aSzF9wAlhFwFYLxrIwQT3qKSndxVVUscEE=
  Status: Connecting
  -- detail --
  Connection type:
  ICE candidate (Local/Remote): -/-
  ICE candidate endpoints (Local/Remote): -/-
  Relay server address:
  Last connection update: 8 minutes, 6 seconds ago
  Last WireGuard handshake: -
  Transfer status (received/sent) 0 B/0 B
  Quantum resistance: false
  Networks: -
  Latency: 0s

 netbird-dev-optimiser-k8s-78dfdb66c6-zfkk6.netbird.selfhosted:
  NetBird IP: 100.71.177.114
  Public key: RwHmJKTVPU3/P0d5tXPmWww9qi4lgmAldJ7Wf3XgKDM=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): srflx/relay
  ICE candidate endpoints (Local/Remote): 198.51.100.0:39513/198.51.100.1:56945
  Relay server address:
  Last connection update: 8 minutes, 3 seconds ago
  Last WireGuard handshake: 1 minute, 36 seconds ago
  Transfer status (received/sent) 688 B/1.5 KiB
  Quantum resistance: false
  Networks: 10.243.0.0/22, 10.243.4.0/22
  Latency: 52.9895ms

 netbird-a2sdv-1-1.netbird.selfhosted:
  NetBird IP: 100.71.188.65
  Public key: glJyPm+D1gLQYtABng2oGZCyQqLF5QRBrMripmcmYTg=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): relay/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.1:57256/198.51.100.2:53772
  Relay server address:
  Last connection update: 8 minutes, 4 seconds ago
  Last WireGuard handshake: 1 minute, 54 seconds ago
  Transfer status (received/sent) 4.9 KiB/11.1 KiB
  Quantum resistance: false
  Networks: 10.7.0.0/23, 10.7.3.0/28, 192.168.7.0/24, 198.51.100.3/32
  Latency: 60.0205ms

 netbird-rpi5-1.netbird.selfhosted:
  NetBird IP: 100.71.246.55
  Public key: ZsqyMXTm0tRK3JCztt+lw51dnqo1BLZ6yKhZOMblj3E=
  Status: Connected
  -- detail --
  Connection type: Relayed
  ICE candidate (Local/Remote): relay/srflx
  ICE candidate endpoints (Local/Remote): 198.51.100.1:52976/198.51.100.2:47204
  Relay server address:
  Last connection update: 3 minutes, 55 seconds ago
  Last WireGuard handshake: 1 minute, 54 seconds ago
  Transfer status (received/sent) 9.9 KiB/5.7 KiB
  Quantum resistance: false
  Networks: -
  Latency: 37.9216ms

Events:
  [WARNING] DNS (0e7e2d68-97e5-4d71-8443-a5a172192813)
    Message: All upstream servers failed (probe failed)
    Time: 8 minutes, 6 seconds ago
    Metadata: upstreams: 10.243.4.10:53
  [WARNING] DNS (1a1f6725-b683-4eb4-8d9b-114529713fbe)
    Message: All upstream servers failed (probe failed)
    Time: 8 minutes, 6 seconds ago
    Metadata: upstreams: 10.243.4.10:53
  [WARNING] DNS (4b5b2beb-88ee-4996-9858-c4edab27e051)
    Message: All upstream servers failed (probe failed)
    Time: 8 minutes, 6 seconds ago
    Metadata: upstreams: 10.7.0.1:53
  [INFO] SYSTEM (da0c16a9-63a7-4089-8a5c-f8ea6e0d7fa6)
    Message: Network map updated
    Time: 8 minutes, 6 seconds ago
  [INFO] SYSTEM (e222a455-5dd1-4795-b756-e19e6d79f8ad)
    Message: Network map updated
    Time: 5 minutes, 45 seconds ago
  [INFO] SYSTEM (093620b4-b95c-49cf-9504-ddb29c1a2e2f)
    Message: Network map updated
    Time: 5 minutes, 16 seconds ago
  [INFO] SYSTEM (348b01e5-0dd8-4261-aa74-5da2b1376d1a)
    Message: Network map updated
    Time: 5 minutes, 15 seconds ago
  [INFO] SYSTEM (8bb8a0fb-6bdf-47c4-b797-ac6be1f90b89)
    Message: Network map updated
    Time: 4 minutes, 9 seconds ago
  [INFO] SYSTEM (3aab7764-894f-4b68-a293-9cf2703303b5)
    Message: Network map updated
    Time: 4 minutes, 8 seconds ago
  [INFO] SYSTEM (fc89477f-e309-4937-8a66-0bad56d48db1)
    Message: Network map updated
    Time: 4 minutes, 8 seconds ago
OS: windows/amd64
Daemon version: 0.45.1
CLI version: 0.45.1
Management: Connected to https://nb.anon-zkiwj.domain:33073/
Signal: Connected to http://nb.anon-zkiwj.domain:10000/
Relays:
  [stun:nb.anon-ZkiWj.domain:3478] is Available
  [turn:nb.anon-ZkiWj.domain:3478?transport=udp] is Available
Nameservers:
  [10.243.4.10:53] for [argocd.svc.dev.gcp.anon-Hjt1a.domain, jupyter.svc.dev.gcp.anon-Hjt1a.domain] is Available
  [8.8.8.8:53, 8.8.4.4:53] for [.] is Available
  [10.7.0.1:53] for [office.anon-ZkiWj.domain] is Available
  [10.7.0.1:53] for [k3s-devel.anon-ZkiWj.domain] is Available
FQDN: zbduo8406.netbird.selfhosted
NetBird IP: 100.71.137.187/16
Interface type: Userspace
Quantum resistance: false
Lazy connection: false
Networks: -
Forwarding rules: 0
Peers count: 3/4 Connected

Create and upload a debug bundle, and share the returned file key:

netbird debug for 1m -AS -U

Uploaded files are automatically deleted after 30 days.

2c083968ec611b79db72ce4a8f4aae94746840c955f0ae82ae55b08e0a33a96b/abadb810-239a-4d67-b9bd-17e3c8181cf8

Alternatively, create the file only and attach it here manually:

netbird debug for 1m -AS

Have you tried these troubleshooting steps?

  • Reviewed client troubleshooting (if applicable)
  • Checked for newer NetBird versions
  • Searched for similar issues on GitHub (including closed ones)
  • Restarted the NetBird client
  • Disabled other VPN software
  • Checked firewall settings
