What version of nebula are you using?
1.7.2
What operating system are you using?
Linux
Describe the Bug
- Lighthouse: has a static public IP
- Host: behind a NAT whose public IP may change
- The bug occurs exactly after the NAT's public IP changes: the nebula host fails to reconnect
Maybe there should be a counter or threshold that reloads the service after repeated failed attempts.
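The counter idea above could be approximated outside nebula as an external watchdog. The following is a minimal sketch, not a nebula feature: the function name `should_restart`, the `threshold` parameter, and the choice to reset the counter on any "Handshake message received" line are all assumptions made for illustration. It only relies on the log message strings that appear in the logs of this report.

```python
import re

# Patterns are based on the message strings seen in the logs of this report.
TIMEOUT_RE = re.compile(r'msg="Handshake timed out".*\bvpnIp=(\S+)')
SUCCESS_RE = re.compile(r'msg="Handshake message received".*\bvpnIp=(\S+)')


def should_restart(lines, lighthouse_ip, threshold=5):
    """Return True once `threshold` consecutive handshake timeouts
    towards `lighthouse_ip` are seen without a successful handshake
    in between. `threshold` is a hypothetical tuning knob."""
    consecutive = 0
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m and m.group(1) == lighthouse_ip:
            consecutive += 1
            if consecutive >= threshold:
                return True
            continue
        m = SUCCESS_RE.search(line)
        if m and m.group(1) == lighthouse_ip:
            # A completed handshake resets the failure counter.
            consecutive = 0
    return False
```

A caller could feed this function lines from `journalctl -u nebula -f` and run `systemctl restart nebula` when it returns True. That only automates the manual restart described in this report; it does not address why the reconnection fails.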
Logs from affected hosts
Jun 03 08:17:10 N1 nebula[279798]: level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2265257244 localIndex=2265257244 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
Jun 03 08:17:17 N1 nebula[279798]: level=info msg="Handshake timed out" durationNs=6891732272 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2265257244 localIndex=2265257244 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
Jun 03 08:18:10 N1 nebula[279798]: level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=789404569 localIndex=789404569 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
Jun 03 08:18:17 N1 nebula[279798]: level=info msg="Handshake timed out" durationNs=6688333707 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=789404569 localIndex=789404569 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
# ============================
# many identical log entries omitted here
# ============================
Jun 03 09:32:52 N1 nebula[279798]: level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=215861812 localIndex=215861812 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
Jun 03 09:32:59 N1 nebula[279798]: level=info msg="Handshake timed out" durationNs=6664379861 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=215861812 localIndex=215861812 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
Jun 03 09:32:59 N1 nebula[279798]: level=info msg="Handshake timed out" durationNs=6962423141 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2959928072 localIndex=2959928072 remoteIndex=0 udpAddrs="[]" vpnIp=192.168.100.10
# ============================
# manually restarted the nebula host
# ============================
Jun 03 09:33:00 N1 nebula[279798]: level=info msg="Caught signal, shutting down" signal=terminated
Jun 03 09:33:00 N1 nebula[279798]: level=info msg=Goodbye
Jun 03 09:33:00 N1 systemd[1]: Stopping Nebula overlay networking tool...
Jun 03 09:33:00 N1 systemd[1]: nebula.service: Succeeded.
Jun 03 09:33:00 N1 systemd[1]: Stopped Nebula overlay networking tool.
Jun 03 09:33:00 N1 systemd[1]: nebula.service: Consumed 30min 44.211s CPU time.
Jun 03 09:33:00 N1 systemd[1]: Started Nebula overlay networking tool.
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Firewall rule added" firewallRule="map[caName: caSha: direction:outgoing endPort:0 groups:[] host:any ip: localIp: proto:0 startPort:0]"
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Firewall rule added" firewallRule="map[caName: caSha: direction:incoming endPort:0 groups:[] host:any ip: localIp: proto:0 startPort:0]"
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Firewall started" firewallHash=498215dec4e5687a2353f51c10838c113bd1af35ef72b8e8c9f536986ada5417
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Main HostMap created" network=192.168.100.2/24 preferredRanges="[]"
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="punchy enabled"
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Loaded send_recv_error config" sendRecvError=always
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Nebula interface is active" boringcrypto=false build=1.7.2 interface=tun0 network=192.168.100.2/24 udpAddr="0.0.0.0:44710"
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="DNS results changed for host list" newSet="map[*.*.*.*:4242:{}]" origSet="&map[]"
# ============================
# now it's back to normal
# ============================
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Handshake message sent" handshake="map[stage:1 style:ix_psk0]" initiatorIndex=1175697647 localIndex=1175697647 remoteIndex=0 udpAddrs="[*.*.*.*:4242]" vpnIp=192.168.100.1
Jun 03 09:33:00 N1 nebula[299280]: level=info msg="Handshake message received" certName=ICL durationNs=327741706 fingerprint=a01937d6e07d050ba2cfc91fd2f56ec3f008b33690b7931f3a5bfe99f835f67a handshake="map[stage:2 style:ix_psk0]" initiatorIndex=1175697647 issuer=33768094d6855b7ca53962932dd41ce99b11347d220ff89a33d1f01f0f5ab578 remoteIndex=1175697647 responderIndex=3925596160 sentCachedPackets=1 udpAddr="*.*.*.*:4242" vpnIp=192.168.100.1
Jun 03 09:33:03 N1 nebula[299280]: level=info msg="Handshake message received" certName=Macbook fingerprint=bd3d7b77768b32aa25b5ce82c2cc67a4620b78aaf1ed95999c3e93016c8795f5 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2451724569 issuer=33768094d6855b7ca53962932dd41ce99b11347d220ff89a33d1f01f0f5ab578 remoteIndex=0 responderIndex=0 udpAddr="192.168.123.10:61939" vpnIp=192.168.100.10
Jun 03 09:33:03 N1 nebula[299280]: level=info msg="Handshake message sent" certName=Macbook fingerprint=bd3d7b77768b32aa25b5ce82c2cc67a4620b78aaf1ed95999c3e93016c8795f5 handshake="map[stage:2 style:ix_psk0]" initiatorIndex=2451724569 issuer=33768094d6855b7ca53962932dd41ce99b11347d220ff89a33d1f01f0f5ab578 remoteIndex=0 responderIndex=3070221539 sentCachedPackets=0 udpAddr="192.168.123.10:61939" vpnIp=192.168.100.10
Jun 03 09:33:03 N1 nebula[299280]: level=info msg="Handshake message received" certName=Macbook fingerprint=bd3d7b77768b32aa25b5ce82c2cc67a4620b78aaf1ed95999c3e93016c8795f5 handshake="map[stage:1 style:ix_psk0]" initiatorIndex=2451724569 issuer=33768094d6855b7ca53962932dd41ce99b11347d220ff89a33d1f01f0f5ab578 remoteIndex=0 responderIndex=0 udpAddr="172.16.16.10:61939" vpnIp=192.168.100.10
Jun 03 09:33:03 N1 nebula[299280]: level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="172.16.16.10:61939" vpnIp=192.168.100.10
issuer=33768094d6855b7ca53962932dd41ce99b11347d220ff89a33d1f01f0f5ab578 remoteIndex=0 responderIndex=0 udpAddr="172.16.16.11:56979" vpnIp=192.168.100.11
issuer=33768094d6855b7ca53962932dd41ce99b11347d220ff89a33d1f01f0f5ab578 remoteIndex=0 responderIndex=0 udpAddr="192.168.123.11:56979" vpnIp=192.168.100.11
Jun 03 09:40:23 N1 nebula[299280]: level=info msg="Handshake message sent" cached=true handshake="map[stage:2 style:ix_psk0]" udpAddr="192.168.123.11:56979" vpnIp=192.168.100.11
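To make the timing in the logs above easier to read, the `key=value` fields can be parsed and `durationNs` converted to seconds. This is a small sketch under assumptions: the helper names `parse_fields` and `timeout_seconds` are hypothetical, and the parsing relies on `shlex` handling the double-quoted values as they appear in these lines (a value containing a stray quote character would need different handling).

```python
import shlex


def parse_fields(line):
    """Split a logfmt-style line into a dict of key -> value.
    Tokens without '=' (such as the syslog prefix) are ignored."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def timeout_seconds(line):
    """Return the handshake timeout duration in seconds,
    or None if the line is not a 'Handshake timed out' entry."""
    fields = parse_fields(line)
    if fields.get("msg") != "Handshake timed out":
        return None
    return int(fields["durationNs"]) / 1e9
```

Applied to the timeout entries above, this shows each attempt giving up after roughly 6.7 to 7 seconds, with a new attempt towards the lighthouse starting about once a minute in the entries shown.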
Config files from affected hosts
pki:
  ca: /root/bin/nebula/cert/ca.crt
  cert: /root/bin/nebula/cert/SY.crt
  key: /root/bin/nebula/cert/SY.key
static_host_map:
  "192.168.100.1": ["example.com:4242"] # hidden
lighthouse:
  am_lighthouse: false
  interval: 60
  hosts:
    - "192.168.100.1"
listen:
  host: 0.0.0.0
  port: 0
punchy:
  punch: true
  respond: true
  delay: 1s
  respond_delay: 5s
cipher: aes
tun:
  disabled: false
  tx_queue: 500
  mtu: 1300
# Unsafe routes allows you to route traffic over nebula to non-nebula nodes
# Unsafe routes should be avoided unless you have hosts/services that cannot run nebula
# NOTE: The nebula certificate of the "via" node *MUST* have the "route" defined as a subnet in its certificate
# `mtu`: will default to tun mtu if this option is not specified
# `metric`: will default to 0 if this option is not specified
# `install`: will default to true, controls whether this route is installed in the systems routing table.
  # unsafe_routes:
  #   - route: 192.168.1.0/24
  #     via: 192.168.100.1
  #     mtu: 1300
  #     install: true
logging:
  level: info
  format: text
  disable_timestamp: true
firewall:
  outbound_action: drop
  inbound_action: drop
  conntrack:
    tcp_timeout: 12m
    udp_timeout: 3m
    default_timeout: 10m
# The firewall is default deny. There is no way to write a deny rule.
# Rules are comprised of a protocol, port, and one or more of host, group, or CIDR
# Logical evaluation is roughly: port AND proto AND (ca_sha OR ca_name) AND (host OR group OR groups OR cidr)
# - port: Takes `0` or `any` as any, a single number `80`, a range `200-901`, or `fragment` to match second and further fragments of fragmented packets (since there is no port available).
# code: same as port but makes more sense when talking about ICMP, TODO: this is not currently implemented in a way that works, use `any`
# proto: `any`, `tcp`, `udp`, or `icmp`
# host: `any` or a literal hostname, ie `test-host`
# group: `any` or a literal group name, ie `default-group`
# groups: Same as group but accepts a list of values. Multiple values are AND'd together and a certificate would have to contain all groups to pass
# cidr: a remote CIDR, `0.0.0.0/0` is any.
# local_cidr: a local CIDR, `0.0.0.0/0` is any. This could be used to filter destinations when using unsafe_routes.
# ca_name: An issuing CA name
# ca_sha: An issuing CA shasum
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any