Scenario:
(+300 other nodes connected to the mesh-vpn)
Description:
An L3 ping from a host behind nml-wdr4300 to de:ad:ca:fe:46:1d (10.130.0.254/muehlentor) fails.
13:30:48.216945 0e:85:8c:0f:63:fe > de:ad:ca:fe:46:1d, ethertype IPv4 (0x0800), length 98: 10.130.11.93 > 10.130.0.254: ICMP echo request, id 12263, seq 521, length 64
The ICMP echo request never reached muehlentor.
The frame is sent to the wrong originator:
Expected originator: 26:9c:57:9b:5c:b2 (muehlentor)
Got: d6:89:49:08:f6:9d (holstentor)
The issue might have started after a reboot of muehlentor.
A reboot of nml-wdr4300 fixed the issue for now.
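The "Expected" versus "Got" comparison above can be sketched as a small check on `batctl tg` output. This is only an illustration: it assumes the legacy line format shown in this report, lowercase MAC addresses, and that the `*` line marks the entry the node currently uses for a client. The helper name `selected_originator` is hypothetical.

```python
import re

# Assumed line format (batman-adv-legacy, as pasted in this report):
#  " * de:ad:ca:fe:46:1d ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]"
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
TG_LINE = re.compile(rf"^\s*([*+])\s+({MAC})\s+\(\s*\d+\)\s+via\s+({MAC})")


def selected_originator(tg_output, client):
    """Return the originator on the '*' line for `client`, or None."""
    for line in tg_output.splitlines():
        m = TG_LINE.match(line)
        if m and m.group(1) == "*" and m.group(2) == client:
            return m.group(3)
    return None


# The two lines nml-wdr4300 printed for the affected client during the issue.
sample = """\
 * de:ad:ca:fe:46:1d ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
 + de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) [...]
"""

expected = "26:9c:57:9b:5c:b2"  # muehlentor
got = selected_originator(sample, "de:ad:ca:fe:46:1d")
print("expected:", expected, "got:", got, "match:", got == expected)
```

On a live node the same function could be fed the text captured from `batctl tg` instead of the pasted sample.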
Console Output:
Output before reboot of nml-wdr4300, during the issue.
root@nml-wdr4300:~# batctl tg | grep de:ad:ca:fe:46:1d
* de:ad:ca:fe:46:1d ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
+ de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) [...]
root@nml-wdr4300:~# batctl tg | grep 26:9c:57:9b:5c:b2
+ de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) [...]
* 26:a8:54:c9:1d:a1 ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
root@nml-wdr4300:~# batctl tg | grep d6:89:49:08:f6:9d
* 16:d0:f3:0e:72:a5 ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
* fe:54:00:0c:bb:eb ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
* 52:54:00:0c:bb:eb ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
* de:ad:ca:fe:46:1d ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
root@nml-wdr4300:~# batctl o | grep 26:9c:57:9b:5c:b2
26:9c:57:9b:5c:b2 0.140s (198) 00:0d:b9:20:8f:05 [ br-wan]: 00:0d:b9:20:8f:05 (198)
root@nml-wdr4300:~# batctl o | grep d6:89:49:08:f6:9d
d6:89:49:08:f6:9d 0.360s (224) 00:0d:b9:20:8f:05 [ br-wan]: 00:0d:b9:20:8f:05 (224)
root@nm-alix:~# batctl tg | grep de:ad:ca:fe:46:1d
* de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
root@nm-alix:~# batctl tg | grep 26:9c:57:9b:5c:b2
* de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
* 26:a8:54:c9:1d:a1 ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
root@nm-alix:~# batctl tg | grep d6:89:49:08:f6:9d
* 16:d0:f3:0e:72:a5 ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
* fe:54:00:0c:bb:eb ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
* 52:54:00:0c:bb:eb ( 3) via d6:89:49:08:f6:9d ( 3) (0x146d) [...]
root@nm-alix:~# batctl o | grep 26:9c:57:9b:5c:b2
26:9c:57:9b:5c:b2 0.784s (225) ce:69:95:f0:a9:53 [ffhl-mesh-vpn]: ce:69:95:f0:a9:53 (225)
root@nm-alix:~# batctl o | grep d6:89:49:08:f6:9d
d6:89:49:08:f6:9d 0.744s (255) ce:69:95:f0:a9:53 [ffhl-mesh-vpn]: ce:69:95:f0:a9:53 (255)
tux@holstentor ~ % ip -oneline link | grep de:ad:ca:fe:46:1d
tux@holstentor ~ % ip -oneline link | grep 26:9c:57:9b:5c:b2
tux@holstentor ~ % ip -oneline link | grep d6:89:49:08:f6:9d
17: ffhl-mesh-vpn: mtu 1426 qdisc fq_codel master mesh-hl state UNKNOWN mode DEFAULT group default qlen 1000\ link/ether d6:89:49:08:f6:9d brd ff:ff:ff:ff:ff:ff
tux@holstentor ~ % sudo batctl -m mesh-hl tl
[B.A.T.M.A.N. adv 2013.4.0, MainIF/MAC: ffhl-mesh-vpn/d6:89:49:08:f6:9d (mesh-hl/16:d0:f3:0e:72:a5 BATMAN_IV), TTVN: 3]
Client VID Flags Last seen (CRC )
16:d0:f3:0e:72:a5 -1 [.P....] 0.000 (0x0000146d)
fe:54:00:0c:bb:eb -1 [......] 0.610 (0x0000146d)
52:54:00:0c:bb:eb -1 [......] 0.000 (0x0000146d)
tux@holstentor ~ % sudo batctl -m mesh-hl tg | grep de:ad:ca:fe:46:1d
* de:ad:ca:fe:46:1d -1 [....] ( 2) 26:9c:57:9b:5c:b2 ( 2) (0x00001f7d)
tux@holstentor ~ % sudo batctl -m mesh-hl tg | grep 26:9c:57:9b:5c:b2
* de:ad:ca:fe:46:1d -1 [....] ( 2) 26:9c:57:9b:5c:b2 ( 2) (0x00001f7d)
* 26:a8:54:c9:1d:a1 -1 [....] ( 2) 26:9c:57:9b:5c:b2 ( 2) (0x00001f7d)
tux@holstentor ~ % sudo batctl -m mesh-hl tg | grep d6:89:49:08:f6:9d
[B.A.T.M.A.N. adv 2013.4.0, MainIF/MAC: ffhl-mesh-vpn/d6:89:49:08:f6:9d (mesh-hl/16:d0:f3:0e:72:a5 BATMAN_IV)]
root@muehlentor ~ # ip -oneline link | grep de:ad:ca:fe:46:1d
4: freifunk-hl: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000\ link/ether de:ad:ca:fe:46:1d brd ff:ff:ff:ff:ff:ff
root@muehlentor ~ # ip -oneline link | grep 26:9c:57:9b:5c:b2
9: ffhl-mesh-vpn: mtu 1426 qdisc fq_codel master mesh-hl state UNKNOWN mode DEFAULT group default qlen 1000\ link/ether 26:9c:57:9b:5c:b2 brd ff:ff:ff:ff:ff:ff
root@muehlentor ~ # ip -oneline link | grep d6:89:49:08:f6:9d
root@muehlentor ~ # batctl -m mesh-hl tl
[B.A.T.M.A.N. adv 2013.4.0, MainIF/MAC: ffhl-mesh-vpn/26:9c:57:9b:5c:b2 (mesh-hl/26:a8:54:c9:1d:a1 BATMAN_IV), TTVN: 2]
Client VID Flags Last seen (CRC )
de:ad:ca:fe:46:1d -1 [......] 0.010 (0x00001f7d)
26:a8:54:c9:1d:a1 -1 [.P....] 0.000 (0x00001f7d)
root@muehlentor ~ # batctl -m mesh-hl tg | grep de:ad:ca:fe:46:1d
root@muehlentor ~ # batctl -m mesh-hl tg | grep 26:9c:57:9b:5c:b2
[B.A.T.M.A.N. adv 2013.4.0, MainIF/MAC: ffhl-mesh-vpn/26:9c:57:9b:5c:b2 (mesh-hl/26:a8:54:c9:1d:a1 BATMAN_IV)]
root@muehlentor ~ # batctl -m mesh-hl tg | grep d6:89:49:08:f6:9d
* 16:d0:f3:0e:72:a5 -1 [....] ( 3) d6:89:49:08:f6:9d ( 3) (0x0000146d)
* fe:54:00:0c:bb:eb -1 [....] ( 3) d6:89:49:08:f6:9d ( 3) (0x0000146d)
* 52:54:00:0c:bb:eb -1 [....] ( 3) d6:89:49:08:f6:9d ( 3) (0x0000146d)
Output after the reboot of nml-wdr4300, with the issue no longer present:
root@nml-wdr4300:~# batctl tg | grep de:ad:ca:fe:46:1d
* de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
root@nml-wdr4300:~# batctl tg | grep 26:9c:57:9b:5c:b2
* de:ad:ca:fe:46:1d ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
* 26:a8:54:c9:1d:a1 ( 2) via 26:9c:57:9b:5c:b2 ( 2) (0x1f7d) [...]
root@nml-wdr4300:~# batctl tg | grep d6:89:49:08:f6:9d
* 16:d0:f3:0e:72:a5 ( 3) via d6:89:49:08:f6:9d ( 5) (0x146d) [...]
* fe:54:00:0c:bb:eb ( 3) via d6:89:49:08:f6:9d ( 5) (0x146d) [...]
* 52:54:00:0c:bb:eb ( 3) via d6:89:49:08:f6:9d ( 5) (0x146d) [...]
Observations from console output:
- TT global on nml-wdr4300 does not match TT local on holstentor.
- New OGMs did not resolve the issue.
- It is unclear how the entry "de:ad:ca:fe:46:1d via holstentor" could end up in the global TT of nml-wdr4300, as holstentor does not have this address anywhere.
- On nml-wdr4300, batctl does not seem to display a CRC for the correct entry (de:ad:ca:fe:46:1d via muehlentor).
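The first observation (global TT not matching local TT) can be checked mechanically. The following is a rough sketch, assuming the `batctl tl` line format pasted above; the function names and the hand-built list of global entries are illustrative only and not part of batctl.

```python
import re

# A client line in `batctl tl` output starts with a MAC address;
# header lines ("[B.A.T.M.A.N. ...", "Client VID ...") do not.
MAC_AT_START = re.compile(r"^\s*((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b")


def local_clients(tl_output):
    """Collect the client MACs announced in one node's `batctl tl` output."""
    clients = set()
    for line in tl_output.splitlines():
        m = MAC_AT_START.match(line)
        if m:
            clients.add(m.group(1))
    return clients


def stale_global_entries(global_entries, local_tables):
    """Return (client, originator) pairs that the originator does not announce."""
    return [
        (client, orig)
        for client, orig in global_entries
        if orig in local_tables and client not in local_tables[orig]
    ]


holstentor_tl = """\
Client VID Flags Last seen (CRC )
16:d0:f3:0e:72:a5 -1 [.P....] 0.000 (0x0000146d)
fe:54:00:0c:bb:eb -1 [......] 0.610 (0x0000146d)
52:54:00:0c:bb:eb -1 [......] 0.000 (0x0000146d)
"""
muehlentor_tl = """\
Client VID Flags Last seen (CRC )
de:ad:ca:fe:46:1d -1 [......] 0.010 (0x00001f7d)
26:a8:54:c9:1d:a1 -1 [.P....] 0.000 (0x00001f7d)
"""

# Keyed by originator MAC (the ffhl-mesh-vpn address of each gateway).
local_tables = {
    "d6:89:49:08:f6:9d": local_clients(holstentor_tl),
    "26:9c:57:9b:5c:b2": local_clients(muehlentor_tl),
}

# Entries for the affected client as seen in the global TT of nml-wdr4300.
global_entries = [
    ("de:ad:ca:fe:46:1d", "d6:89:49:08:f6:9d"),
    ("de:ad:ca:fe:46:1d", "26:9c:57:9b:5c:b2"),
]

stale = stale_global_entries(global_entries, local_tables)
print(stale)
```

With the tables from this report, the only pair flagged is the one pointing at holstentor, which matches the observation that holstentor never announced de:ad:ca:fe:46:1d.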
General Notes:
- If this were due to a CRC16 collision, the issue would probably be a lot less likely on a recent batman-adv (it uses CRC32).
- There might already have been fixes for this in non-legacy batman-adv. (I remember some restructuring/fixing around list operations, for instance.)
- I seem to stumble over this about once a year, so it does not happen frequently and might therefore be difficult to reproduce. Migrating to a recent batman-adv is probably less effort than hunting this bug in batman-adv-legacy and would probably fix the issue.
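To put a rough number on the CRC16 versus CRC32 remark: under the idealized assumption that the checksum of a changed table is uniformly distributed over all possible values, the chance that it accidentally equals the old checksum is 2^-n for an n-bit CRC. Real CRCs over small, structured changes need not behave this way, so this is only a back-of-the-envelope estimate.

```python
def collision_probability(crc_bits):
    """Chance that two different tables share a checksum, assuming uniform values."""
    return 2.0 ** -crc_bits


p16 = collision_probability(16)
p32 = collision_probability(32)
print(f"CRC16: {p16:.3e}  CRC32: {p32:.3e}  ratio: {p16 / p32:.0f}")
```

Under that assumption an accidental match is 65536 times less likely with a 32-bit checksum than with a 16-bit one.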
