Some further details as to why EVPN interop tests are yielding different results #2101

Open
@jbemmel

I didn't have the willpower to run the tests with vjunos-switch, but the reason the tests with FRR were failing on SR Linux is batshit crazy:

https://github.com/ipspace/blog/blob/master/content/posts/2025/04/evpn-symmetric-irb-arp.md

Originally posted by @ipspace in #2086

I enabled Zebra debugging and found some interesting tidbits:

2025/04/01 01:10:17 ZEBRA: [KKAC1-JMWTB] Rx RTM_NEWNEIGH family ipv4 IF varp-40000(11) vrf customer(6) IP 172.16.0.2 MAC aa:c1:ab:43:d5:60 state 0x4 flags 0x0 ext_flags 0x0
2025/04/01 01:10:17 ZEBRA: [SGBRA-T9E0Z] zebra neigh add if varp-40000/11 172.16.0.2 aa:c1:ab:43:d5:60
2025/04/01 01:10:17 ZEBRA: [J11JK-NWSBV] zebra neigh new if 11 172.16.0.2 aa:c1:ab:43:d5:60
2025/04/01 01:10:17 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0

The problem seems to be this line:

2025/04/01 01:10:17 ZEBRA: [TDS34-MNEJW]     Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring

2025/04/01 01:10:17 ZEBRA: [KKAC1-JMWTB] Rx RTM_NEWNEIGH family ipv4 IF varp-40000(11) vrf customer(6) IP 172.16.0.2 MAC aa:c1:ab:43:d5:60 state 0x2 flags 0x0 ext_flags 0x0
2025/04/01 01:10:17 ZEBRA: [SGBRA-T9E0Z] zebra neigh add if varp-40000/11 172.16.0.2 aa:c1:ab:43:d5:60
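To compare runs more easily, the `Rx RTM_NEWNEIGH` lines can be reduced to interface, address, and neighbor state. The following is a minimal sketch, assuming the log format is exactly as pasted above (it may differ between FRR versions). The names `parse_neigh_line` and `decode_nud` are made up for illustration; the state bits are the Linux NUD values from `linux/neighbour.h`, so `0x4` is STALE and `0x2` is REACHABLE.

```python
import re

# Linux neighbour (NUD) state bits, as defined in <linux/neighbour.h>
NUD_STATES = {
    0x01: "INCOMPLETE",
    0x02: "REACHABLE",
    0x04: "STALE",
    0x08: "DELAY",
    0x10: "PROBE",
    0x20: "FAILED",
    0x40: "NOARP",
    0x80: "PERMANENT",
}

# Pattern derived only from the log lines shown in this issue (assumption).
NEIGH_RE = re.compile(
    r"Rx (?P<msg>RTM_\w+) family (?P<family>\w+) "
    r"IF (?P<ifname>\S+)\((?P<ifindex>\d+)\) "
    r"vrf (?P<vrf>\S+)\((?P<vrf_id>\d+)\) "
    r"IP (?P<ip>\S+) MAC (?P<mac>\S+) state (?P<state>0x[0-9a-fA-F]+)"
)


def decode_nud(state):
    """Return the names of all NUD bits set in the numeric state."""
    names = [name for bit, name in NUD_STATES.items() if state & bit]
    return names or ["NONE"]


def parse_neigh_line(line):
    """Parse one Zebra 'Rx RTM_*NEIGH' debug line; return None if it does not match."""
    m = NEIGH_RE.search(line)
    if m is None:
        return None
    event = m.groupdict()
    event["ifindex"] = int(event["ifindex"])
    event["vrf_id"] = int(event["vrf_id"])
    event["state"] = decode_nud(int(event["state"], 16))
    return event


if __name__ == "__main__":
    sample = (
        "2025/04/01 01:10:17 ZEBRA: [KKAC1-JMWTB] Rx RTM_NEWNEIGH family ipv4 "
        "IF varp-40000(11) vrf customer(6) IP 172.16.0.2 "
        "MAC aa:c1:ab:43:d5:60 state 0x4 flags 0x0 ext_flags 0x0"
    )
    print(parse_neigh_line(sample))
```

Run over a whole log, this makes it obvious that the entry on `varp-40000` first shows up as STALE and then as REACHABLE, while Zebra ignores it both times.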

Building FRR is a bit of a pain, but editing rt_netlink.c so that the bridge branch also accepts macvlan interfaces:

} else if (IS_ZEBRA_IF_BRIDGE(ifp) || IS_ZEBRA_IF_MACVLAN(ifp)) {
		link_if = ifp;
		if (IS_ZEBRA_DEBUG_KERNEL)
			zlog_debug(
				"    Neighbor Entry received, IS_ZEBRA_IF_MACVLAN=%d",
				IS_ZEBRA_IF_MACVLAN(ifp));
}

does seem to work.

However, when I replace FRR with Cumulus NVUE, the scenario works without pinging the gateway, even though Cumulus hits the same Zebra issue of not processing ARPs from macvlan interfaces. The difference is that, before that happens, Cumulus receives a BGP EVPN update with the MAC+IP of H2 (!):

2025/04/01 04:05:44 ZEBRA: [XAYAY-GEJ4Q] Recv MACIP ADD VNI 21000 MAC aa:c1:ab:24:1b:48 flags 0x0 seq 0 VTEP 10.0.0.6 ESI - from bgp
2025/04/01 04:05:44 ZEBRA: [XAYAY-GEJ4Q] Recv MACIP ADD VNI 21000 MAC aa:c1:ab:24:1b:48 IP 172.16.0.2 flags 0x0 seq 0 VTEP 10.0.0.6 ESI - from bgp
2025/04/01 04:05:44 ZEBRA: [JWQ3J-TKSAT] zebra_evpn_mac_add: MAC aa:c1:ab:24:1b:48 flags None
2025/04/01 04:05:44 ZEBRA: [JKGET-WF857] bridge br_default VID 1000 MAC aa:c1:ab:24:1b:48 find - 0x0
2025/04/01 04:05:44 ZEBRA: [HK55T-96BTH] Failed to find mac aa:c1:ab:24:1b:48 in local cache
2025/04/01 04:05:44 ZEBRA: [S7Q3Q-N2C38] Processing neighbors on remote MAC aa:c1:ab:24:1b:48 ADD, VNI 21000

...

2025/04/01 04:05:44 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
2025/04/01 04:05:44 ZEBRA: [TE15A-AYCGJ]  Neighbor Entry IF vlan1000-v0(12) and  IP 172.16.0.2 is not on a VLAN or BRIDGE, ignoring
2025/04/01 04:05:44 ZEBRA: [KKAC1-JMWTB] Rx RTM_NEWNEIGH family ipv4 IF vlan1000-v0(12) vrf customer(10) IP 172.16.0.2 MAC aa:c1:ab:24:1b:48 state 0x4 flags 0x0 ext_flags 0x0
2025/04/01 04:05:44 ZEBRA: [SGBRA-T9E0Z] zebra neigh add if vlan1000-v0/12 172.16.0.2 aa:c1:ab:24:1b:48
2025/04/01 04:05:44 ZEBRA: [J11JK-NWSBV] zebra neigh new if 12 172.16.0.2 aa:c1:ab:24:1b:48
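A quick way to check whether a given host's MAC+IP was received from BGP (as opposed to a MAC-only update) is to pull the `Recv MACIP ADD` lines out of the Zebra log. This is a rough sketch, assuming the line format shown above; `parse_macip_add` and `macip_learned_from_bgp` are hypothetical helper names, not part of FRR or netlab.

```python
import re

# Pattern derived only from the 'Recv MACIP ADD' lines pasted in this issue.
# The IP field is optional: MAC-only updates omit it.
MACIP_RE = re.compile(
    r"Recv MACIP ADD VNI (?P<vni>\d+) MAC (?P<mac>\S+)"
    r"(?: IP (?P<ip>\S+))? flags \S+ seq (?P<seq>\d+) VTEP (?P<vtep>\S+)"
)


def parse_macip_add(line):
    """Parse one 'Recv MACIP ADD' debug line; return None if it does not match."""
    m = MACIP_RE.search(line)
    if m is None:
        return None
    route = m.groupdict()
    route["vni"] = int(route["vni"])
    route["seq"] = int(route["seq"])
    return route  # route["ip"] is None for MAC-only updates


def macip_learned_from_bgp(lines, ip):
    """Return True if any MACIP ADD in the log carries the given IP address."""
    for line in lines:
        route = parse_macip_add(line)
        if route is not None and route["ip"] == ip:
            return True
    return False


if __name__ == "__main__":
    log = [
        "2025/04/01 04:05:44 ZEBRA: [XAYAY-GEJ4Q] Recv MACIP ADD VNI 21000 "
        "MAC aa:c1:ab:24:1b:48 flags 0x0 seq 0 VTEP 10.0.0.6 ESI - from bgp",
        "2025/04/01 04:05:44 ZEBRA: [XAYAY-GEJ4Q] Recv MACIP ADD VNI 21000 "
        "MAC aa:c1:ab:24:1b:48 IP 172.16.0.2 flags 0x0 seq 0 VTEP 10.0.0.6 ESI - from bgp",
    ]
    print(macip_learned_from_bgp(log, "172.16.0.2"))
```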

The ARP table on the SR Linux side:

A:s1# show arpnd arp-entries
+-------------------+-------------------+-----------------+-------------------+-------------------------------------+------------------------------------------------------------------------+
|     Interface     |   Subinterface    |    Neighbor     |      Origin       |         Link layer address          |                                 Expiry                                 |
+===================+===================+=================+===================+=====================================+========================================================================+
| ethernet-1/1      |                 0 |        10.1.0.2 |           dynamic | 52:54:00:A7:A4:A8                   | 3 hours from now                                                       |
| irb0              |              1000 |      172.16.0.1 |              evpn | 08:4F:C2:A9:01:04                   |                                                                        |
| irb0              |              1000 |      172.16.0.2 |           dynamic | AA:C1:AB:24:1B:48                   | 3 hours from now                                                       |
| irb0              |              1000 |      172.16.0.3 |              evpn | AA:C1:AB:F6:7E:F7                   |                                                                        |
| irb0              |              1001 |      172.16.1.4 |           dynamic | AA:C1:AB:21:45:3D                   | 3 hours from now                                                       |
| mgmt0             |                 0 |   192.168.121.1 |           dynamic | 52:54:00:D8:3F:0D                   | 3 hours from now                                                       |
+-------------------+-------------------+-----------------+-------------------+-------------------------------------+------------------------------------------------------------------------+
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Total entries : 6 (0 static, 6 dynamic)

which confirms that H2 (172.16.0.2) was learned dynamically on S1, not received via EVPN from S2.
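The same check can be scripted against the `show arpnd arp-entries` output. The sketch below assumes the six-column table layout shown above and simply splits rows on `|`; `parse_arp_table` and `origin_of` are illustrative names, and a real test would more likely query SR Linux for structured output instead of scraping the CLI table.

```python
def parse_arp_table(text):
    """Turn the pasted 'show arpnd arp-entries' table into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prompt, borders (+---+, +===+) and the summary line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 6 or cells[0] == "Interface":
            continue  # skip the header row and anything unexpected
        interface, subinterface, neighbor, origin, mac, expiry = cells
        entries.append(
            {
                "interface": interface,
                "subinterface": int(subinterface),
                "neighbor": neighbor,
                "origin": origin,
                "mac": mac.lower(),  # lower-case to match the Zebra logs
                "expiry": expiry,
            }
        )
    return entries


def origin_of(entries, ip):
    """Return the origin ('dynamic', 'evpn', ...) of the entry for ip, or None."""
    for entry in entries:
        if entry["neighbor"] == ip:
            return entry["origin"]
    return None
```

With the table above as input, `origin_of(parse_arp_table(output), "172.16.0.2")` would return `dynamic`, while the entry for 172.16.0.3 has the origin `evpn`.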

From a packet trace on the s1-s2 link, it looks like S1 learns H2's IP/MAC address from H2's own ARP request for H3, which arrives over VXLAN:

09:26:18.251011 1a:69:04:ff:00:01 > 52:54:00:c6:3e:f2, ethertype IPv4 (0x0800), length 92: (tos 0x0, ttl 255, id 1, offset 0, flags [DF], proto UDP (17), length 78)
    10.0.0.6.58647 > 10.0.0.1.4789: VXLAN, flags [I] (0x08), vni 21000
aa:c1:ab:c7:84:43 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 172.16.0.3 tell 172.16.0.2, length 28

This ARP packet is not sent when FRR acts as S2.
