SR-OS ECMP semantics: device ecmp 1 (single best path) vs Batfish (all equal-cost IGP paths)
SR-OS installs a single best path per prefix by default (ecmp 1), so its
route-table reports one next-hop even when several equal-cost IGP paths exist.
Batfish has no per-IGP ECMP limit and always installs every equal-cost path
(PsThenLoadBalance: "Batfish models all ECMP paths"; OSPF/IS-IS have no ECMP
knob in the vendor-independent (VI) model). For an equidistant prefix, Batfish
therefore holds the device's chosen
next-hop plus extra equal-cost legs, and test_main_rib_routes flags the
surplus legs as Batfish-only routes.
Seen in the sros_services lab: 10.10.10.20/32 (and symmetric prefixes) on
p1/p2/pe2/pe4 — the device installs one OSPF next-hop, Batfish installs
2–4 equal-cost legs. These nodes' main-RIB tests are sickbay'd to this issue.
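
To make the symptom concrete, here is a minimal sketch of how the surplus legs surface as Batfish-only routes. It is not the real test_main_rib_routes code: the Route tuple, the batfish_only helper, and the next-hop addresses are hypothetical; only the prefix comes from the lab.

```python
from __future__ import annotations

from typing import NamedTuple


class Route(NamedTuple):
    """Hypothetical main-RIB entry: one row per (prefix, next-hop) leg."""
    prefix: str
    protocol: str
    next_hop: str


def batfish_only(device: set[Route], batfish: set[Route]) -> set[Route]:
    """Return routes Batfish installs that the device route-table lacks."""
    return batfish - device


# Device with ecmp 1: a single OSPF next-hop (address is made up).
device_rib = {Route("10.10.10.20/32", "ospf", "192.0.2.1")}

# Batfish: the same leg plus one more equal-cost leg (address is made up).
batfish_rib = {
    Route("10.10.10.20/32", "ospf", "192.0.2.1"),
    Route("10.10.10.20/32", "ospf", "192.0.2.5"),
}

# The extra leg is what a strict comparison reports as Batfish-only.
surplus = batfish_only(device_rib, batfish_rib)
```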
Why not a validator-side workaround
An earlier attempt forgave the surplus legs in SrosValidator whenever the
device installed a single next-hop. That globally weakens the cost matcher for
every SR-OS lab and masks a real failure mode: a case where the device has one
path and Batfish computes the right one plus wrong extras (e.g. a metric
miscomputation that creates a spurious tie) would pass. The change was
reverted; the mismatch is sickbay'd per-lab instead so the matcher stays
strict everywhere.
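
The masking problem can be sketched as follows. This reconstructs the reverted rule from the description above and is not the actual SrosValidator code; the function names and next-hop addresses are hypothetical.

```python
from __future__ import annotations


def strict_match(device_hops: set[str], batfish_hops: set[str]) -> bool:
    """Strict rule: next-hop sets must be identical."""
    return device_hops == batfish_hops


def forgiving_match(device_hops: set[str], batfish_hops: set[str]) -> bool:
    """Reverted rule (as described): with one device next-hop, accept any
    Batfish set that contains it; otherwise fall back to strict equality."""
    if len(device_hops) == 1:
        return device_hops <= batfish_hops
    return strict_match(device_hops, batfish_hops)


device = {"192.0.2.1"}

# Genuine ECMP: the extra leg really is equal-cost on the device.
genuine_tie = {"192.0.2.1", "192.0.2.5"}

# Spurious tie: the extra leg exists only because of a metric miscomputation.
spurious_tie = {"192.0.2.1", "192.0.2.9"}

# The forgiving rule sees only next-hop sets, so it cannot tell these apart.
both_forgiven = forgiving_match(device, genuine_tie) and forgiving_match(
    device, spurious_tie
)
```

Both cases pass the forgiving rule and both fail the strict one, which is why the strict matcher is kept and the known mismatch is tracked per lab.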
Options to actually close this
- Model SR-OS ecmp in Batfish. Batfish ECMP is effectively binary (1 vs
infinite). If a VI knob limits IGP to a single best path (deterministic
tiebreak), SR-OS ecmp 1 could convert to it and device/Batfish would agree.
Needs Batfish-side support (filed companion: batfish/batfish — IGP ECMP limit).
- Make ECMP labs deterministic. Per the lab-design guidance
(infra/README.md "Determinism"), give the underlay a single genuine best
path (asymmetric IGP metrics) so device and Batfish agree at one path with no
tolerance needed.
- Validate the device next-hop is a subset of Batfish's legs as an explicit,
opt-in comparison mode (not the default strict matcher), if we decide ECMP
over-approximation is acceptable for some labs.
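
For the third option, an opt-in mode could look roughly like the sketch below. The NextHopMode enum and next_hops_match function are hypothetical names, not existing validator API; the point is only that strict equality stays the default and the subset check must be requested explicitly.

```python
from __future__ import annotations

from enum import Enum


class NextHopMode(Enum):
    """Hypothetical comparison modes for next-hop sets."""
    STRICT = "strict"                # default: sets must be identical
    DEVICE_SUBSET = "device_subset"  # opt-in: device hops within Batfish's


def next_hops_match(
    device_hops: set[str],
    batfish_hops: set[str],
    mode: NextHopMode = NextHopMode.STRICT,
) -> bool:
    """Compare next-hop sets for one prefix under the chosen mode."""
    if mode is NextHopMode.DEVICE_SUBSET:
        # Require at least one device hop so an empty set never passes.
        return bool(device_hops) and device_hops <= batfish_hops
    return device_hops == batfish_hops


# Made-up addresses for illustration.
device = {"192.0.2.1"}
batfish = {"192.0.2.1", "192.0.2.5"}

default_result = next_hops_match(device, batfish)
opt_in_result = next_hops_match(device, batfish, NextHopMode.DEVICE_SUBSET)
```

The subset mode still cannot distinguish genuine ties from spurious ones, so it would only suit labs where that over-approximation has been accepted deliberately.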