Skip to content

nodes: strip launcher-managed mgmt IP from saved vrnetlab configs#3263

Open
alxrxs wants to merge 2 commits into
srl-labs:mainfrom
alxrxs:fix/vrnetlab-mgmt-ip-persistence
Open

nodes: strip launcher-managed mgmt IP from saved vrnetlab configs#3263
alxrxs wants to merge 2 commits into
srl-labs:mainfrom
alxrxs:fix/vrnetlab-mgmt-ip-persistence

Conversation

@alxrxs

@alxrxs alxrxs commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Summary

VRNode.SaveConfig (used by every vrnetlab-backed kind that doesn't override it) saves the full running-config verbatim via netconf.GetConfig, including whatever the device currently has configured on its management interface. The vrnetlab launcher itself also configures that same interface, every boot, from the container's actual clab-assigned IP -- that's the real source of truth for management addressing, not anything in the saved/startup config.

On a redeploy, the saved config is reused as-is by default (!EnforceStartupConfig && FileExists(dst) in GenerateConfig). If the mgmt IP saved in it is stale -- most commonly because a subsequent deploy assigned the container a different mgmt IP (a partial --node-filter deploy, Docker IPAM handing out a different address, etc.) -- the node ends up with two competing mgmt-interface configurations applied during the same boot: the launcher's current one and the stale saved one. Depending on the platform's config-apply order, the stale one can win. The node comes up completely healthy (container healthcheck passes, the NOS itself is fine) but unreachable at the IP clab/DNS actually expect.

This is exactly the failure I hit and root-caused on a containerlab lab running the XRd vRouter kind: containerlab save had captured a running-config with MgmtEth0/RP0/CPU0/0 pinned to one IP; a later redeploy assigned the container a different IP; XRd came up healthy but nothing answered ARP for the address clab reported.

What this PR does

Adds FilterMgmtIPConfigLines (nodes/vr_node.go) and calls it from VRNode.SaveConfig before the config is written to disk, so newly saved configs never carry a mgmt address that can compete with the launcher's.

It's deliberately not a blind "delete any line mentioning the mgmt IP":

  • Only lines that both contain the mgmt IP as a token (so 172.20.20.2 doesn't also match 172.20.20.20) and look like an address assignment (guarded on the literal word "address", case-insensitive) are touched -- unrelated references like snmp-server host <mgmt-ip> or ntp server <mgmt-ip> are left alone. Lines carrying description are excluded too, so an interface description that happens to contain both the mgmt IP and the word "address" is not stripped (a real address line never contains the word "description").
  • It's structure-aware, not line-blind: a leaf statement (IOS(-XR) ip(v4) address x, Junos address x/y;, RouterOS add address=x/y ...) only has that one line removed, leaving the enclosing interface/context header intact. A line that opens a hierarchy block (Junos address x/y { primary; }) has the whole balanced block removed instead, so braces stay consistent -- a naive per-line delete would otherwise leave Junos config with unbalanced braces.

It's gated by a small platform allowlist (mgmtConfigFilterPlatforms) rather than applied universally. Stripping the saved mgmt address is only safe for kinds that re-establish their mgmt address on deploy independently of the saved config -- i.e. a vrnetlab launcher that configures the mgmt interface itself from the container's IP every boot. That's exactly the set fixed launcher-side in the companion srl-labs/vrnetlab PR: cisco_asa, cisco_iosxe, cisco_iosxr, cisco_nxos, juniper_junos.

Deliberately excluded even though their address syntax would match: kinds whose launcher applies the startup-config verbatim with no mgmt stanza of its own -- notably cisco_ios (vios) and the dell_emc/ftosv OS10 launcher -- because for those, stripping the saved mgmt address would leave a reused full-config with no mgmt address at all (the regression this change exists to prevent, inverted). Also left out until their launcher's mgmt behavior is verified: arista_eos, huawei_vrp, ipinfusion_ocnos, and aruba_aoscx (whose ip static syntax the "address" guard wouldn't match anyway). Any platform not in the allowlist is untouched (pure no-op), so this can't regress a kind nobody has verified.

vr_ros already had its own version of this exact fix (filterManagementInterfaceConfig, scoped to RouterOS's ether1) -- refactored onto the shared implementation in the second commit.

Testing

Unit tests in nodes/vr_node_mgmtfilter_test.go cover: IOS-XR/IOS/NX-OS leaf removal, Junos leaf removal, Junos block removal (the case that would break under a naive per-line delete), RouterOS set-style, the 172.20.20.2 vs 172.20.20.20/172.20.20.200 boundary, the address-keyword guard (mgmt IP appears in a non-address line and is correctly left alone), IPv6, dual-stack, and the no-mgmt-IP-set no-op case.

I couldn't run the full nodes package test suite in my environment (pre-existing, unrelated build failures on this checkout: a Linux-only transitive dependency and an unrelated gopsutil API mismatch) but the new logic is fully covered in isolation and gofmt-clean.

@alxrxs alxrxs marked this pull request as draft July 2, 2026 17:31
@alxrxs alxrxs force-pushed the fix/vrnetlab-mgmt-ip-persistence branch 2 times, most recently from d6b03ee to 9699f59 Compare July 2, 2026 20:05
alxrxs added 2 commits July 2, 2026 23:22
vrnetlab-backed (VRNode) kinds configure their management interface from
the container's actual clab-assigned IP on every boot. SaveConfig
persists the full running-config verbatim, including that address; on
the next deploy the saved config is reused as-is (default, non-
--reconfigure behavior), pinning a now-stale mgmt IP. If the container
is later assigned a different mgmt address (partial --node-filter
deploy, IPAM handing out a different IP, etc.), the node applies both
the launcher's current address and the stale saved one -- whichever the
device's config-apply order favors wins. When it's the stale one, the
node comes up fully functional but unreachable at the address clab/DNS
expect: "healthy" per the container healthcheck, but not reachable.

Add FilterMgmtIPConfigLines, wired into VRNode.SaveConfig, gated by a
verified per-platform allowlist so unverified kinds are left untouched.
It's structure-aware rather than a blind line-delete: a leaf statement
(IOS(-XR) "ip(v4) address x", Junos "address x/y;") only drops that
line; a line that opens a hierarchy block (Junos "address x/y { ... }")
has the whole balanced block removed so braces stay consistent. Matches
are also gated on the literal word "address" so unrelated references to
the mgmt IP (snmp-server host, ntp server, a BGP neighbor) are never
touched, and lines carrying "description" are excluded too, so an
interface description that happens to contain both the mgmt IP and the
word "address" is left alone.

vr_ros already carried a bespoke version of this fix
(filterManagementInterfaceConfig, RouterOS-only) -- refactored onto the
shared implementation.
Same fix as the previous commit, generalized: vr_ros already stripped
its management interface's IP out of saved configs
(filterManagementInterfaceConfig), just for RouterOS specifically. Drop
the bespoke version in favor of the shared FilterMgmtIPConfigLines.
@alxrxs alxrxs force-pushed the fix/vrnetlab-mgmt-ip-persistence branch from 9699f59 to 3f8f3e8 Compare July 2, 2026 20:22
@alxrxs alxrxs marked this pull request as ready for review July 2, 2026 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant