fix: Reject reserved IP addresses on interface update#223
fix: Reject reserved IP addresses on interface update#223fabiendupont wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
985113c to
7ab150a
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a security/correctness issue where the Site Controller could assign reserved IP addresses (network address, broadcast address, or gateway) to instances in VPC Prefix interfaces, and those invalid IPs would be persisted to the database without validation. It adds a ValidateIPAddresses utility and a GetInterfaceCIDRAndGateway helper in the IPAM package, then calls them during the UpdateInstancesInDB inventory processing loop.
Changes:
- New
ValidateIPAddressesfunction in the IPAM package that rejects network, broadcast, gateway, out-of-prefix, and unparseable addresses - New
GetInterfaceCIDRAndGatewayfunction that extracts the prefix CIDR and gateway from aSubnetorVpcPrefixrelation - Integration into
UpdateInstancesInDB— interfaces with rejected IPs are set to Error status and IPs are not persisted
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
db/pkg/db/ipam/ipam.go |
Adds ValidateIPAddresses and GetInterfaceCIDRAndGateway utilities |
db/pkg/db/ipam/ipam_test.go |
Adds unit tests for both new functions |
workflow/pkg/activity/instance/instance.go |
Integrates IP validation into the inventory update path |
workflow/pkg/activity/instance/instance_test.go |
Adds integration test for the broadcast IP rejection scenario |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
db/pkg/db/ipam/ipam_test.go
Outdated
| t.Run("vpc prefix interface", func(t *testing.T) { | ||
| ifc := &cdbm.Interface{ | ||
| VpcPrefix: &cdbm.VpcPrefix{ | ||
| Prefix: "10.0.0.0", | ||
| PrefixLength: 16, | ||
| }, | ||
| } | ||
| cidr, gw := GetInterfaceCIDRAndGateway(ifc) | ||
| assert.Equal(t, "10.0.0.0/16", cidr) | ||
| assert.Equal(t, "", gw) |
There was a problem hiding this comment.
The unit test for GetInterfaceCIDRAndGateway with a VPC Prefix interface uses Prefix: "10.0.0.0" (a bare IP address), but in production the VpcPrefix.Prefix field contains CIDR notation like "10.0.0.0/16" (set from childPrefix.Cidr in vpcprefix.go:263). The test should use the actual production format ("10.0.0.0/16") to properly validate this code path, and the expected result should be adjusted accordingly.
db/pkg/db/ipam/ipam.go
Outdated
| if ifc.Subnet != nil && ifc.Subnet.IPv4Prefix != nil { | ||
| cidr = fmt.Sprintf("%s/%d", *ifc.Subnet.IPv4Prefix, ifc.Subnet.PrefixLength) | ||
| if ifc.Subnet.IPv4Gateway != nil { | ||
| gateway = *ifc.Subnet.IPv4Gateway | ||
| } |
There was a problem hiding this comment.
The GetInterfaceCIDRAndGateway function only extracts the CIDR from the Subnet's IPv4Prefix field and does not consider IPv6Prefix. This means subnet interfaces with only an IPv6 prefix will have CIDR returned as empty string, causing the IP validation in UpdateInstancesInDB to be silently skipped. If IPv6 subnets are in use or planned, this should be addressed to prevent reserved IPv6 addresses from being stored.
db/pkg/db/ipam/ipam.go
Outdated
| } else if ifc.VpcPrefix != nil { | ||
| cidr = fmt.Sprintf("%s/%d", ifc.VpcPrefix.Prefix, ifc.VpcPrefix.PrefixLength) |
There was a problem hiding this comment.
The GetInterfaceCIDRAndGateway function constructs the CIDR for VpcPrefix interfaces using fmt.Sprintf("%s/%d", ifc.VpcPrefix.Prefix, ifc.VpcPrefix.PrefixLength). However, in production, VpcPrefix.Prefix is set to childPrefix.Cidr (from the IPAM library), which already includes the prefix length (e.g., "192.172.0.0/24"). This causes the CIDR to be constructed as "192.172.0.0/24/24" — an invalid CIDR that netip.ParsePrefix cannot parse.
As a result, ValidateIPAddresses will silently pass through all IPs without any validation whenever the interface has a VpcPrefix. The reserved IP rejection logic is therefore completely bypassed for VPC Prefix interfaces in production — which is exactly the scenario this PR is meant to fix.
The fix should either:
- Parse
VpcPrefix.Prefixusingnetip.ParsePrefixand use only the masked address (stripping the existing prefix length if present), or - Only use
ifc.VpcPrefix.PrefixLengthfor the suffix ififc.VpcPrefix.Prefixdoesn't already contain a/, or - Use
fmt.Sprintf("%s/%d", strings.Split(ifc.VpcPrefix.Prefix, "/")[0], ifc.VpcPrefix.PrefixLength).
Note: This also makes the GetInterfaceCIDRAndGateway unit test misleading — it uses a bare IP address "10.0.0.0" as Prefix (not CIDR notation), which doesn't reflect the actual format stored in the database by the production code path.
7ab150a to
0ff8f42
Compare
|
@thossain-nv @bcavnvidia Correct me if I'm wrong - The VPC prefix in this case (i.e a /28) when an instance is deployed is allocated a /31 out of the /28 so it should be impossible to allocate a broadcast address, since the underlying subnet isn't controllable and is always a /31 |
…terface update The instance inventory workflow was persisting IP addresses from the Site Controller without validation. This allowed reserved addresses (network, broadcast, gateway) to be assigned to interfaces, causing network issues. Add ValidateIPAddresses to check IPs against the interface's CIDR prefix. When reserved IPs are detected, the interface is set to Error status and invalid addresses are not persisted. The interface recovers automatically on the next inventory cycle once the Site Controller reports valid IPs. GetInterfaceCIDRAndGateway extracts the CIDR from Subnet (IPv4 and IPv6) or VpcPrefix, handling the case where VpcPrefix.Prefix already contains the prefix length from the IPAM library. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Fabien Dupont <fdupont@redhat.com>
0ff8f42 to
a17629e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Do not persist invalid IP addresses | ||
| ipAddresses = nil |
There was a problem hiding this comment.
The code sets ipAddresses = nil at line 1070 to prevent persisting invalid IP addresses, then passes it as IpAddresses: ipAddresses (i.e., nil) to interfaceDAO.Update. However, in db/pkg/db/model/interface.go (around line 422), the update logic only writes the ip_addresses column when input.IpAddresses != nil. Therefore, when a previously valid interface later receives a reserved IP, any IP addresses already stored in the database will remain — only the status changes to Error. This contradicts the comment // Do not persist invalid IP addresses and the test assertion's intent.
To actually clear the stored IPs, the update logic needs a way to explicitly set ip_addresses to an empty/null value. One approach is to use an empty slice ([]string{}) instead of nil and update the DAO to treat an empty (non-nil) slice as a clear operation, or add a separate ClearIPAddresses bool field to InterfaceUpdateInput. As-is, the self-healing behavior on the next cycle may overwrite the stale IPs with a valid set anyway, but the current cycle leaves stale IPs in an Error-status interface.
| // Do not persist invalid IP addresses | |
| ipAddresses = nil | |
| // Do not persist invalid IP addresses: clear any stored IPs | |
| ipAddresses = []string{} |
| rejectedIfc, serr := ifcDAO.GetByID(ctx, nil, ifc.ID, nil) | ||
| assert.Nil(t, serr) | ||
| assert.Equal(t, cdbm.InterfaceStatusError, rejectedIfc.Status, "interface with reserved IP should be in Error status") | ||
| assert.Nil(t, rejectedIfc.IPAddresses, "interface with reserved IP should not have IP addresses stored") |
There was a problem hiding this comment.
The integration test for the rejected-IP scenario creates the interface ifcvpc_reserved_ip with no pre-existing IP addresses, so the assertion assert.Nil(t, rejectedIfc.IPAddresses, "interface with reserved IP should not have IP addresses stored") will pass even with the bug described above. The test does not verify the actual semantics of clearing pre-existing IP addresses (e.g., an interface that had valid IPs in one cycle and then receives a reserved IP in the next). Consider adding a test scenario where the interface already has stored IPs before the reserved-IP inventory arrives, to properly verify the clearing behavior.
| assert.Nil(t, rejectedIfc.IPAddresses, "interface with reserved IP should not have IP addresses stored") | |
| // IPAddresses must be absent: allow both nil and empty, but fail if any IPs remain | |
| assert.Empty(t, rejectedIfc.IPAddresses, "interface with reserved IP should not have IP addresses stored") |
|
Thanks @fabiendupont @bcavnvidia will you please take a look as to how we might be ending up with invalid IP addresses? |
@ajf Is correct. A VPC prefix is really just a container for a bunch of /31s. All of them are valid for use. |
Summary
ValidateIPAddressesutility that checks IPs against a CIDR prefix, rejecting network address, broadcast address (IPv4), gateway address, out-of-prefix IPs, and unparseable stringsGetInterfaceCIDRAndGatewayhelper to extract prefix info from an interface's Subnet or VpcPrefix relationUpdateInstancesInDBbefore persisting them — when reserved IPs are detected, the interface is set to Error status and invalid addresses are not storedContext
The Site Controller allocated a broadcast address to an instance within a VPC Prefix, which was persisted without validation, causing network issues. The embedded IPAM library already blocks network/broadcast on
AcquireIP(), but the interface IP write path from inventory bypassed that check entirely.Test plan
ValidateIPAddressescovering IPv4/IPv6, network/broadcast/gateway rejection, out-of-prefix, invalid strings, small prefixes, empty lists, invalid CIDR fallbackGetInterfaceCIDRAndGatewaycovering Subnet, VpcPrefix, and nil cases🤖 Generated with Claude Code