system-tests: support IPv6 single stack#1335
system-tests: support IPv6 single stack#1335laurayang842 wants to merge 5 commits intorh-ecosystem-edge:mainfrom
Conversation
📝 WalkthroughWalkthroughTests were changed to detect and exercise configured IP families (IPv4/IPv6) dynamically across egress, service, VLAN/MACVLAN, SR‑IOV/pod‑bond, and whereabouts statefulset/service code. The monitoring server was switched to an IPv6 dual‑stack bind. Soft reboot/NMI flows now use BootID change to detect reboot progress. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go (1)
243-244: IPv6 regex may miss addresses starting with::.The IPv6 regex pattern requires the address to start with hex digits, so it won't match compressed IPv6 addresses that begin with
::(e.g.,::1/128,::ffff:192.168.1.1/96). For typical NAD configurations using ULA or link-local addresses, this should be sufficient, but consider extending the pattern if::prefixed addresses are expected.♻️ Suggested extended pattern
- hasIPv6 := regexp.MustCompile(`[0-9a-fA-F]{1,4}(:[0-9a-fA-F]{0,4}){2,}/\d+`).MatchString(config) + hasIPv6 := regexp.MustCompile(`([0-9a-fA-F]{0,4}:){2,}[0-9a-fA-F]{0,4}/\d+`).MatchString(config)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go` around lines 243 - 244, The IPv6 regex assigned to hasIPv6 misses compressed addresses starting with "::"; update the validation so hasIPv6 recognizes leading-compressed forms (e.g., ::1/128, ::ffff:192.168.1.1/96) by replacing the current regex with one that allows an initial "::" compression variant or, better, switch to parsing using the standard library (e.g., use net.ParseCIDR/net.ParseIP on the substring) to robustly detect IPv6 CIDRs; change the logic around the hasIPv6 variable accordingly so both compressed and full IPv6 addresses are accepted.tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go (1)
1363-1390: Failover traffic now covers only one family in dual-stack runs.When both addresses are configured, this always exercises the long-lived failover flow over IPv4. That means the critical “traffic survives VF failover” path is no longer validated for IPv6 in dual-stack environments. Consider running this step once per configured family, or parameterizing the preferred family per spec, so the new IPv6 support is covered during failover too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go` around lines 1363 - 1390, The test currently picks a single target IP via the switch (RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 / PodLevelBondDeploymentTwoIPv6) so dual-stack runs only exercise IPv4 failover; change the logic in the test around target selection and the call to generateTCPTraffic so it runs once per configured family instead of choosing one: iterate over the configured addresses (check RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 and PodLevelBondDeploymentTwoIPv6), for each non-empty address set targetIP, log and call generateTCPTraffic(clientPodObj, targetIP, RDSCoreConfig.PodLevelBondPort, "10", "5") and assert Expect(err).ToNot(HaveOccurred(...)) per-family; alternatively add a parameter to control preferred family and run the step for both when both are present (use the same generateTCPTraffic and logging patterns).tests/system-tests/rdscore/internal/rdscorecommon/egress-service.go (1)
385-410: Default case creates IPv4 service when no addresses are configured.When neither
remoteTargetIPnorremoteTargetIPv6is set, thedefaultcase still creates an IPv4 SingleStack service. While downstream connectivity tests will fail with a clear assertion, consider failing fast here to provide clearer error messaging.♻️ Optional: Add explicit validation before IP family switch
hasIPv4 := remoteTargetIP != "" hasIPv6 := remoteTargetIPv6 != "" + Expect(hasIPv4 || hasIPv6).To(BeTrue(), + "At least one remote target IP (IPv4 or IPv6) must be configured") + switch { case hasIPv4 && hasIPv6:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/egress-service.go` around lines 385 - 410, The switch's default branch creates an IPv4 SingleStack service even when neither remoteTargetIP nor remoteTargetIPv6 is provided; update the logic to validate inputs before the switch (check remoteTargetIP and remoteTargetIPv6) and fail fast with a clear error/expectation when both are empty instead of calling svcBuilder.WithIPFamily; locate the decision using the variables remoteTargetIP and remoteTargetIPv6 and the call sites to svcBuilder.WithIPFamily (and the switch that sets corev1.IPFamilyPolicyRequireDualStack / corev1.IPFamilyPolicySingleStack) and replace the default path with an early return or explicit test failure that logs/returns a descriptive message about missing target addresses.tests/system-tests/rdscore/internal/rdscorecommon/monitoring-config-validation.go (1)
493-499: Consider explicitly enabling dual-stack socket behavior.The
DualStackTCPServerclass setsaddress_family = socket.AF_INET6and binds to"::", which relies on OS defaults for accepting IPv4 connections via IPv4-mapped IPv6 addresses. On systems withnet.ipv6.bindv6only=1or certain container configurations, this may reject IPv4 connections.For reliable dual-stack support, explicitly set the
IPV6_V6ONLYsocket option toFalse:♻️ Proposed fix for explicit dual-stack support
import socket class DualStackTCPServer(socketserver.TCPServer): address_family = socket.AF_INET6 + + def server_bind(self): + self.socket.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0) + super().server_bind()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/monitoring-config-validation.go` around lines 493 - 499, The DualStackTCPServer currently sets address_family = socket.AF_INET6 and binds to ("::") but doesn’t explicitly disable IPV6_V6ONLY; update DualStackTCPServer (the class definition) to override server_bind (or before binding) and call self.socket.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0) prior to calling the base server_bind so the socket accepts IPv4-mapped connections; ensure this logic runs before binding the ("::", PORT) socket used with RemoteWriteHandler.tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go (1)
304-329: Consider adding a brief stabilization delay or explicit Ready verification when boot ID changes.The conditional logic is sound—if we detected boot ID change without observing NotReady, the node should already be Ready. However, there's an implicit assumption that the node is fully operational at that moment.
This deviates from the SPK implementation (
tests/system-tests/spk/internal/spkcommon/reboot.go:277-300) which unconditionally waits for the Ready state. While the deviation is intentional per the PR description (handling Dell R660 fast reboots), consider:
- Adding a brief
time.Sleep(e.g., 10-15 seconds) after the boot ID change detection to ensure node stability before uncordoning.- Or, explicitly verifying the node is Ready when boot ID changed (even without the full Eventually timeout).
This would provide an additional safety margin against edge cases where the node reports Ready momentarily but hasn't fully stabilized.
♻️ Optional: Add explicit Ready verification for boot-ID-only detection
if nodeWentNotReady { By(fmt.Sprintf("Checking node %q got into Ready", _node.Definition.Name)) Eventually(func(ctx SpecContext) bool { // ... existing Ready wait logic }).WithTimeout(25*time.Minute).WithPolling(15*time.Second).WithContext(ctx).Should(BeTrue(), "Node hasn't reached Ready state") + } else { + // Boot ID changed without NotReady observation - node rebooted fast + // Brief delay to ensure stability before uncordoning + klog.V(rdscoreparams.RDSCoreLogLevel).Infof( + "Node %q rebooted fast (boot ID changed), allowing brief stabilization", _node.Definition.Name) + time.Sleep(15 * time.Second) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go` around lines 304 - 329, The block guarded by nodeWentNotReady currently assumes the node is stable when a boot-ID change was seen; add a short stabilization step before proceeding (either a small time.Sleep of ~10-15s or an explicit quick Ready check) to avoid flapping: locate the branch using nodeWentNotReady and _node.Definition.Name (the Eventually that pulls currentNode via nodes.Pull and checks condition.Type == rdscoreparams.ConditionTypeReadyString) and insert a brief pause or an immediate fast verification loop that asserts condition.Status == rdscoreparams.ConstantTrueString before uncordoning/continuing so the node has a small margin to stabilize after boot-ID change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/system-tests/rdscore/internal/rdscorecommon/ipvlan-validation.go`:
- Around line 306-344: Add validation (like in
VerifyIPVLANConnectivityBetweenDifferentNodes) to ensure at least one address is
configured before iterating: check RDSCoreConfig.IPVlanDeploy3TargetAddress or
RDSCoreConfig.IPVlanDeploy3TargetAddressIPv6 and if both are empty log an error
(or fail the test) with a clear message and skip/abort the Deploy3 loop; do the
same for RDSCoreConfig.IPVlanDeploy4TargetAddress and
RDSCoreConfig.IPVlanDeploy4TargetAddressIPv6 before the Deploy4 loop so
verifySRIOVConnectivity is only called when a target address exists.
- Around line 161-199: The loop blocks for IPVlan groups lack a check that at
least one target address is configured; add the same validation used in
macvlan-validation.go: for the first group ensure
RDSCoreConfig.IPVlanDeploy1TargetAddress or
RDSCoreConfig.IPVlanDeploy1TargetAddressIPv6 is non-empty and for the second
group ensure RDSCoreConfig.IPVlanDeploy2TargetAddress or
RDSCoreConfig.IPVlanDeploy2TargetAddressIPv6 is non-empty, and if both are empty
fail the test immediately (use the test failure helper in this test suite, e.g.,
t.Fatalf or framework.Failf) before iterating and calling
verifySRIOVConnectivity.
In `@tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go`:
- Around line 265-277: The current validation only checks that an IP address
requires a subnet mask (bondInfIPv4 -> bondInfSubMaskIPv4 and bondInfIPv6 ->
bondInfSubMaskIPv6) but does not reject the inverse (a subnet mask provided
without a corresponding address); update the validation to be symmetric by also
returning an error when bondInfSubMaskIPv4 != "" && bondInfIPv4 == "" and when
bondInfSubMaskIPv6 != "" && bondInfIPv6 == "" so you fail fast on malformed
configs and surface bad test input; keep the existing log messages
(klog.V(100).Infof) and use analogous error text for the added checks.
---
Nitpick comments:
In `@tests/system-tests/rdscore/internal/rdscorecommon/egress-service.go`:
- Around line 385-410: The switch's default branch creates an IPv4 SingleStack
service even when neither remoteTargetIP nor remoteTargetIPv6 is provided;
update the logic to validate inputs before the switch (check remoteTargetIP and
remoteTargetIPv6) and fail fast with a clear error/expectation when both are
empty instead of calling svcBuilder.WithIPFamily; locate the decision using the
variables remoteTargetIP and remoteTargetIPv6 and the call sites to
svcBuilder.WithIPFamily (and the switch that sets
corev1.IPFamilyPolicyRequireDualStack / corev1.IPFamilyPolicySingleStack) and
replace the default path with an early return or explicit test failure that
logs/returns a descriptive message about missing target addresses.
In `@tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go`:
- Around line 304-329: The block guarded by nodeWentNotReady currently assumes
the node is stable when a boot-ID change was seen; add a short stabilization
step before proceeding (either a small time.Sleep of ~10-15s or an explicit
quick Ready check) to avoid flapping: locate the branch using nodeWentNotReady
and _node.Definition.Name (the Eventually that pulls currentNode via nodes.Pull
and checks condition.Type == rdscoreparams.ConditionTypeReadyString) and insert
a brief pause or an immediate fast verification loop that asserts
condition.Status == rdscoreparams.ConstantTrueString before
uncordoning/continuing so the node has a small margin to stabilize after boot-ID
change.
In
`@tests/system-tests/rdscore/internal/rdscorecommon/monitoring-config-validation.go`:
- Around line 493-499: The DualStackTCPServer currently sets address_family =
socket.AF_INET6 and binds to ("::") but doesn’t explicitly disable IPV6_V6ONLY;
update DualStackTCPServer (the class definition) to override server_bind (or
before binding) and call self.socket.setsockopt(socket.IPPROTO_IPV6,
socket.IPV6_V6ONLY, 0) prior to calling the base server_bind so the socket
accepts IPv4-mapped connections; ensure this logic runs before binding the
("::", PORT) socket used with RemoteWriteHandler.
In `@tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go`:
- Around line 1363-1390: The test currently picks a single target IP via the
switch (RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 /
PodLevelBondDeploymentTwoIPv6) so dual-stack runs only exercise IPv4 failover;
change the logic in the test around target selection and the call to
generateTCPTraffic so it runs once per configured family instead of choosing
one: iterate over the configured addresses (check
RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 and PodLevelBondDeploymentTwoIPv6),
for each non-empty address set targetIP, log and call
generateTCPTraffic(clientPodObj, targetIP, RDSCoreConfig.PodLevelBondPort, "10",
"5") and assert Expect(err).ToNot(HaveOccurred(...)) per-family; alternatively
add a parameter to control preferred family and run the step for both when both
are present (use the same generateTCPTraffic and logging patterns).
In
`@tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go`:
- Around line 243-244: The IPv6 regex assigned to hasIPv6 misses compressed
addresses starting with "::"; update the validation so hasIPv6 recognizes
leading-compressed forms (e.g., ::1/128, ::ffff:192.168.1.1/96) by replacing the
current regex with one that allows an initial "::" compression variant or,
better, switch to parsing using the standard library (e.g., use
net.ParseCIDR/net.ParseIP on the substring) to robustly detect IPv6 CIDRs;
change the logic around the hasIPv6 variable accordingly so both compressed and
full IPv6 addresses are accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 52ac1979-e292-4969-af4b-b5767d0a9a5a
📒 Files selected for processing (8)
tests/system-tests/rdscore/internal/rdscorecommon/egress-ip.gotests/system-tests/rdscore/internal/rdscorecommon/egress-service.gotests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.gotests/system-tests/rdscore/internal/rdscorecommon/ipvlan-validation.gotests/system-tests/rdscore/internal/rdscorecommon/macvlan-validation.gotests/system-tests/rdscore/internal/rdscorecommon/monitoring-config-validation.gotests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.gotests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go (1)
1373-1412: MovetargetIPselection (and its guard) out of the goroutine.Determining
targetIPinside the goroutine forces reliance onGinkgoRecoverto surface the "neither IPv4 nor IPv6 configured" failure, and thereturnat line 1387 is unreachable becauseFail(...)panics. In practice the default branch is also effectively dead — the precedingverifyConnectivity()call at line 1343 would have already asserted at least one server IP via lines 1049‑1050 — so this guard only obscures control flow.Resolving
targetIPbefore launching the goroutine makes the failure mode explicit on the main test goroutine and removes the deadreturn.♻️ Suggested refactor
+ // Determine target IP based on configuration priority: IPv4 first, then IPv6 + var targetIP string + + switch { + case RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 != "": + targetIP = RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 + case RDSCoreConfig.PodLevelBondDeploymentTwoIPv6 != "": + targetIP = RDSCoreConfig.PodLevelBondDeploymentTwoIPv6 + default: + Fail("Neither IPv4 nor IPv6 server address is configured for the server deployment") + } + go func() { defer GinkgoRecover() - // Determine target IP based on configuration priority: IPv4 first, then IPv6 - var targetIP string - - switch { - case RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 != "": - targetIP = RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 - case RDSCoreConfig.PodLevelBondDeploymentTwoIPv6 != "": - targetIP = RDSCoreConfig.PodLevelBondDeploymentTwoIPv6 - default: - Fail("Neither IPv4 nor IPv6 server address is configured for the server deployment") - - return - } - By(fmt.Sprintf("Send data from the client container to the server address %s", targetIP)) klog.V(100).Infof("Using server address %s for TCP traffic generation", targetIP)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go` around lines 1373 - 1412, The goroutine currently selects targetIP and guards with Fail/return inside the anonymous func (which relies on GinkgoRecover and contains a dead return), so move the selection and guard for targetIP out of the go func and perform it on the main test goroutine before launching the goroutine; compute targetIP using the same switch (checking RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 then PodLevelBondDeploymentTwoIPv6), call Fail(...) on the main goroutine if neither is set, then pass the resolved targetIP into the goroutine and keep the rest of the logic (including calls to generateTCPTraffic and scanClientPodTrafficOutput) unchanged.tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go (2)
245-246: Regexes run against raw JSON — note false-positive surface.Both patterns are applied to the entire
spec.configstring (JSON), not only toipRanges/subnetvalues. The IPv4 pattern\d+\.\d+\.\d+\.\d+/\d+could match any dotted-quad-with-slash that happens to appear elsewhere in the config (e.g., a route string, a comment-like field, or a gateway entry embedded in another plugin section). For a more robust detection, consider unmarshalling the config JSON and inspectingipam.ipRanges[*].range/ipam.rangedirectly. Low priority given current NAD shapes are controlled, but worth tightening if NADs diversify.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go` around lines 245 - 246, The current detection uses regexes on the raw spec.config string (variables hasIPv4/hasIPv6) which can yield false positives; instead unmarshal spec.config JSON and inspect the IPAM fields directly (check ipam.ipRanges[*].range and ipam.range/subnet entries) to determine presence of IPv4/IPv6 ranges, then set hasIPv4/hasIPv6 accordingly; update the code around the MatchString usage to parse with encoding/json, traverse to the ipam structure, and only apply CIDR detection to those specific fields.
245-258: Consider failing fast when no IP family is detected.Falling through to IPv4 as the default when neither regex matches can silently mask a malformed/unexpected NAD
spec.config(e.g., a NAD that doesn't embed CIDRs in the top-level config string, such as when ipRanges live under a different key/structure), leading to a mis-configured Service downstream. Consider returning an error/Failwhen bothhasIPv4andhasIPv6are false so the test surfaces the misconfiguration instead of defaulting silently.Proposed change
switch { case hasIPv4 && hasIPv6: return []corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol}, corev1.IPFamilyPolicyRequireDualStack case hasIPv6: return []corev1.IPFamily{corev1.IPv6Protocol}, corev1.IPFamilyPolicySingleStack - default: + case hasIPv4: return []corev1.IPFamily{corev1.IPv4Protocol}, corev1.IPFamilyPolicySingleStack + default: + Fail(fmt.Sprintf("NAD %q spec.config contains no detectable IPv4 or IPv6 CIDR", nadName)) + return nil, "" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go` around lines 245 - 258, The code currently defaults to IPv4 when neither hasIPv4 nor hasIPv6 is true, which can hide malformed NAD config; change the logic so when both hasIPv4 and hasIPv6 are false the test fails fast instead of returning IPv4: update the switch/default branch that checks hasIPv4/hasIPv6 to return an error or call test fail (e.g., t.Fatalf or require.Fail) with a clear message including nadName and config, or change the function signature to return ([], policy, error) and propagate the error—modify the branches around the hasIPv4/hasIPv6 checks and the default case accordingly.tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go (1)
301-302: Include node name in the failure message.When this assertion fails across many nodes in a suite run,
"Node hasn't reached notReady state and boot ID hasn't changed"makes triage harder than it needs to be. Interpolate the node name (as the Ready-wait message should as well).📝 Suggested tweak
}).WithTimeout(25*time.Minute).WithPolling(15*time.Second).WithContext(ctx).Should(BeTrue(), - "Node hasn't reached notReady state and boot ID hasn't changed") + fmt.Sprintf("Node %q hasn't reached notReady state and boot ID hasn't changed", + _node.Definition.Name))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go` around lines 301 - 302, The assertion call that ends with .WithTimeout(...).WithPolling(...).WithContext(ctx).Should(BeTrue(), "Node hasn't reached notReady state and boot ID hasn't changed") should include the node identifier to aid triage; update the failure message passed to Should(BeTrue, ...) to interpolate the node name (e.g. use node.Name or nodeName) so it reads something like "Node <nodeName> hasn't reached notReady state and boot ID hasn't changed"; keep the change localized to the Should(...) call used alongside WithTimeout/WithPolling/WithContext so the Ready-wait messages remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go`:
- Around line 304-335: The current fast-reboot branch skips the Ready wait
(nodeWentNotReady false path) and only sleeps 15s before uncordoning; change so
we always wait for the node to reach Ready before uncordoning. Replace the else
branch (the 15s sleep) with the same Eventually readiness check used in the
nodeWentNotReady true branch (or extract that check into a helper like
waitForNodeReady(name, ctx) and call it from both branches), referencing
currentNode.Object.Status.Conditions and rdscoreparams.ConditionTypeReadyString
/ ConstantTrueString to detect readiness; keep the existing logging around
nodeWentNotReady for informational purposes if desired.
---
Nitpick comments:
In `@tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go`:
- Around line 301-302: The assertion call that ends with
.WithTimeout(...).WithPolling(...).WithContext(ctx).Should(BeTrue(), "Node
hasn't reached notReady state and boot ID hasn't changed") should include the
node identifier to aid triage; update the failure message passed to
Should(BeTrue, ...) to interpolate the node name (e.g. use node.Name or
nodeName) so it reads something like "Node <nodeName> hasn't reached notReady
state and boot ID hasn't changed"; keep the change localized to the Should(...)
call used alongside WithTimeout/WithPolling/WithContext so the Ready-wait
messages remain consistent.
In `@tests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.go`:
- Around line 1373-1412: The goroutine currently selects targetIP and guards
with Fail/return inside the anonymous func (which relies on GinkgoRecover and
contains a dead return), so move the selection and guard for targetIP out of the
go func and perform it on the main test goroutine before launching the
goroutine; compute targetIP using the same switch (checking
RDSCoreConfig.PodLevelBondDeploymentTwoIPv4 then PodLevelBondDeploymentTwoIPv6),
call Fail(...) on the main goroutine if neither is set, then pass the resolved
targetIP into the goroutine and keep the rest of the logic (including calls to
generateTCPTraffic and scanClientPodTrafficOutput) unchanged.
In
`@tests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go`:
- Around line 245-246: The current detection uses regexes on the raw spec.config
string (variables hasIPv4/hasIPv6) which can yield false positives; instead
unmarshal spec.config JSON and inspect the IPAM fields directly (check
ipam.ipRanges[*].range and ipam.range/subnet entries) to determine presence of
IPv4/IPv6 ranges, then set hasIPv4/hasIPv6 accordingly; update the code around
the MatchString usage to parse with encoding/json, traverse to the ipam
structure, and only apply CIDR detection to those specific fields.
- Around line 245-258: The code currently defaults to IPv4 when neither hasIPv4
nor hasIPv6 is true, which can hide malformed NAD config; change the logic so
when both hasIPv4 and hasIPv6 are false the test fails fast instead of returning
IPv4: update the switch/default branch that checks hasIPv4/hasIPv6 to return an
error or call test fail (e.g., t.Fatalf or require.Fail) with a clear message
including nadName and config, or change the function signature to return ([],
policy, error) and propagate the error—modify the branches around the
hasIPv4/hasIPv6 checks and the default case accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8100d46d-b8a5-4346-bca7-0075d3922202
📒 Files selected for processing (6)
tests/system-tests/rdscore/internal/rdscorecommon/egress-service.gotests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.gotests/system-tests/rdscore/internal/rdscorecommon/ipvlan-validation.gotests/system-tests/rdscore/internal/rdscorecommon/monitoring-config-validation.gotests/system-tests/rdscore/internal/rdscorecommon/sriov-pod-level-bond.gotests/system-tests/rdscore/internal/rdscorecommon/whereabouts-statefulset.go
✅ Files skipped from review due to trivial changes (1)
- tests/system-tests/rdscore/internal/rdscorecommon/egress-service.go
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/system-tests/rdscore/internal/rdscorecommon/monitoring-config-validation.go
- tests/system-tests/rdscore/internal/rdscorecommon/ipvlan-validation.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/system-tests/rdscore/internal/rdscorecommon/nmi-redfish.go (1)
113-142: Consider tightening the post-BootID-change Ready wait.Once
waitForBootIDChangereturns (up to 25 min), the node has demonstrably rebooted, so the subsequentEventuallyfor Ready state (lines 120-142) effectively only needs to cover kubelet re-registration time. The current 25-minute timeout on both stages yields a worst-case wall time of ~50 minutes per node, which may mask real regressions and slow CI. A shorter Ready timeout (e.g., 10 minutes) after BootID change would be more diagnostic without sacrificing reliability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/system-tests/rdscore/internal/rdscorecommon/nmi-redfish.go` around lines 113 - 142, After waitForBootIDChange(...) returns, shorten the subsequent Eventually timeout that waits for the node Ready condition from 25*time.Minute to a smaller value (suggest 10*time.Minute) so we only cover kubelet re-registration; update the WithTimeout(...) call on the Eventually block that checks currentNode.Object.Status.Conditions (the block referencing node.Definition.Name, rdscoreparams.ConditionTypeReadyString and rdscoreparams.ConstantTrueString) to use 10*time.Minute (or a configurable constant) instead of 25*time.Minute.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/system-tests/rdscore/internal/rdscorecommon/nmi-redfish.go`:
- Around line 113-142: After waitForBootIDChange(...) returns, shorten the
subsequent Eventually timeout that waits for the node Ready condition from
25*time.Minute to a smaller value (suggest 10*time.Minute) so we only cover
kubelet re-registration; update the WithTimeout(...) call on the Eventually
block that checks currentNode.Object.Status.Conditions (the block referencing
node.Definition.Name, rdscoreparams.ConditionTypeReadyString and
rdscoreparams.ConstantTrueString) to use 10*time.Minute (or a configurable
constant) instead of 25*time.Minute.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bc5c7789-a6d7-4437-96ba-168d059aef08
📒 Files selected for processing (2)
tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.gotests/system-tests/rdscore/internal/rdscorecommon/nmi-redfish.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/system-tests/rdscore/internal/rdscorecommon/hard-reboot.go
yprokule
left a comment
There was a problem hiding this comment.
Please split this into multiple PRs
Summary by CodeRabbit