rhwa nhc: add NHC & SNR sudden-loss system test #1272
rdiscala wants to merge 1 commit into rh-ecosystem-edge:main
Conversation
📝 Walkthrough

Adds NHC/SNR test infrastructure: explicit YAML defaults and BMC config, a new BMCDetails type and RHWAConfig fields, operator constants and timeouts, Ginkgo suite entrypoints, reporter config, and new NHC/SNR e2e tests including a BMC-driven "sudden node loss" scenario.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Tested on kni-qe-3. An example wrapper script to run it would be:
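The wrapper itself isn't shown in the thread; below is a minimal sketch, assuming the `ECO_RHWA_NHC_*` environment variables named later in the review and the repo's `make run-tests` entrypoint. All values are placeholders, not the author's actual kni-qe-3 configuration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the NHC sudden-node-loss suite.
# Variable names come from the review comments; every value is a placeholder.
set -euo pipefail

export ECO_RHWA_NHC_TARGET_WORKER="worker-0"
export ECO_RHWA_NHC_FAILOVER_WORKERS="worker-1,worker-2"
export ECO_RHWA_NHC_STORAGE_CLASS="ocs-storagecluster-ceph-rbd"
export ECO_RHWA_NHC_APP_IMAGE="quay.io/example/stateful-app:latest"
# BMC details are decoded from JSON (see the BMCDetails.Decode discussion below).
export ECO_RHWA_NHC_TARGET_WORKER_BMC='{"address":"redfish://10.0.0.1","username":"admin","password":"secret"}'

echo "Running NHC sudden-node-loss suite against ${ECO_RHWA_NHC_TARGET_WORKER}"
# make run-tests  # uncomment on a real checkout
```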
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/rhwa/internal/rhwaconfig/rhwaconfig.go`:
- Around line 29-31: The Decode method on BMCDetails should defensively handle
empty or whitespace-only input before calling json.Unmarshal: in
BMCDetails.Decode(value string) check if strings.TrimSpace(value) == "" and if
so return nil (no-op) to avoid json EOF errors from envconfig; otherwise proceed
to json.Unmarshal into b and return that error. Ensure you import/qualify
strings if needed and do not change the existing unmarshalling behavior for
non-empty inputs.
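The defensive `Decode` asked for above could look like this stdlib-only sketch. The field names (`Address`/`Username`/`Password`) are assumptions based on the review; the real struct lives in `rhwaconfig.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// BMCDetails mirrors the reviewed type; field names here are assumptions.
type BMCDetails struct {
	Address  string `json:"address"`
	Username string `json:"username"`
	Password string `json:"password"`
}

// Decode satisfies envconfig's Decoder interface. Empty or whitespace-only
// input is treated as "not configured" (no-op) instead of surfacing a JSON
// EOF error; non-empty input keeps the existing unmarshalling behavior.
func (b *BMCDetails) Decode(value string) error {
	if strings.TrimSpace(value) == "" {
		return nil // leave the zero value in place
	}

	return json.Unmarshal([]byte(value), b)
}

func main() {
	var empty BMCDetails
	fmt.Println(empty.Decode("   ")) // nil instead of "unexpected end of JSON input"

	var bmc BMCDetails
	fmt.Println(bmc.Decode(`{"address":"redfish://10.0.0.1","username":"admin"}`), bmc.Address)
}
```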
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 449-451: The code uses an unchecked type assertion when reading
snrResource.Object["spec"] into snrSpec (snrSpec :=
snrResource.Object["spec"].(map[string]any)), which can panic; change this to a
guarded assertion (e.g., spec, ok :=
snrResource.Object["spec"].(map[string]any)) and assert the ok value before
using snrSpec (use Expect(ok).To(BeTrue(), "expected spec to be map[string]any")
or similar) so that the subsequent
Expect(snrSpec["remediationStrategy"]).To(Equal("OutOfServiceTaint"), ...) is
only executed when the spec has the correct type.
- Around line 331-352: The code performs unchecked type assertions on
nhcResource.Object["spec"] and ["status"] and ignores errors from toFloat64,
which can cause panics or silent failures; update the block to validate
existence and types before asserting (check nhcResource.Object has "spec" and
"status" and that they cast to map[string]any), handle the case where minHealthy
is missing, and when using toFloat64 on observedNodes and healthyNodes capture
and handle the returned errors (fail the test with a clear message if conversion
fails) so that minHealthyStr parsing and the Expect comparison only run on
validated values.
- Around line 176-214: The test can schedule the pod on a failover worker
because
deployment.NewBuilder(...).WithNodeSelector(map[string]string{nhcparams.AppWorkerLabel:
""}) allows any worker, while the BMC client is created for
RHWAConfig.TargetWorkerBMC.Address; update the deployment to pin the pod to the
specific target worker (e.g., set WithNodeSelector to
map[string]string{nhcparams.AppWorkerLabel: RHWAConfig.TargetWorker} or add a
NodeName/NodeAffinity using the target) or, after pod.List and assigning
targetNode from appPods[0].Object.Spec.NodeName, assert/guard that targetNode ==
RHWAConfig.TargetWorker and call Skip(...) if it differs before creating
bmc.New(...), so the BMC address and powered-off node always match the pod
location.
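The guarded type assertion the second comment asks for can be sketched with the stdlib alone. `checkSpec` and the sample maps are illustrative stand-ins for `snrResource.Object`; in the Ginkgo suite the `ok` check would be `Expect(ok).To(BeTrue(), ...)` rather than a returned error.

```go
package main

import "fmt"

// checkSpec demonstrates the two-value type assertion: a malformed
// unstructured payload yields a descriptive error instead of a panic.
func checkSpec(obj map[string]any) (string, error) {
	spec, ok := obj["spec"].(map[string]any)
	if !ok {
		return "", fmt.Errorf("expected spec to be map[string]any, got %T", obj["spec"])
	}

	strategy, _ := spec["remediationStrategy"].(string)

	return strategy, nil
}

func main() {
	good := map[string]any{"spec": map[string]any{"remediationStrategy": "OutOfServiceTaint"}}
	fmt.Println(checkSpec(good)) // OutOfServiceTaint <nil>

	bad := map[string]any{"spec": "not-a-map"}
	_, err := checkSpec(bad)
	fmt.Println(err) // error instead of a panic
}
```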
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a0e46c39-eef7-4c39-ab7b-db642b400112
📒 Files selected for processing (10)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/tests/sudden-node-loss.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/snr-operator/tests/snr.go
Force-pushed: 5cf83a0 to 741384b
Actionable comments posted: 4
♻️ Duplicate comments (2)
tests/rhwa/nhc-operator/tests/sudden-node-loss.go (2)
343-354: ⚠️ Potential issue | 🟠 Major — Handle both conversion failures and the non-% threshold path.

This block still ignores the `toFloat64` errors, and it silently skips the threshold check whenever `minHealthy` is not stored as a percentage string. That weakens the precondition exactly where the unstructured payload changes type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 343 - 354, The minHealthy check currently ignores toFloat64 errors and skips the non-% branch; update the block around strings.CutSuffix(minHealthyStr, "%") and the uses of toFloat64 to validate conversions: call toFloat64(observedNodes) and toFloat64(healthyNodes) and assert no error via Expect(parseErr). If strings.CutSuffix returns ok, parse pct with strconv.ParseFloat as existing, compute minRequired := math.Ceil(observed * pct / 100) and assert healthy >= minRequired; else attempt to convert minHealthyStr to a numeric threshold by calling toFloat64(minHealthyStr) (or strconv.ParseFloat) and Expect no error, then assert healthy >= minRequiredAbs. Ensure each conversion failure uses Expect(...).ToNot(HaveOccurred(), ...) with a clear message so neither conversion nor the non-% path is silently ignored.
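The threshold logic described in this prompt can be sketched stdlib-only. The function name and string-in signature are illustrative; the actual test reads `observedNodes`/`healthyNodes` from the unstructured NHC status via `toFloat64` and would surface conversion errors through `Expect(...)`.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// minRequired resolves an NHC minHealthy value that may be either a
// percentage string ("51%") or an absolute count ("2"), returning the
// minimum number of healthy nodes required out of `observed`. Neither
// branch is silently skipped: any unparsable input is an error.
func minRequired(minHealthy string, observed float64) (float64, error) {
	if pctStr, ok := strings.CutSuffix(minHealthy, "%"); ok {
		pct, err := strconv.ParseFloat(pctStr, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", minHealthy, err)
		}

		return math.Ceil(observed * pct / 100), nil
	}

	abs, err := strconv.ParseFloat(minHealthy, 64)
	if err != nil {
		return 0, fmt.Errorf("minHealthy %q is neither %% nor number: %w", minHealthy, err)
	}

	return abs, nil
}

func main() {
	fmt.Println(minRequired("51%", 5)) // ceil(5 * 51 / 100) = 3
	fmt.Println(minRequired("2", 5))   // absolute threshold path
	fmt.Println(minRequired("oops", 5))
}
```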
176-178: ⚠️ Potential issue | 🔴 Critical — Keep the BMC target and the watched node aligned.

Line 205 rewrites `targetNode` to wherever the pod actually landed, while Lines 210-213 still build the BMC client from `RHWAConfig.TargetWorkerBMC`. If the pod starts on a failover worker, Step 3 becomes self-fulfilling, Step 4 powers off one node and watches another, and recovery/cleanup target the wrong machine.

🔧 Minimal guard

```diff
 	targetNode = appPods[0].Object.Spec.NodeName
+	if targetNode != RHWAConfig.TargetWorker {
+		Skip(fmt.Sprintf(
+			"stateful app landed on %s, expected %s for the configured BMC target",
+			targetNode, RHWAConfig.TargetWorker,
+		))
+	}
 	klog.Infof("Stateful app is running on node %s", targetNode)
```

Also applies to: 205-213, 274-289
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 176 - 178, The BMC client is being created from RHWAConfig.TargetWorkerBMC while targetNode may be rewritten after pod scheduling, causing the watched node and the BMC target to diverge; update the code that builds the BMC client to use the BMC entry that corresponds to the final targetNode (e.g., look up RHWAConfig.TargetWorkerBMC[targetNode.Name] or overwrite RHWAConfig.TargetWorkerBMC with the BMC spec for targetNode) before creating the client (affecting the code that constructs the BMC client and any uses around targetNode, RHWAConfig.TargetWorkerBMC, and the deployment.NewBuilder flow), and apply the same fix in the other occurrences mentioned (around the blocks at 205-213 and 274-289) so the powered-off machine and watched node always match.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 107-124: The cleanup currently removes nhcparams.AppWorkerLabel
from every worker unconditionally; change the logic to record which nodes the
suite actually modified and only remove the label from those. In the setup loop
around allWorkers/workerNode (using nodes.Pull, WithNewLabel, Update) capture
the original label state (existence and value) or append workerName to a slice
like labeledWorkers when you call WithNewLabel/Update, and then in the cleanup
(the block corresponding to lines 254-270) iterate only over that recorded slice
to remove the label; if you stored the previous value, restore it instead of
deleting to avoid touching labels the suite did not create. Ensure references
remain to nhcparams.AppWorkerLabel, allWorkers, and the Update() call sites so
the change is easy to locate.
- Around line 166-174: The test advances as soon as the replacement pod is
PodRunning which can be before the PVC is usable; change the wait logic to wait
for the pod to be Ready (respecting the containerSpec.ReadinessProbe on
/data/heartbeat.log) instead of just Running. Locate the code that waits for the
replacement pod status (the helper currently checking for PodRunning) and
replace it with a wait for the PodReady condition (e.g., using the k8s Pod
condition "Ready" true or kubectl/wait-style polling), or call the existing
helper that checks readiness if present, so the test only proceeds when
readiness (not just running) is observed. Ensure the replacement pod check
references the same pod name/labels used when creating the replacement pod and
times out consistently with other waits.
- Around line 604-622: The check for old-node attachment sets foundOnOldNode for
any VolumeAttachment referencing the old node even if detached; update the loop
over vaList.Items (inspect va := &vaList.Items[idx] and
va.Spec.Source.PersistentVolumeName) so that foundOnOldNode is only set when
va.Spec.NodeName == targetNode AND va.Status.Attached is true, leaving the
foundOnNewNode logic unchanged, so the subsequent Expect(...) that asserts
foundOnOldNode is false only fails when the volume is actually still attached to
the old node.
- Around line 57-73: The preflight block that checks RHWAConfig fields must also
validate that RHWAConfig.FailoverWorkers is non-empty and that
RHWAConfig.TargetWorkerBMC.Username and RHWAConfig.TargetWorkerBMC.Password are
provided; add Skip() guards alongside the existing checks (the same section
around the By("Checking required configuration is provided") checks) to bail out
early if FailoverWorkers has length 0 or if
TargetWorkerBMC.Username/TargetWorkerBMC.Password are empty so the suite never
proceeds with missing inputs that would cause mid-test failures.
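The Running-vs-Ready distinction raised above can be illustrated with simplified local types standing in for `corev1.Pod` (the real check would inspect the builder's `pod.Object.Status.Conditions`): a pod reaches phase Running before its readiness probe on `/data/heartbeat.log` has passed, so the wait must target the Ready condition.

```go
package main

import "fmt"

// Minimal stand-ins for the corev1 condition fields; illustrative only.
type PodCondition struct {
	Type   string
	Status string
}

type PodStatus struct {
	Phase      string
	Conditions []PodCondition
}

// isPodReady returns true only when the Ready condition is True, which is
// the signal the review asks the test to wait for instead of the phase.
func isPodReady(status PodStatus) bool {
	for _, cond := range status.Conditions {
		if cond.Type == "Ready" && cond.Status == "True" {
			return true
		}
	}

	return false
}

func main() {
	runningNotReady := PodStatus{
		Phase:      "Running",
		Conditions: []PodCondition{{Type: "Ready", Status: "False"}},
	}
	fmt.Println(isPodReady(runningNotReady)) // false: PVC may not be usable yet

	ready := PodStatus{
		Phase:      "Running",
		Conditions: []PodCondition{{Type: "Ready", Status: "True"}},
	}
	fmt.Println(isPodReady(ready)) // true
}
```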
---
Duplicate comments:
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 343-354: The minHealthy check currently ignores toFloat64 errors
and skips the non-% branch; update the block around
strings.CutSuffix(minHealthyStr, "%") and the uses of toFloat64 to validate
conversions: call toFloat64(observedNodes) and toFloat64(healthyNodes) and
assert no error via Expect(parseErr). If strings.CutSuffix returns ok, parse pct
with strconv.ParseFloat as existing, compute minRequired := math.Ceil(observed *
pct / 100) and assert healthy >= minRequired; else attempt to convert
minHealthyStr to a numeric threshold by calling toFloat64(minHealthyStr) (or
strconv.ParseFloat) and Expect no error, then assert healthy >= minRequiredAbs.
Ensure each conversion failure uses Expect(...).ToNot(HaveOccurred(), ...) with
a clear message so neither conversion nor the non-% path is silently ignored.
- Around line 176-178: The BMC client is being created from
RHWAConfig.TargetWorkerBMC while targetNode may be rewritten after pod
scheduling, causing the watched node and the BMC target to diverge; update the
code that builds the BMC client to use the BMC entry that corresponds to the
final targetNode (e.g., look up RHWAConfig.TargetWorkerBMC[targetNode.Name] or
overwrite RHWAConfig.TargetWorkerBMC with the BMC spec for targetNode) before
creating the client (affecting the code that constructs the BMC client and any
uses around targetNode, RHWAConfig.TargetWorkerBMC, and the
deployment.NewBuilder flow), and apply the same fix in the other occurrences
mentioned (around the blocks at 205-213 and 274-289) so the powered-off machine
and watched node always match.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4bc6dfb0-91b8-45c9-91f1-9b90185ffe8c
📒 Files selected for processing (10)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/tests/sudden-node-loss.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/snr-operator/tests/snr.go
✅ Files skipped from review due to trivial changes (2)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/snr-operator/tests/snr.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
Force-pushed: 741384b to 08d727b
♻️ Duplicate comments (4)
tests/rhwa/nhc-operator/tests/sudden-node-loss.go (4)
619-621: ⚠️ Potential issue | 🟡 Minor — Only flag old-node attachment when it is still attached.

`foundOnOldNode` is set for any `VolumeAttachment` referencing the old node, even if `va.Status.Attached` is false. CSI controllers may briefly leave a stale object with `attached=false`, causing a false failure after a healthy failover.

💽 Proposed fix

```diff
-	if va.Spec.NodeName == targetNode {
+	if va.Spec.NodeName == targetNode && va.Status.Attached {
 		foundOnOldNode = true
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 619 - 621, The test currently sets foundOnOldNode whenever a VolumeAttachment's va.Spec.NodeName equals targetNode; change the check to only set foundOnOldNode when the VolumeAttachment is still attached by also verifying va.Status.Attached is true (i.e., require both va.Spec.NodeName == targetNode and va.Status.Attached == true) so stale/detached attachments on the old node are ignored; update the logic where foundOnOldNode is assigned and any related comments or variable names in the sudden-node-loss test to reflect this stricter condition.
56-73: ⚠️ Potential issue | 🟡 Minor — Validate remaining required inputs before starting the suite.

The configuration validation is missing checks for `FailoverWorkers` (needed for pod rescheduling) and `TargetWorkerBMC.Username`/`Password` (used when creating the BMC client). Missing these checks turns configuration mistakes into mid-test failures.

⚙️ Suggested guardrails

```diff
 	if RHWAConfig.TargetWorker == "" {
 		Skip("ECO_RHWA_NHC_TARGET_WORKER not set")
 	}
+
+	if len(RHWAConfig.FailoverWorkers) == 0 {
+		Skip("ECO_RHWA_NHC_FAILOVER_WORKERS not set")
+	}
 
 	if RHWAConfig.StorageClass == "" {
 		Skip("ECO_RHWA_NHC_STORAGE_CLASS not set")
 	}
 
 	if RHWAConfig.AppImage == "" {
 		Skip("ECO_RHWA_NHC_APP_IMAGE not set")
 	}
 
 	if RHWAConfig.TargetWorkerBMC.Address == "" {
 		Skip("ECO_RHWA_NHC_TARGET_WORKER_BMC address not set")
 	}
+
+	if RHWAConfig.TargetWorkerBMC.Username == "" || RHWAConfig.TargetWorkerBMC.Password == "" {
+		Skip("ECO_RHWA_NHC_TARGET_WORKER_BMC credentials not set")
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 56 - 73, The BeforeAll validation currently checks only TargetWorker, StorageClass, AppImage and TargetWorkerBMC.Address; add checks for RHWAConfig.FailoverWorkers (ensure it's non-empty) and for RHWAConfig.TargetWorkerBMC.Username and RHWAConfig.TargetWorkerBMC.Password (ensure both are non-empty) inside the same BeforeAll block so missing config causes Skip with clear messages (e.g., "ECO_RHWA_NHC_FAILOVER_WORKERS not set", "ECO_RHWA_NHC_TARGET_WORKER_BMC_USERNAME not set", "ECO_RHWA_NHC_TARGET_WORKER_BMC_PASSWORD not set"), keeping the validation logic consistent with existing checks for RHWAConfig and TargetWorkerBMC fields.
352-353: ⚠️ Potential issue | 🟡 Minor — Handle `toFloat64` conversion errors.

The errors from `toFloat64` are discarded. If conversion fails, `observed` and `healthy` will be 0, potentially causing incorrect threshold validation or misleading pass/fail results.

🛡️ Proposed fix

```diff
-	observed, _ := toFloat64(observedNodes)
-	healthy, _ := toFloat64(healthyNodes)
+	observed, obsErr := toFloat64(observedNodes)
+	Expect(obsErr).ToNot(HaveOccurred(), "Failed to convert observedNodes to float64")
+	healthy, healthyErr := toFloat64(healthyNodes)
+	Expect(healthyErr).ToNot(HaveOccurred(), "Failed to convert healthyNodes to float64")
 	minRequired := math.Ceil(observed * pct / 100)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 352 - 353, The code currently ignores errors returned by toFloat64 when converting observedNodes and healthyNodes; update the conversion calls to check the returned errors and handle them (e.g., fail the test or return an error) instead of letting zero values proceed—replace the unchecked conversions for observed, healthy from toFloat64(observedNodes) and toFloat64(healthyNodes) with error-aware logic that asserts or t.Fatalf (or returns) including the conversion error and context ("failed to convert observedNodes" / "failed to convert healthyNodes") so any conversion failure surfaces immediately and prevents incorrect threshold evaluation.
259-276: ⚠️ Potential issue | 🟡 Minor — Cleanup removes node labels the suite may not have created.

Setup skips labeling nodes that already have `AppWorkerLabel` (lines 113-117), but cleanup unconditionally removes the label from all workers. This can strip shared-cluster state the suite never owned. Consider tracking which nodes the suite actually labeled and only removing from those.

🔧 Suggested approach

```diff
+	// Track nodes we actually labeled
+	var labeledByTest []string
+
 	BeforeAll(func() {
 		// ... existing code ...
 		for _, workerName := range allWorkers {
 			// ...
 			if _, exists := workerNode.Object.Labels[nhcparams.AppWorkerLabel]; exists {
 				klog.Infof("Node %s already has label %s, skipping", workerName, nhcparams.AppWorkerLabel)
 				continue
 			}
 			workerNode = workerNode.WithNewLabel(nhcparams.AppWorkerLabel, "")
 			_, updateErr := workerNode.Update()
 			Expect(updateErr).ToNot(HaveOccurred(), ...)
+			labeledByTest = append(labeledByTest, workerName)
 		}
 		// ...
 	})
 
 	AfterAll(func() {
 		// ...
 		By("Removing appworker label from nodes")
-		allWorkers := append([]string{targetNode}, RHWAConfig.FailoverWorkers...)
-		for _, workerName := range allWorkers {
+		for _, workerName := range labeledByTest {
 			// ... remove label ...
 		}
 	})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 259 - 276, The cleanup loop indiscriminately removes nhcparams.AppWorkerLabel from allWorkers (built from targetNode and RHWAConfig.FailoverWorkers) even if the suite didn’t create that label; modify the code to record which nodes the suite actually labeled during setup (e.g., append node names to a labeledWorkers slice when you successfully call workerNode.AddLabel or similar) and then in this teardown use that labeledWorkers list instead of allWorkers and call nodes.Pull + workerNode.RemoveLabel + workerNode.Update only for entries in labeledWorkers; ensure the tracking variable is accessible to both setup and teardown (suite-scoped or attached to RHWAConfig) and skip removal for nodes not in that tracked list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 619-621: The test currently sets foundOnOldNode whenever a
VolumeAttachment's va.Spec.NodeName equals targetNode; change the check to only
set foundOnOldNode when the VolumeAttachment is still attached by also verifying
va.Status.Attached is true (i.e., require both va.Spec.NodeName == targetNode
and va.Status.Attached == true) so stale/detached attachments on the old node
are ignored; update the logic where foundOnOldNode is assigned and any related
comments or variable names in the sudden-node-loss test to reflect this stricter
condition.
- Around line 56-73: The BeforeAll validation currently checks only
TargetWorker, StorageClass, AppImage and TargetWorkerBMC.Address; add checks for
RHWAConfig.FailoverWorkers (ensure it's non-empty) and for
RHWAConfig.TargetWorkerBMC.Username and RHWAConfig.TargetWorkerBMC.Password
(ensure both are non-empty) inside the same BeforeAll block so missing config
causes Skip with clear messages (e.g., "ECO_RHWA_NHC_FAILOVER_WORKERS not set",
"ECO_RHWA_NHC_TARGET_WORKER_BMC_USERNAME not set",
"ECO_RHWA_NHC_TARGET_WORKER_BMC_PASSWORD not set"), keeping the validation logic
consistent with existing checks for RHWAConfig and TargetWorkerBMC fields.
- Around line 352-353: The code currently ignores errors returned by toFloat64
when converting observedNodes and healthyNodes; update the conversion calls to
check the returned errors and handle them (e.g., fail the test or return an
error) instead of letting zero values proceed—replace the unchecked conversions
for observed, healthy from toFloat64(observedNodes) and toFloat64(healthyNodes)
with error-aware logic that asserts or t.Fatalf (or returns) including the
conversion error and context ("failed to convert observedNodes" / "failed to
convert healthyNodes") so any conversion failure surfaces immediately and
prevents incorrect threshold evaluation.
- Around line 259-276: The cleanup loop indiscriminately removes
nhcparams.AppWorkerLabel from allWorkers (built from targetNode and
RHWAConfig.FailoverWorkers) even if the suite didn’t create that label; modify
the code to record which nodes the suite actually labeled during setup (e.g.,
append node names to a labeledWorkers slice when you successfully call
workerNode.AddLabel or similar) and then in this teardown use that
labeledWorkers list instead of allWorkers and call nodes.Pull +
workerNode.RemoveLabel + workerNode.Update only for entries in labeledWorkers;
ensure the tracking variable is accessible to both setup and teardown
(suite-scoped or attached to RHWAConfig) and skip removal for nodes not in that
tracked list.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0cfb0112-8602-4389-8a74-deac551982f9
📒 Files selected for processing (10)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/tests/sudden-node-loss.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/snr-operator/tests/snr.go
✅ Files skipped from review due to trivial changes (2)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/rhwa/snr-operator/tests/snr.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
Force-pushed: 1100fe0 to 92c89ca
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/rhwa/nhc-operator/tests/sudden-node-loss.go (1)
358-367: ⚠️ Potential issue | 🟡 Minor — Don't swallow `toFloat64` failures in the `minHealthy` check.

Lines 362-363 still ignore conversion errors, so an unexpected `observedNodes` or `healthyNodes` type becomes `0` and this assertion can evaluate the wrong threshold. Fail fast if either conversion fails.

🔧 Suggested fix

```diff
-	observed, _ := toFloat64(observedNodes)
-	healthy, _ := toFloat64(healthyNodes)
+	observed, obsErr := toFloat64(observedNodes)
+	Expect(obsErr).ToNot(HaveOccurred(), "Failed to convert observedNodes")
+	healthy, healthyErr := toFloat64(healthyNodes)
+	Expect(healthyErr).ToNot(HaveOccurred(), "Failed to convert healthyNodes")
 	minRequired := math.Ceil(observed * pct / 100)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 358 - 367, The conversions for observedNodes and healthyNodes currently ignore errors from toFloat64; change the code to capture both returned values (e.g., observed, obsErr := toFloat64(observedNodes) and healthy, healthyErr := toFloat64(healthyNodes)) and assert they did not fail (use Expect(obsErr).ToNot(HaveOccurred(), ...) and Expect(healthyErr).ToNot(HaveOccurred(), ...)) before computing minRequired and performing the Expect(healthy).To(BeNumerically(...)) check so conversion failures fail fast; keep the existing variables (observed, healthy, minRequired) and error messages descriptive.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 72-74: The check only ensures RHWAConfig.FailoverWorkers is
non-empty but doesn't verify a usable failover target; update the preflight in
the Sudden Node Loss test (and similar guard around the Step 7 region) to
require at least one entry in RHWAConfig.FailoverWorkers that is not equal to
TargetWorker and that is currently healthy/Ready (use the existing readiness
helper or inspect the worker's Ready condition); if no such distinct, healthy
failover worker exists, call Skip("no distinct healthy failover worker
available") so the outage isn't started.
---
Duplicate comments:
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 358-367: The conversions for observedNodes and healthyNodes
currently ignore errors from toFloat64; change the code to capture both returned
values (e.g., observed, obsErr := toFloat64(observedNodes) and healthy,
healthyErr := toFloat64(healthyNodes)) and assert they did not fail (use
Expect(obsErr).ToNot(HaveOccurred(), ...) and
Expect(healthyErr).ToNot(HaveOccurred(), ...)) before computing minRequired and
performing the Expect(healthy).To(BeNumerically(...)) check so conversion
failures fail fast; keep the existing variables (observed, healthy, minRequired)
and error messages descriptive.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ab840eb3-b3de-46cc-9db3-55133bcd972f
📒 Files selected for processing (10)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/tests/sudden-node-loss.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/snr-operator/tests/snr.go
✅ Files skipped from review due to trivial changes (5)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/snr-operator/tests/snr.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
Force-pushed: 8e5c0b2 to 95bd946

Force-pushed: b19bbea to b77b02a
@coderabbitai review

✅ Actions performed: Review triggered.

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
♻️ Duplicate comments (2)
tests/rhwa/nhc-operator/tests/sudden-node-loss.go (2)
80-103: ⚠️ Potential issue | 🟠 Major — Require a schedulable failover worker, not just a Ready one.

`Ready=True` is not sufficient here. A cordoned node still passes this gate, but Kubernetes will not place new pods on an unschedulable node, so Step 7 can still dead-end after the target is powered off. Fold `!failoverNode.Object.Spec.Unschedulable` into this preflight check before starting the outage; if you want full coverage, also reject untolerated `NoSchedule` taints. (docs.okd.io)

⚙️ Minimal guard

```diff
 	failoverNode, pullErr := nodes.Pull(APIClient, failoverName)
 	if pullErr != nil {
 		klog.Warningf("Failed to pull failover node %s: %v", failoverName, pullErr)
 		continue
 	}
+
+	if failoverNode.Object.Spec.Unschedulable {
+		continue
+	}
 
 	for _, condition := range failoverNode.Object.Status.Conditions {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 80 - 103, The check for a healthy failover worker currently only verifies Ready condition via iterating failoverNode.Object.Status.Conditions and matching corev1.NodeReady==corev1.ConditionTrue; update the preflight to also ensure the node is schedulable by checking failoverNode.Object.Spec.Unschedulable is false and optionally that the node does not have untolerated NoSchedule taints (inspect failoverNode.Object.Spec.Taints and ensure either no NoSchedule taint or that RHWA's pod tolerations would match). Modify the loop that processes RHWAConfig.FailoverWorkers (and uses nodes.Pull) to fold these additional checks before setting hasHealthyFailover and breaking.
156-174: ⚠️ Potential issue | 🟠 Major — Fail fast if another `appworker` node is already eligible for the initial placement.

This block only strips `AppWorkerLabel` from `RHWAConfig.FailoverWorkers`. If some other worker already carries the same empty-string label, the deployment on Lines 240-243 is still eligible to land there and the suite aborts before the outage scenario starts. A cheap guard here is to list all nodes matching `AppWorkerLabel=""` and `Skip(...)` unless the target is the only initial candidate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 156 - 174, The loop only removes nhcparams.AppWorkerLabel from RHWAConfig.FailoverWorkers but does not guard against other nodes already having AppWorkerLabel="" which can accept the initial placement; before removing labels, list nodes using APIClient.K8sClient.CoreV1().Nodes().List with a label selector for nhcparams.AppWorkerLabel="" and if the returned candidate list length is >1 and the current target (workerName from RHWAConfig.FailoverWorkers via nodes.Pull) is not the sole candidate, call Skip(...) to fail fast; update the logic around nodes.Pull / the unlabeledWorkers append to perform this check and only proceed to Patch (APIClient.K8sClient.CoreV1().Nodes().Patch) when the target is the only initial eligible node.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/rhwa/nhc-operator/README.md`:
- Around line 118-126: The fenced code block in README.md that contains the
commented export lines and "make run-tests" is missing a language tag; edit that
block by adding "bash" immediately after the opening triple backticks (so the
block starts with ```bash) to satisfy MD040 and enable shell syntax highlighting
for the exported variable sample.
In `@tests/rhwa/snr-operator/tests/snr.go`:
- Around line 38-44: List pods matching listOptions first using the existing
pod-listing helper (call pod.ListPodsInNamespace or the equivalent helper with
APIClient, rhwaparams.RhwaOperatorNs and listOptions), assert the returned pod
list is non-empty (fail the test if zero pods matched, pointing at
rhwaparams.OperatorControllerPodLabel), then call
pod.WaitForAllPodsInNamespaceRunning (or use its boolean return) to wait for
readiness; do not discard the boolean result so a misconfigured
OperatorControllerPodLabel cannot silently pass.
---
Duplicate comments:
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 80-103: The check for a healthy failover worker currently only
verifies Ready condition via iterating failoverNode.Object.Status.Conditions and
matching corev1.NodeReady==corev1.ConditionTrue; update the preflight to also
ensure the node is schedulable by checking
failoverNode.Object.Spec.Unschedulable is false and optionally that the node
does not have untolerated NoSchedule taints (inspect
failoverNode.Object.Spec.Taints and ensure either no NoSchedule taint or that
RHWA's pod tolerations would match). Modify the loop that processes
RHWAConfig.FailoverWorkers (and uses nodes.Pull) to fold these additional checks
before setting hasHealthyFailover and breaking.
- Around line 156-174: The loop only removes nhcparams.AppWorkerLabel from
RHWAConfig.FailoverWorkers but does not guard against other nodes already having
AppWorkerLabel="" which can accept the initial placement; before removing
labels, list nodes using APIClient.K8sClient.CoreV1().Nodes().List with a label
selector for nhcparams.AppWorkerLabel="" and if the returned candidate list
length is >1 and the current target (workerName from RHWAConfig.FailoverWorkers
via nodes.Pull) is not the sole candidate, call Skip(...) to fail fast; update
the logic around nodes.Pull / the unlabeledWorkers append to perform this check
and only proceed to Patch (APIClient.K8sClient.CoreV1().Nodes().Patch) when the
target is the only initial eligible node.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a7c567c8-ad53-4eb1-8fee-a7e869445531
📒 Files selected for processing (11)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
- tests/rhwa/nhc-operator/README.md
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/tests/sudden-node-loss.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/snr-operator/tests/snr.go
✅ Files skipped from review due to trivial changes (2)
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/tests/nhc.go
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
b77b02a to f1793f3 (Compare)
```go
// OutOfServiceTaintKey is the taint key applied by SNR to fence a node.
OutOfServiceTaintKey = "node.kubernetes.io/out-of-service"
```

Please remove unused constants.
```go
// SNRDeploymentName is the name of the SNR operator deployment.
SNRDeploymentName = "self-node-remediation-controller-manager"
```

Please remove unused constants.
```go
	Type: appsv1.RecreateDeploymentStrategyType,
}

_, err = deploy.CreateAndWaitUntilReady(5 * time.Minute)
```

Please make this a constant; avoid hardcoded values inside test code.
```go
By("Deleting test namespace")

testNS := namespace.NewBuilder(APIClient, nhcparams.AppNamespace)
err := testNS.DeleteAndWait(5 * time.Minute)
```

Please extract this as a constant.
```go
removeLabelPatch := []byte(
	fmt.Sprintf(`{"metadata":{"labels":{%q:null}}}`, nhcparams.AppWorkerLabel))
addLabelPatch := []byte(
	fmt.Sprintf(`{"metadata":{"labels":{%q:""}}}`, nhcparams.AppWorkerLabel))

for _, workerName := range RHWAConfig.FailoverWorkers {
	workerNode, pullErr := nodes.Pull(APIClient, workerName)
	if pullErr != nil {
		continue
	}

	if _, exists := workerNode.Object.Labels[nhcparams.AppWorkerLabel]; exists {
		klog.Infof("Temporarily removing %s label from failover node %s before deployment",
			nhcparams.AppWorkerLabel, workerName)

		_, patchErr := APIClient.K8sClient.CoreV1().Nodes().Patch(
			context.TODO(), workerName, types.MergePatchType,
			removeLabelPatch, metav1.PatchOptions{})
		Expect(patchErr).ToNot(HaveOccurred(),
			fmt.Sprintf("Failed to remove label from node %s", workerName))

		unlabeledWorkers = append(unlabeledWorkers, workerName)
	}
}
```

You can simplify this code by leveraging functionality already provided in the eco-goinfra node package. For example:

```go
workerNode, err := nodes.Pull(APIClient, workerName)
Expect(err).ToNot(HaveOccurred())

// To add a label:
workerNode.WithNewLabel(nhcparams.AppWorkerLabel, "")
_, err = workerNode.Update()

// To remove a label:
workerNode.RemoveLabel(nhcparams.AppWorkerLabel, "")
_, err = workerNode.Update()
```
```go
targetNode = RHWAConfig.TargetWorker

By("Waiting for target worker node to be Ready")

Eventually(func() bool {
	targetNodeObj, pullErr := nodes.Pull(APIClient, targetNode)
	if pullErr != nil {
		klog.Infof("waiting for node %s: %v", targetNode, pullErr)

		return false
	}

	for _, condition := range targetNodeObj.Object.Status.Conditions {
		if condition.Type == corev1.NodeReady {
			return condition.Status == corev1.ConditionTrue
		}
	}

	return false
}).WithTimeout(nhcparams.NodeRecoveryTimeout).
	WithPolling(nhcparams.PollingInterval).
	Should(BeTrue(),
		fmt.Sprintf("Target worker node %s did not become Ready", targetNode))
```

You can reuse functionality already available in the eco-goinfra nodes package. Example:

```go
// Wait for the node to become Ready:
targetNodeObj, _ := nodes.Pull(APIClient, targetNode)
err := targetNodeObj.WaitUntilReady(nhcparams.NodeRecoveryTimeout)
Expect(err).ToNot(HaveOccurred())

// Wait for the node to become NotReady (Unknown or False):
err = targetNodeObj.WaitUntilNotReady(nhcparams.NodeReadyTimeout)
Expect(err).ToNot(HaveOccurred())

// Wait for a specific condition to be Unknown:
err = targetNodeObj.WaitUntilConditionUnknown(corev1.NodeReady, nhcparams.NodeReadyTimeout)
Expect(err).ToNot(HaveOccurred())
```
```go
func isPodReady(podBuilder *pod.Builder) bool {
	for _, condition := range podBuilder.Object.Status.Conditions {
		if condition.Type == corev1.PodReady {
			return condition.Status == corev1.ConditionTrue
		}
	}

	return false
}
```

You can use the `podBuilder.isHealthy()` method.
```go
func filterRunningPods(pods []*pod.Builder) []*pod.Builder {
	var running []*pod.Builder

	for _, p := range pods {
		if p.Object.Status.Phase == corev1.PodRunning {
			running = append(running, p)
		}
	}

	return running
}
```

There are alternative ways to achieve this with functionality already available in the eco-goinfra pod package.

Option 1: use a field selector in the initial List call:

```go
// Already partially done with FieldSelector, but this could eliminate the filter function.
appPods, err := pod.List(APIClient, nhcparams.AppNamespace, metav1.ListOptions{
	LabelSelector: fmt.Sprintf("%s=%s", nhcparams.AppLabelKey, nhcparams.AppLabelValue),
	FieldSelector: "status.phase=Running",
})
```

Option 2: use eco-goinfra's waiting functions:

```go
// For waiting until pods are running:
allRunning, err := pod.WaitForAllPodsInNamespaceRunning(
	APIClient,
	nhcparams.AppNamespace,
	timeout,
	metav1.ListOptions{LabelSelector: "app=stateful-app"},
)
```
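If a client-side filter is kept anyway, it can be written once generically instead of per-type. A hedged sketch using Go generics — `podInfo` is a stand-in for `*pod.Builder`, which is not reproduced here:

```go
package main

import "fmt"

// podInfo is a stand-in for the fields the filter actually inspects;
// the real code operates on *pod.Builder from eco-goinfra.
type podInfo struct {
	Name  string
	Phase string
}

// filter keeps the elements of items for which keep returns true —
// the generic shape behind filterRunningPods.
func filter[T any](items []T, keep func(T) bool) []T {
	var out []T

	for _, item := range items {
		if keep(item) {
			out = append(out, item)
		}
	}

	return out
}

func main() {
	pods := []podInfo{
		{Name: "app-0", Phase: "Running"},
		{Name: "app-1", Phase: "Pending"},
		{Name: "app-2", Phase: "Running"},
	}
	running := filter(pods, func(p podInfo) bool { return p.Phase == "Running" })
	fmt.Println(len(running)) // count of pods in Running phase
}
```

The field-selector route is still preferable when the API server can do the filtering; this only saves duplication when filtering must stay client-side.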
```go
var (
	nhcGVR = schema.GroupVersionResource{
		Group:    "remediation.medik8s.io",
		Version:  "v1alpha1",
		Resource: "nodehealthchecks",
	}

	snrGVR = schema.GroupVersionResource{
		Group:    "self-node-remediation.medik8s.io",
		Version:  "v1alpha1",
		Resource: "selfnoderemediations",
	}
)
```

Could these variables be extracted to the vars file?
```go
for _, workerName := range RHWAConfig.FailoverWorkers {
	workerNode, pullErr := nodes.Pull(APIClient, workerName)
	if pullErr != nil {
		continue
```

This should be logged, and it might be worth retrying to make it more reliable.
Demostenes777 left a comment:

Good job Rigel! I have a few refactoring suggestions, mostly avoiding hardcoded values, keeping variable definitions in the `*vars.go` files, and reusing as much as possible the functionality already provided in the eco-goinfra packages.

Moreover, if you detect missing functionality regarding nodes, pods, etc., I strongly suggest contributing it to the eco-goinfra project instead of here.
```go
	WithRedfishTimeout(nhcparams.BMCTimeout)
})

AfterAll(func() {
```

Why not use `DeferCleanup`?
4256b0d to 3cc4874 (Compare)
@coderabbitai review

✅ Actions performed: review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/rhwa/nhc-operator/README.md (1)
**23-23: Minor: Hyphenate "bare-metal" when used as a compound adjective.** Static analysis flagged this: "bare metal" modifying "nodes" should be hyphenated as "bare-metal nodes". Same applies to lines 134 and 144.

📝 Proposed fix

```diff
-It has been tested on bare metal nodes. For virtualised infrastructure, a virtual BMC must be used,
+It has been tested on bare-metal nodes. For virtualised infrastructure, a virtual BMC must be used,
```

Line 134:

```diff
-broken down as follows (observed on a 4-worker bare metal cluster with `unhealthyConditions.duration=60s`
+broken down as follows (observed on a 4-worker bare-metal cluster with `unhealthyConditions.duration=60s`
```

Line 144:

```diff
-| AfterAll: Power on node & wait for Ready | ~5 min | Bare metal boot + kubelet registration |
+| AfterAll: Power on node & wait for Ready | ~5 min | Bare-metal boot + kubelet registration |
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/README.md` at line 23, Replace the unhyphenated phrase "bare metal" used as a compound adjective with "bare-metal" in the README; specifically locate the occurrences that modify "nodes" (the instance at the context around line 23 plus the other occurrences referenced) and update them to "bare-metal nodes" to ensure consistent compound-adjective hyphenation throughout the document.

tests/rhwa/nhc-operator/tests/sudden-node-loss.go (1)

**786-796: Remove unused `filterRunningPods` function.** This function is never called in the codebase and should be removed to keep the code clean.

♻️ Proposed fix

```diff
-func filterRunningPods(pods []*pod.Builder) []*pod.Builder {
-	var running []*pod.Builder
-
-	for _, p := range pods {
-		if p.Object.Status.Phase == corev1.PodRunning {
-			running = append(running, p)
-		}
-	}
-
-	return running
-}
-
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go` around lines 786 - 796, Remove the unused helper function filterRunningPods (which takes []*pod.Builder and references corev1.PodRunning) from the file; locate the function definition and delete it along with any unused imports that were only required by it (e.g., corev1 or pod.Builder if now unused) to keep the test file clean.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/rhwa/nhc-operator/tests/nhc.go`:
- Around line 34-45: The test currently discards the boolean return from
pod.WaitForAllPodsInNamespaceRunning which can be true with no pods matched;
update the call in the It block to capture the boolean (e.g., ok, err :=
pod.WaitForAllPodsInNamespaceRunning(APIClient, rhwaparams.RhwaOperatorNs,
rhwaparams.DefaultTimeout, listOptions)) and assert both that err is nil and ok
is true (e.g., Expect(err).ToNot(HaveOccurred()) and Expect(ok).To(BeTrue(), "no
controller pod matched selector for OperatorControllerPodLabel")), referencing
the OperatorControllerPodLabel, APIClient, rhwaparams.RhwaOperatorNs and
pod.WaitForAllPodsInNamespaceRunning symbols.
---
Nitpick comments:
In `@tests/rhwa/nhc-operator/README.md`:
- Line 23: Replace the unhyphenated phrase "bare metal" used as a compound
adjective with "bare-metal" in the README; specifically locate the occurrences
that modify "nodes" (the instance at the context around line 23 plus the other
occurrences referenced) and update them to "bare-metal nodes" to ensure
consistent compound-adjective hyphenation throughout the document.
In `@tests/rhwa/nhc-operator/tests/sudden-node-loss.go`:
- Around line 786-796: Remove the unused helper function filterRunningPods
(which takes []*pod.Builder and references corev1.PodRunning) from the file;
locate the function definition and delete it along with any unused imports that
were only required by it (e.g., corev1 or pod.Builder if now unused) to keep the
test file clean.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 936ce2e3-f861-4d01-b074-034b3f638ff4
📒 Files selected for processing (12)
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/internal/rhwaconfig/rhwaconfig.go
- tests/rhwa/internal/rhwaparams/rhwavars.go
- tests/rhwa/nhc-operator/README.md
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/tests/nhc.go
- tests/rhwa/nhc-operator/tests/sudden-node-loss.go
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/snr-operator/tests/snr.go
✅ Files skipped from review due to trivial changes (4)
- tests/rhwa/snr-operator/internal/snrparams/snrvars.go
- tests/rhwa/internal/rhwaconfig/default.yaml
- tests/rhwa/nhc-operator/nhc_suite_test.go
- tests/rhwa/nhc-operator/internal/nhcparams/const.go
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/rhwa/snr-operator/snr_suite_test.go
- tests/rhwa/nhc-operator/internal/nhcparams/nhcvars.go
ee44ec2 to 89b4045 (Compare)
@coderabbitai review

✅ Actions performed: review triggered.
55e2dd1 to ab9350a (Compare)
Add a test suite that runs through the following scenario. A healthy OpenShift cluster with a MNO topology experiences the sudden shutdown of a worker node that is running a stateful application (a simple Pod writing to a PV). The application is evicted and rescheduled to another healthy node in order to provide business continuity. Assisted-by: Claude
ab9350a to 290575b (Compare)
Summary by CodeRabbit
Tests
Chores
Documentation