fix: handle nil Capabilities/Tracing in ObservabilityInstaller controller #1124
IshwarKanse wants to merge 1 commit into
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: IshwarKanse
Hi @IshwarKanse. Thanks for your PR. I'm waiting for a rhobs member to verify that this patch is reasonable to test.
📝 Walkthrough

This PR fixes a nil-safety bug in the observability controller's status update method and provides comprehensive test coverage for the edge case where an ObservabilityInstaller has an empty spec. The controller now guards access to …

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5 passed
@coderabbitai review
✅ Action performed: Review finished.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/controllers/observability/observability_controller.go (1)
207-233: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear status when tracing is nil or disabled.
Line 209 prevents the panic, but when `Capabilities` exists and `Tracing` is `nil`/disabled, `Status.Tempo` and `Status.OpenTelemetry` are left untouched. This can retain a stale “enabled” status after config changes.

Suggested fix

```diff
 func (o observabilityInstallerController) updateStatus(ctx context.Context, instance *obsv1alpha1.ObservabilityInstaller, reconcileErr error) reconcile.Result {
 	if instance.Spec.Capabilities != nil {
 		capabilities := instance.Spec.Capabilities
 		if capabilities.Tracing != nil && capabilities.Tracing.Enabled {
 			otelcol := &otelv1beta1.OpenTelemetryCollector{}
@@
 			instance.Status.Tempo = fmt.Sprintf("%s/%s (%s)", instance.Namespace, tempoName(instance.Name), tempo.Status.TempoVersion)
 			instance.Status.OpenTelemetry = fmt.Sprintf("%s/%s (%s)", instance.Namespace, otelCollectorName(instance.Name), otelcol.Status.Version)
+		} else {
+			instance.Status.Tempo = ""
+			instance.Status.OpenTelemetry = ""
 		}
 	} else {
 		instance.Status.Tempo = ""
 		instance.Status.OpenTelemetry = ""
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controllers/observability/observability_controller.go` around lines 207 - 233, When instance.Spec.Capabilities is non-nil but Capabilities.Tracing is nil or Capabilities.Tracing.Enabled is false, clear the stale status fields by setting instance.Status.Tempo = "" and instance.Status.OpenTelemetry = ""; update the conditional in the reconciler (the block that inspects instance.Spec.Capabilities / capabilities.Tracing in observability_controller.go) so that after checking "if capabilities.Tracing != nil && capabilities.Tracing.Enabled { ... }" you add an else branch that explicitly clears those two status fields, leaving the existing Get/requeue logic unchanged inside the enabled branch.
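The suggested `else` branch can be exercised in isolation. This is a minimal sketch under simplifying assumptions: `Tracing`, `Capabilities`, `Spec`, `Status`, `Installer` and `updateStatusFields` are hypothetical stand-ins for the real `obsv1alpha1` types and the `updateStatus` method, the enabled branch writes placeholder strings instead of reading versions from the deployed Tempo and OpenTelemetry collector resources, and the nested `if` checks from the diff are collapsed into one equivalent condition.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the obsv1alpha1 API types.
type Tracing struct{ Enabled bool }

type Capabilities struct{ Tracing *Tracing }

type Spec struct{ Capabilities *Capabilities }

type Status struct {
	Tempo         string
	OpenTelemetry string
}

type Installer struct {
	Namespace, Name string
	Spec            Spec
	Status          Status
}

// updateStatusFields mirrors the shape of the suggested fix: every path that
// does not end in "tracing enabled" clears both status fields, so a value set
// by an earlier reconcile cannot survive a config change.
func updateStatusFields(inst *Installer) {
	caps := inst.Spec.Capabilities
	if caps != nil && caps.Tracing != nil && caps.Tracing.Enabled {
		// Placeholder values; the real controller formats names and versions
		// read from the deployed operands.
		inst.Status.Tempo = fmt.Sprintf("%s/%s-tempo", inst.Namespace, inst.Name)
		inst.Status.OpenTelemetry = fmt.Sprintf("%s/%s-otel", inst.Namespace, inst.Name)
		return
	}
	inst.Status.Tempo = ""
	inst.Status.OpenTelemetry = ""
}

func main() {
	inst := &Installer{Namespace: "ns", Name: "obs",
		Spec: Spec{Capabilities: &Capabilities{Tracing: &Tracing{Enabled: true}}}}
	updateStatusFields(inst)
	fmt.Printf("%q %q\n", inst.Status.Tempo, inst.Status.OpenTelemetry)

	// Disable tracing: the previously set status must be cleared.
	inst.Spec.Capabilities.Tracing.Enabled = false
	updateStatusFields(inst)
	fmt.Printf("%q %q\n", inst.Status.Tempo, inst.Status.OpenTelemetry)

	// spec: { capabilities: {} } leaves Tracing nil; this must not panic.
	inst.Spec.Capabilities = &Capabilities{}
	updateStatusFields(inst)
	fmt.Printf("%q %q\n", inst.Status.Tempo, inst.Status.OpenTelemetry)
}
```

Writing the clear in a single fall-through path, rather than in two separate `else` branches, makes it harder for a future edit to reintroduce a path that leaves stale status behind.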
🧹 Nitpick comments (1)
test/e2e/observability_installer_test.go (1)
254-267: ⚡ Quick win

Replace fixed sleep with bounded polling.
Line 256 adds a hard 30s delay to every run and is still timing-sensitive across clusters. Prefer polling the operator availability condition with timeout instead of sleeping.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/observability_installer_test.go` around lines 254 - 267, Replace the fixed time.Sleep(30 * time.Second) with bounded polling that repeatedly GETs the observability-operator deployment until its Status.AvailableReplicas > 0 or a timeout elapses; specifically, remove the sleep and implement a loop (e.g. wait.PollImmediate or a simple for loop with time.After) that calls f.K8sClient.Get(ctx, types.NamespacedName{Name: "observability-operator", Namespace: f.OperatorNamespace}, &operatorDeploy) and breaks when operatorDeploy.Status.AvailableReplicas > 0, asserting via require.NoError on Get errors and require.Greater or require.Fail if the timeout is reached — this keeps the existing checks around operatorDeploy, f.K8sClient.Get, types.NamespacedName, operatorDeploy.Status.AvailableReplicas, require.NoError and require.Greater.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/observability_installer_test.go`:
- Around line 235-240: The test currently creates a Namespace with a hard-coded
name ("obs-empty-spec-test") stored in variable ns and calls f.K8sClient.Create,
which risks AlreadyExists on parallel/re-run; change ns.ObjectMeta.Name to a
generated unique name (for example use the test framework's helper like
f.UniqueName()/f.NewNamespaceName() if available, or compose one from t.Name()
and a timestamp/nonce via fmt.Sprintf("%s-%d", t.Name(), time.Now().UnixNano()))
so each run gets a unique namespace, leaving the rest of the code
(f.K8sClient.Create and f.CleanUp) unchanged and still deleting that generated
ns afterward.
- Around line 296-300: The delete-polling function using
wait.PollUntilContextTimeout currently swallows all non-NotFound errors by
returning (false, nil); update the closure used by wait.PollUntilContextTimeout
(the anonymous func calling f.K8sClient.Get for
obsv1alpha1.ObservabilityInstaller using obsInstaller and ns) to return (true,
nil) when apierrors.IsNotFound(err) is true, but return (false, err) for any
other error so real API failures bubble up and stop the poll instead of being
masked as a timeout.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 06d5dbb4-709a-4fd3-83f8-588b4e94f26a
📒 Files selected for processing (3)
pkg/controllers/observability/observability_controller.go
pkg/controllers/observability/tempo_components_test.go
test/e2e/observability_installer_test.go
Force-pushed from a56198b to baf3d41
…mpty spec Assisted by Claude Code
Force-pushed from baf3d41 to 45e1f93
Summary
Resolves bug https://redhat.atlassian.net/browse/COO-1257
- Fix the nil pointer panic in `updateStatus` when `Capabilities` is non-nil but `Tracing` is nil (e.g. `spec: { capabilities: {} }`)
- Add unit tests for `tempoStack()` covering all nil-input combinations (nil Capabilities, nil Tracing, nil Storage, nil ObjectStorageSpec) and all storage backends (S3, S3STS, Azure, GCS), including TLS variants
- Add the e2e test `ObservabilityInstallerEmptySpec`, which creates an `ObservabilityInstaller` with `spec: {}` on a live cluster and verifies that the controller does not crash-loop, no operands are deployed, status fields remain empty, and the resource deletes cleanly

Root Cause
The `updateStatus` method in `observability_controller.go` accessed `capabilities.Tracing.Enabled` (a field dereference) without first checking `capabilities.Tracing != nil`. When an `ObservabilityInstaller` is created with `spec: { capabilities: {} }`, `Capabilities` is non-nil but `Tracing` is nil, so the dereference causes a nil pointer panic. The fix adds the missing nil guard, so the check now reads `if capabilities.Tracing != nil && capabilities.Tracing.Enabled`.

Note: the related panic in
`tempoStack()` (when `spec: {}` leaves `Capabilities` nil) was already fixed via safe getter methods (`GetCapabilities()`, `GetTracing()`, etc.) on the API types.

Test plan
- `go test ./pkg/controllers/observability/...`: all 17 tests pass, including the new `TestTempoStack` with nil/empty input cases
- `TestObservabilityInstallerController/ObservabilityInstallerEmptySpec` passes on a live OCP cluster: the controller stays available, no operands are deployed, and the resource deletes cleanly.
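The nil-input cases above depend on the safe getter methods (`GetCapabilities()`, `GetTracing()`, etc.) mentioned in the root-cause note. In Go, such getters are commonly written with a pointer receiver that checks for `nil`, so a chain like `spec.GetCapabilities().GetTracing()` never panics. The sketch below assumes that pattern; the types, the `IsEnabled` method, the `tracingEnabled` helper and the table cases are hypothetical and not taken from the `obsv1alpha1` package.

```go
package main

import "fmt"

// Hypothetical simplified API types; the real ones live in obsv1alpha1.
type Tracing struct{ Enabled bool }

type Capabilities struct{ Tracing *Tracing }

type ObservabilityInstallerSpec struct{ Capabilities *Capabilities }

// Nil-safe getters: a nil receiver yields nil instead of panicking, so calls
// can be chained through missing levels of the spec.
func (s *ObservabilityInstallerSpec) GetCapabilities() *Capabilities {
	if s == nil {
		return nil
	}
	return s.Capabilities
}

func (c *Capabilities) GetTracing() *Tracing {
	if c == nil {
		return nil
	}
	return c.Tracing
}

// IsEnabled treats a missing Tracing block as disabled.
func (t *Tracing) IsEnabled() bool {
	return t != nil && t.Enabled
}

// tracingEnabled is a hypothetical helper that never dereferences nil.
func tracingEnabled(spec *ObservabilityInstallerSpec) bool {
	return spec.GetCapabilities().GetTracing().IsEnabled()
}

func main() {
	// Table of nil-input combinations, in the spirit of TestTempoStack.
	cases := []struct {
		name string
		spec *ObservabilityInstallerSpec
		want bool
	}{
		{"nil spec", nil, false},
		{"spec: {}", &ObservabilityInstallerSpec{}, false},
		{"capabilities: {}", &ObservabilityInstallerSpec{Capabilities: &Capabilities{}}, false},
		{"tracing disabled", &ObservabilityInstallerSpec{Capabilities: &Capabilities{Tracing: &Tracing{}}}, false},
		{"tracing enabled", &ObservabilityInstallerSpec{Capabilities: &Capabilities{Tracing: &Tracing{Enabled: true}}}, true},
	}
	for _, tc := range cases {
		got := tracingEnabled(tc.spec)
		fmt.Printf("%-17s got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```

Calling a method on a nil pointer is legal in Go as long as the method does not dereference the receiver, which is what makes this chaining pattern safe.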