fix: handle nil Capabilities/Tracing in ObservabilityInstaller controller#1124

Open
IshwarKanse wants to merge 1 commit into
rhobs:mainfrom
IshwarKanse:fix/nil-pointer-empty-spec

Conversation

@IshwarKanse

@IshwarKanse IshwarKanse commented Jun 8, 2026

Contributor

Summary

Resolves bug https://redhat.atlassian.net/browse/COO-1257

  • Fix nil pointer dereference panic in updateStatus when Capabilities is non-nil but Tracing is nil (e.g. spec: { capabilities: {} })
  • Add 11 unit tests for tempoStack() covering all nil-input combinations (nil Capabilities, nil Tracing, nil Storage, nil ObjectStorageSpec) and all storage backends (S3, S3STS, Azure, GCS) including TLS variants
  • Add e2e test ObservabilityInstallerEmptySpec that creates an ObservabilityInstaller with spec: {} on a live cluster, verifies the controller does not crash-loop, no operands are deployed, status fields remain empty, and the resource deletes cleanly

Root Cause

In observability_controller.go, updateStatus read capabilities.Tracing.Enabled, which dereferences capabilities.Tracing, without first checking that capabilities.Tracing is non-nil. When an ObservabilityInstaller is created with spec: { capabilities: {} }, Capabilities is non-nil but Tracing is nil, so the access panics with a nil pointer dereference. The fix adds the missing nil guard:

// Before
if capabilities.Tracing.Enabled {

// After
if capabilities.Tracing != nil && capabilities.Tracing.Enabled {

Note: the related panic in tempoStack() (when spec: {} leaves Capabilities nil) was already fixed via safe getter methods (GetCapabilities(), GetTracing(), etc.) on the API types.
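
The guard and the getter approach described above can be sketched together. This is a minimal, self-contained illustration, not the repository's code: the `Spec`, `Capabilities`, and `Tracing` types are simplified stand-ins, the bodies of `GetCapabilities()` and `GetTracing()` are assumed (the PR names these getters but does not show them), and `IsEnabled()` and `tracingEnabled` are hypothetical helpers added for the example.

```go
package main

import "fmt"

// Simplified stand-ins for the API types. Field names follow the PR text;
// everything else is assumed for illustration.
type Tracing struct {
	Enabled bool
}

type Capabilities struct {
	Tracing *Tracing
}

type Spec struct {
	Capabilities *Capabilities
}

// GetCapabilities is safe to call on a nil *Spec.
func (s *Spec) GetCapabilities() *Capabilities {
	if s == nil {
		return nil
	}
	return s.Capabilities
}

// GetTracing is safe to call on a nil *Capabilities.
func (c *Capabilities) GetTracing() *Tracing {
	if c == nil {
		return nil
	}
	return c.Tracing
}

// IsEnabled (hypothetical) treats a nil *Tracing as disabled.
func (t *Tracing) IsEnabled() bool {
	return t != nil && t.Enabled
}

// tracingEnabled (hypothetical) chains the getters, so no pointer is
// dereferenced before it has been checked.
func tracingEnabled(s *Spec) bool {
	return s.GetCapabilities().GetTracing().IsEnabled()
}

func main() {
	cases := []struct {
		name string
		spec *Spec
	}{
		{"spec: {}", &Spec{}},
		{"spec: {capabilities: {}}", &Spec{Capabilities: &Capabilities{}}},
		{"tracing disabled", &Spec{Capabilities: &Capabilities{Tracing: &Tracing{}}}},
		{"tracing enabled", &Spec{Capabilities: &Capabilities{Tracing: &Tracing{Enabled: true}}}},
	}
	for _, c := range cases {
		fmt.Printf("%-26s enabled=%v\n", c.name, tracingEnabled(c.spec))
	}
}
```

Methods with pointer receivers can be called on nil pointers in Go, which is what lets the chain in `tracingEnabled` run without an explicit check at each step.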

Test plan

  • go test ./pkg/controllers/observability/... — all 17 tests pass including new TestTempoStack with nil/empty input cases
  • E2e test TestObservabilityInstallerController/ObservabilityInstallerEmptySpec passes on a live OCP cluster — controller stays available, no operands deployed, clean deletion.

@openshift-ci openshift-ci Bot requested review from danielmellado and jan--f June 8, 2026 06:40
@openshift-ci

openshift-ci Bot commented Jun 8, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: IshwarKanse
Once this PR has been reviewed and has the lgtm label, please assign peteryurkovich for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented Jun 8, 2026


Hi @IshwarKanse. Thanks for your PR.

I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026


Review Change Stack

Warning

Review limit reached

@IshwarKanse, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 30 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8d724a20-76fc-4efb-a9f1-212385aabe99

📥 Commits

Reviewing files that changed from the base of the PR and between b865d5f and 45e1f93.

📒 Files selected for processing (3)
  • pkg/controllers/observability/observability_controller.go
  • pkg/controllers/observability/tempo_components_test.go
  • test/e2e/observability_installer_test.go
📝 Walkthrough

Walkthrough

This PR fixes a nil-safety bug in the observability controller's status update method and provides comprehensive test coverage for the edge case where an ObservabilityInstaller has an empty spec. The controller now guards access to capabilities.Tracing.Enabled by checking if capabilities.Tracing is non-nil first. New unit tests validate the tempoStack function across nil capabilities, multiple storage backends (S3, S3STS, Azure, GCS), and TLS configurations. An e2e test verifies that an empty spec does not crash the operator or deploy unwanted operands, and that status fields remain empty as expected.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main bug fix: adding nil guards for Capabilities/Tracing to prevent panics when these fields are empty. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly relates to the changeset, explaining the bug fix, root cause, tests added, and test results. |


@IshwarKanse

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/controllers/observability/observability_controller.go (1)

207-233: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear status when tracing is nil or disabled.

Line 209 prevents the panic, but when Capabilities exists and Tracing is nil/disabled, Status.Tempo and Status.OpenTelemetry are left untouched. This can retain stale “enabled” status after config changes.

Suggested fix
 func (o observabilityInstallerController) updateStatus(ctx context.Context, instance *obsv1alpha1.ObservabilityInstaller, reconcileErr error) reconcile.Result {
 	if instance.Spec.Capabilities != nil {
 		capabilities := instance.Spec.Capabilities
 		if capabilities.Tracing != nil && capabilities.Tracing.Enabled {
 			otelcol := &otelv1beta1.OpenTelemetryCollector{}
@@
 			instance.Status.Tempo = fmt.Sprintf("%s/%s (%s)", instance.Namespace, tempoName(instance.Name), tempo.Status.TempoVersion)
 			instance.Status.OpenTelemetry = fmt.Sprintf("%s/%s (%s)", instance.Namespace, otelCollectorName(instance.Name), otelcol.Status.Version)
+		} else {
+			instance.Status.Tempo = ""
+			instance.Status.OpenTelemetry = ""
 		}
 	} else {
 		instance.Status.Tempo = ""
 		instance.Status.OpenTelemetry = ""
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controllers/observability/observability_controller.go` around lines 207 -
233, When instance.Spec.Capabilities is non-nil but Capabilities.Tracing is nil
or Capabilities.Tracing.Enabled is false, clear the stale status fields by
setting instance.Status.Tempo = "" and instance.Status.OpenTelemetry = "";
update the conditional in the reconciler (the block that inspects
instance.Spec.Capabilities / capabilities.Tracing in
observability_controller.go) so that after checking "if capabilities.Tracing !=
nil && capabilities.Tracing.Enabled { ... }" you add an else branch that
explicitly clears those two status fields, leaving the existing Get/requeue
logic unchanged inside the enabled branch.
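
The behaviour requested above can be expressed as a small pure function, which makes the stale-status case easy to exercise. This is a sketch only: the types are simplified stand-ins for the API types, `applyTracingStatus` and its `tempo` and `otel` parameters are hypothetical, and the status strings are placeholders. The real `updateStatus` also fetches the Tempo and OpenTelemetry resources, which is omitted here.

```go
package main

import "fmt"

// Simplified stand-ins for the API types (assumed for illustration).
type Tracing struct{ Enabled bool }
type Capabilities struct{ Tracing *Tracing }
type Spec struct{ Capabilities *Capabilities }

type Status struct {
	Tempo         string
	OpenTelemetry string
}

// applyTracingStatus (hypothetical) sets both status fields when tracing is
// enabled and clears them in every other case: nil Capabilities, nil Tracing,
// or Enabled == false.
func applyTracingStatus(spec Spec, status *Status, tempo, otel string) {
	c := spec.Capabilities
	if c != nil && c.Tracing != nil && c.Tracing.Enabled {
		status.Tempo = tempo
		status.OpenTelemetry = otel
		return
	}
	status.Tempo = ""
	status.OpenTelemetry = ""
}

func main() {
	status := Status{}

	// First reconcile: tracing is enabled, so the status is populated.
	enabled := Spec{Capabilities: &Capabilities{Tracing: &Tracing{Enabled: true}}}
	applyTracingStatus(enabled, &status, "ns/tempo (placeholder)", "ns/otel (placeholder)")
	fmt.Printf("%+v\n", status)

	// The tracing block is then removed: spec: { capabilities: {} }.
	// Without the clearing branch the old values would survive this call.
	applyTracingStatus(Spec{Capabilities: &Capabilities{}}, &status, "ns/tempo (placeholder)", "ns/otel (placeholder)")
	fmt.Printf("%+v\n", status)
}
```

Falling through to a single clearing path also removes the need for two separate else branches that must be kept in sync.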
🧹 Nitpick comments (1)
test/e2e/observability_installer_test.go (1)

254-267: ⚡ Quick win

Replace fixed sleep with bounded polling.

Line 256 adds a hard 30s delay to every run and is still timing-sensitive across clusters. Prefer polling the operator availability condition with timeout instead of sleeping.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/observability_installer_test.go` around lines 254 - 267, Replace the
fixed time.Sleep(30 * time.Second) with bounded polling that repeatedly GETs the
observability-operator deployment until its Status.AvailableReplicas > 0 or a
timeout elapses; specifically, remove the sleep and implement a loop (e.g.
wait.PollUntilContextTimeout, which this file already uses for the deletion
check and which replaces the deprecated wait.PollImmediate, or a simple for loop
with time.After) that calls f.K8sClient.Get(ctx, types.NamespacedName{Name:
"observability-operator", Namespace: f.OperatorNamespace}, &operatorDeploy) and
stops when operatorDeploy.Status.AvailableReplicas > 0, asserting via
require.NoError on Get errors and require.Greater or require.Fail if the timeout
is reached; this keeps the existing checks around operatorDeploy,
f.K8sClient.Get, types.NamespacedName, operatorDeploy.Status.AvailableReplicas,
require.NoError and require.Greater.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/observability_installer_test.go`:
- Around line 235-240: The test currently creates a Namespace with a hard-coded
name ("obs-empty-spec-test") stored in variable ns and calls f.K8sClient.Create,
which risks AlreadyExists on parallel runs or re-runs; change ns.ObjectMeta.Name
to a generated unique name (for example use a test framework helper like
f.UniqueName()/f.NewNamespaceName() if one exists, or append a timestamp/nonce
to a fixed prefix via fmt.Sprintf("obs-empty-spec-test-%d",
time.Now().UnixNano()); do not use t.Name() unmodified, because subtest names
contain "/" and uppercase letters, which are not valid in a namespace name) so
each run gets a unique namespace, leaving the rest of the code
(f.K8sClient.Create and f.CleanUp) unchanged and still deleting that generated
ns afterward.
- Around line 296-300: The delete-polling function using
wait.PollUntilContextTimeout currently swallows all non-NotFound errors by
returning (false, nil); update the closure used by wait.PollUntilContextTimeout
(the anonymous func calling f.K8sClient.Get for
obsv1alpha1.ObservabilityInstaller using obsInstaller and ns) to return (true,
nil) when apierrors.IsNotFound(err) is true, return (false, nil) while Get still
succeeds (the object has not been deleted yet), and return (false, err) for any
other error so real API failures bubble up and stop the poll instead of being
masked as a timeout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 06d5dbb4-709a-4fd3-83f8-588b4e94f26a

📥 Commits

Reviewing files that changed from the base of the PR and between eb33497 and b865d5f.

📒 Files selected for processing (3)
  • pkg/controllers/observability/observability_controller.go
  • pkg/controllers/observability/tempo_components_test.go
  • test/e2e/observability_installer_test.go

Comment thread test/e2e/observability_installer_test.go
Comment thread test/e2e/observability_installer_test.go
@IshwarKanse IshwarKanse force-pushed the fix/nil-pointer-empty-spec branch from a56198b to baf3d41 Compare June 8, 2026 06:48
@IshwarKanse IshwarKanse force-pushed the fix/nil-pointer-empty-spec branch from baf3d41 to 45e1f93 Compare June 8, 2026 06:51