Surface CollectorDaemonSetExists status condition on OperatorConfig #1854

Open
AnkanMisra wants to merge 6 commits into GoogleCloudPlatform:main from AnkanMisra:main

@AnkanMisra

Closes #1830

Summary

Adds a CollectorDaemonSetExists status condition to OperatorConfig to surface when the collector DaemonSet is missing.

Problem

When the collector DaemonSet (gmp-system/collector) is deleted, the operator logs a warning but returns success without surfacing any status condition. This leaves metrics collection silently broken with no way for users to detect the outage via the API.

Solution

  • Add CollectorDaemonSetExists condition to OperatorConfig.Status
  • Set True when the DaemonSet exists, False with reason DaemonSetMissing when not found
  • Initialize to Unknown to handle unexpected API errors correctly

This change only checks that the DaemonSet exists. It does not check pod readiness and does not automatically recreate a deleted DaemonSet.

@google-cla

google-cla bot commented Jan 25, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Re-added GenerationChangedPredicate to the DaemonSet watcher to avoid noise from status updates while still catching deletions. Added GenerationChangedPredicate to the OperatorConfig watcher to prevent feedback loops from status updates.

Address CodeRabbit feedback: initialize the condition status to Unknown
instead of True so that, on unexpected errors (network issues, RBAC),
the condition reflects uncertainty rather than falsely indicating that
the DaemonSet exists.
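
The effect of a generation-based filter on the watchers can be modeled in a few lines. This is a simplified, standard-library-only sketch of the idea, not controller-runtime code and not the code from this PR: the `event` type and `shouldReconcile` function are hypothetical. It assumes that `metadata.generation` changes on spec changes but not on status-only updates, and that only update events are filtered.

```go
package main

import "fmt"

// eventKind distinguishes the watch events relevant to this sketch.
type eventKind int

const (
	created eventKind = iota
	updated
	deleted
)

// event is a hypothetical, simplified watch event that carries the object's
// metadata.generation before and after the change.
type event struct {
	Kind          eventKind
	OldGeneration int64
	NewGeneration int64
}

// shouldReconcile models a generation-changed filter: create and delete
// events always pass, while update events pass only when the generation
// changed, so status-only updates are dropped.
func shouldReconcile(e event) bool {
	if e.Kind != updated {
		return true
	}
	return e.OldGeneration != e.NewGeneration
}

func main() {
	events := map[string]event{
		"status-only update": {Kind: updated, OldGeneration: 3, NewGeneration: 3},
		"spec update":        {Kind: updated, OldGeneration: 3, NewGeneration: 4},
		"deletion":           {Kind: deleted, OldGeneration: 4, NewGeneration: 4},
	}
	for name, e := range events {
		fmt.Printf("%s: reconcile=%v\n", name, shouldReconcile(e))
	}
}
```

Under this model, the operator's own status writes to OperatorConfig do not trigger another reconcile, while deleting the collector DaemonSet still does.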
@bernot-dev
Collaborator

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a status subresource for the OperatorConfig custom resource, allowing the operator to report its operational state, including conditions like the existence of the collector DaemonSet. The changes involve updating CRD definitions, API documentation, Go types (OperatorConfigStatus, MonitoringConditionType), and the reconciler logic to manage and update the OperatorConfig's status. The review comments highlight an inaccuracy in the CRD and manifest definitions, where the description for the conditions array incorrectly refers to PodMonitoring instead of OperatorConfig.

Comment on lines +519 to +520
description: Represents the latest available observations of a podmonitor's
current state.

medium

The description for the conditions array appears to be copied from PodMonitoring and is not entirely accurate for OperatorConfig. It currently states "Represents the latest available observations of a podmonitor's current state." This should be updated to reflect that these conditions apply to the OperatorConfig itself, or be more generalized.

                description: Represents the latest available observations of an OperatorConfig's
                  current state.

Comment on lines +2325 to +2326
description: Represents the latest available observations of a podmonitor's current state.
items:

medium

Similar to the CRD definition, the description for the conditions array here is copied from PodMonitoring. It should be updated to accurately describe the conditions for OperatorConfig.

                description: Represents the latest available observations of an OperatorConfig's current state.

@bernot-dev
Collaborator

@AnkanMisra Thank you for this PR. However, we cannot review and approve PRs without the CLA signed.

Gemini Code Assist also flagged a couple of changes that are needed.

@AnkanMisra
Author

@bernot-dev
Thanks for the review.
I’ll sign the CLA right away.
I’ll also address the changes flagged by Gemini Code Assist and push updates soon.

Successfully merging this pull request may close these issues.

Operator does not self-heal collector DaemonSet deletion, causing silent collection outage

2 participants