Skip to content

Verify cluster in API server before deleting namespace#4736

Merged
thardeck merged 1 commit into
mainfrom
implement_3830
Mar 17, 2026
Merged

Verify cluster in API server before deleting namespace#4736
thardeck merged 1 commit into
mainfrom
implement_3830

Conversation

@thardeck
Copy link
Copy Markdown
Collaborator

@thardeck thardeck commented Mar 3, 2026

The cleanup controller used only the informer cache to decide whether a fleet-managed namespace should be removed. During startup or under load, the cache may not yet contain a newly created Cluster, causing the namespace to be wrongly deleted and breaking bundle deployment.

Confirm a cache not-found result with a live API call before deleting the namespace.

Refers to #3830

@thardeck thardeck added this to the v2.15.0 milestone Mar 3, 2026
@thardeck thardeck self-assigned this Mar 3, 2026
@thardeck thardeck requested a review from a team as a code owner March 3, 2026 12:44
Copilot AI review requested due to automatic review settings March 3, 2026 12:44
@thardeck thardeck added this to Fleet Mar 3, 2026
@thardeck thardeck moved this to 👀 In review in Fleet Mar 3, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the cleanup controller’s decision to delete fleet-managed cluster namespaces by confirming cache “not found” results against the live API server, preventing erroneous namespace deletion during cache warm-up or under load (ref #3830).

Changes:

  • Pass a ClusterClient into the cleanup controller so it can query the live API when the cache reports a missing Cluster.
  • Update cleanupNamespace to only delete the namespace after both cache and live API calls return NotFound for the Cluster.
  • Add a focused unit test suite covering cache-hit, cache-miss/API-hit, and true-not-found scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
internal/cmd/controller/cleanup/controllers/controllers.go Wires the Cluster controller into cleanup registration so a live Cluster GET can be performed.
internal/cmd/controller/cleanup/controllers/cleanup/controller.go Implements the cache-first, API-confirmed Cluster existence check before namespace deletion.
internal/cmd/controller/cleanup/controllers/cleanup/controller_test.go Adds tests validating the new cache-miss verification behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/cmd/controller/cleanup/controllers/cleanup/controller_test.go Outdated
The cleanup controller used only the informer cache to decide whether
a fleet-managed namespace should be removed. During startup or under
load, the cache may not yet contain a newly created Cluster, causing
the namespace to be wrongly deleted and breaking bundle deployment.

Confirm a cache not-found result with a live API call before deleting
the namespace.
@thardeck thardeck modified the milestones: v2.15.0, v2.14.1 Mar 6, 2026
@thardeck thardeck merged commit 6ccd78b into main Mar 17, 2026
22 checks passed
@thardeck thardeck deleted the implement_3830 branch March 17, 2026 17:39
@github-project-automation github-project-automation Bot moved this from 👀 In review to ✅ Done in Fleet Mar 17, 2026
@thardeck thardeck modified the milestones: v2.14.1, v2.15.0 Mar 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants