
HYPERFLEET-854 - feat: implement hard deletion for clusters and nodepools #119

Merged
openshift-merge-bot[bot] merged 10 commits into openshift-hyperfleet:main from mliptak0:hyperfleet-api-854
May 4, 2026
Conversation

Contributor

@mliptak0 mliptak0 commented Apr 30, 2026

implement hard deletion for clusters and nodepools

Summary

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

Summary by CodeRabbit

  • New Features

    • Hard deletion of clusters and node pools after soft-delete once required adapters report Finalized and no child resources remain
    • New intermediate status WaitingForChildResources prevents final reconciliation while child resources still exist
  • Changed

    • Condition aggregation now accounts for child-resource presence during deletion and preserves transition timestamps reliably
    • Adapter-status handling refined to improve consistency around updates and deletions
  • Documentation

    • Changelog entries added describing these behaviors
  • Tests

    • New integration and unit tests covering hard-delete and reconciled-state scenarios

@openshift-ci openshift-ci Bot requested review from aredenba-rh and vkareh April 30, 2026 13:45

coderabbitai Bot commented Apr 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR implements hard-delete semantics for Clusters and NodePools: after a soft-deleted resource's required adapters report Finalized=True and no child resources remain, the resource and its adapter status rows are deleted. It adds AdapterStatus.IsFinalized(), refactors AdapterStatusDao.Upsert to accept a pre-fetched existing record, adds DeleteByResource, adds the Cluster/NodePool DAO methods GetForUpdate and SaveStatusConditions, moves LastTransitionTime preservation into the services, introduces a WaitingForChildResources reconciled state during deletion, and adds unit and integration tests for these behaviors.
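As a rough illustration of the new helper, a sketch of what IsFinalized() could look like follows; the stand-in types, field names, and condition name are assumptions, not the exact code in pkg/api/adapter_status_types.go.

package api

// Minimal stand-in types for the sketch; the real definitions in pkg/api may differ.
type ConditionStatus string

const ConditionTrue ConditionStatus = "True"

type Condition struct {
	Type   string
	Status ConditionStatus
}

type AdapterStatus struct {
	Conditions []Condition
}

// IsFinalized reports whether this adapter status carries a Finalized
// condition with status True, i.e. the adapter has finished its cleanup.
func (s *AdapterStatus) IsFinalized() bool {
	for _, c := range s.Conditions {
		if c.Type == "Finalized" && c.Status == ConditionTrue {
			return true
		}
	}
	return false
}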

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ClusterService
    participant ClusterDAO
    participant AdapterStatusDAO
    participant NodePoolDAO
    participant DB

    rect rgba(100, 150, 200, 0.5)
        Note over Client,DB: Cluster adapter status processing + potential hard-delete
    end

    Client->>ClusterService: ProcessAdapterStatus(clusterID, adapterStatus)
    activate ClusterService

    ClusterService->>ClusterDAO: GetForUpdate(clusterID)
    ClusterDAO->>DB: SELECT ... FOR UPDATE
    DB-->>ClusterDAO: cluster row
    ClusterDAO-->>ClusterService: *Cluster

    ClusterService->>AdapterStatusDAO: FindByResource(clusterID)
    AdapterStatusDAO->>DB: SELECT adapter_statuses WHERE resource_id=?
    DB-->>AdapterStatusDAO: adapter statuses
    AdapterStatusDAO-->>ClusterService: []AdapterStatus

    ClusterService->>ClusterService: validateAndClassify(adapterStatus)
    ClusterService->>AdapterStatusDAO: Upsert(adapterStatus, existing)
    AdapterStatusDAO->>DB: INSERT/UPDATE adapter_statuses
    DB-->>AdapterStatusDAO: upsert result
    AdapterStatusDAO-->>ClusterService: *AdapterStatus

    alt all required adapters finalized AND NOT NodePoolDAO.ExistsByOwner(clusterID)
        ClusterService->>AdapterStatusDAO: DeleteByResource(cluster, clusterID)
        AdapterStatusDAO->>DB: DELETE FROM adapter_statuses WHERE resource_id=?
        DB-->>AdapterStatusDAO: success

        ClusterService->>ClusterDAO: Delete(clusterID)
        ClusterDAO->>DB: DELETE FROM clusters WHERE id=?
        DB-->>ClusterDAO: success

        ClusterService-->>Client: cluster hard-deleted
    else
        ClusterService->>ClusterService: recomputeAndSaveClusterStatus(cluster, adapterStatuses)
        ClusterService->>ClusterDAO: SaveStatusConditions(clusterID, conditions)
        ClusterDAO->>DB: UPDATE clusters SET status_conditions=?
        DB-->>ClusterDAO: success

        ClusterService-->>Client: cluster still exists (soft-deleted or waiting)
    end

    deactivate ClusterService
sequenceDiagram
    participant Aggregator
    participant Caller

    rect rgba(150, 100, 150, 0.5)
        Note over Caller,Aggregator: Reconciled computation during deletion
    end

    Caller->>Aggregator: computeReconciled(adapterStatuses, deletedTime!=nil, hasChildResources)
    activate Aggregator

    Aggregator->>Aggregator: allFinalized := allAdaptersFinalized(...)
    alt deletedTime != nil AND allFinalized AND hasChildResources == true
        Aggregator-->>Caller: Reconciled=False (Reason=WaitingForChildResources)
    else deletedTime != nil AND allFinalized AND hasChildResources == false
        Aggregator-->>Caller: Reconciled=True (Reason=AllAdaptersReconciled)
    else
        Aggregator-->>Caller: Reconciled=False (Reason=MissingRequiredAdapters)
    end

    deactivate Aggregator
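Expressed as code, the decision the second diagram encodes might look roughly like the sketch below; the function name and plain-string return values are hypothetical and only mirror the diagram, not pkg/services/aggregation.go.

package services

// computeReconciledSketch mirrors the branches in the sequence diagram above.
// The real implementation builds full status conditions rather than strings.
func computeReconciledSketch(deleting, allFinalized, hasChildResources bool) (status, reason string) {
	switch {
	case deleting && allFinalized && hasChildResources:
		// Adapters are done, but child resources (e.g. node pools under a
		// cluster) still exist, so hold off on final reconciliation.
		return "False", "WaitingForChildResources"
	case deleting && allFinalized && !hasChildResources:
		return "True", "AllAdaptersReconciled"
	default:
		return "False", "MissingRequiredAdapters"
	}
}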

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 61.54%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the main change, implementing hard deletion for clusters and nodepools, which aligns with the substantial refactoring across the DAO, service, and test layers described in the changeset.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around lines 12-13: The changelog contains invalid PR links that use "/pull/HYPERFLEET-854"; these must use the numeric PR path "/pull/119". Update both occurrences in CHANGELOG.md (the two lines referencing HYPERFLEET-854, plus the additional occurrence noted at line 31) so each markdown link points to https://github.com/openshift-hyperfleet/hyperfleet-api/pull/119 instead of the non-numeric form.

In `@pkg/dao/adapter_status.go`:
- Around lines 98-100: When updateResult.RowsAffected == 0 in the adapter-status update flow, explicitly distinguish a stale report (concurrent version mismatch) from a row that was deleted between the read and the update. After detecting RowsAffected == 0, re-query the DB for the row using the same key/ID from existing (e.g., existing.ID): if the re-read returns no row, return a not-found error (or trigger recreate logic) instead of returning existing; if the re-read returns a row whose version/fields differ from existing, treat it as a stale update and return existing (or a conflict error) so callers know the update did not persist. Place this logic where updateResult.RowsAffected is checked (the function handling the update in adapter_status.go).

In `@pkg/dao/cluster.go`:
- Around line 97-105: In SaveStatusConditions
(sqlClusterDao.SaveStatusConditions) add a check after the Update to treat zero
affected rows as not-found: if result.RowsAffected == 0, call
db.MarkForRollback(ctx, gorm.ErrRecordNotFound) (or sql.ErrNoRows) and return an
appropriate not-found error instead of nil; keep the existing error handling for
result.Error unchanged so true DB errors still roll back.

In `@pkg/dao/node_pool.go`:
- Around lines 60-68: In SaveStatusConditions, don't treat a zero-row update as success. After calling g2.Model(&api.NodePool{}).Where(...).Update(...), inspect result.RowsAffected; if it is 0, return a not-found error (calling db.MarkForRollback(ctx, err) as appropriate) instead of returning nil, and keep the existing behavior when result.Error is non-nil. Checking result.RowsAffected here ensures that concurrent hard-delete/update races surface as not-found.

In `@pkg/services/adapter_status.go`:
- Around lines 125-130: The lookup via s.adapterStatusDao.FindByResourceAndAdapter currently treats any error as "not found" by setting existing = nil, which can hide real DAO failures before Upsert is called. Change the logic so existing is normalized to nil only when the DAO returns its explicit not-found sentinel (e.g., ErrNotFound or the DAO's not-found behavior), and return the lookup error immediately for any other error. Locate the call to s.adapterStatusDao.FindByResourceAndAdapter(ctx, adapterStatus.ResourceType, adapterStatus.ResourceID, adapterStatus.Adapter) and replace the unconditional existing = nil on error with a check of the error value/type, so Upsert only ever sees valid existing state.

In `@pkg/services/aggregation.go`:
- Around lines 740-754: The helper allAdaptersFinalized currently only checks adapterStatus.IsFinalized(), so it accepts finalization reported at any generation. Change it to require Finalized==true at the current resource generation: add a generation parameter (e.g., currentGeneration int64) and count only adapterStatus entries whose Generation equals that parameter and whose IsFinalized() is true. Update the callers (the cluster/nodepool hard-delete call sites) to pass the current resource generation, mirroring the generation filtering used by computeReconciled, so older finalization rows are ignored.

In `@pkg/services/node_pool.go`:
- Around lines 272-279: After a successful hard delete in the node-pool deletion path (when nodePool.DeletedTime != nil and tryHardDeleteNodePool returns true), re-check/requeue the parent cluster using nodePool.OwnerID so the cluster's hard delete and status are recomputed now that the final child is gone. Apply the same change to the other occurrence (around lines 381-403): trigger the parent-cluster reconciliation there as well instead of returning immediately, using the service's existing cluster-reconciliation/enqueue helper (the method responsible for recomputing cluster status or enqueuing by ID).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: cdeffff4-37b5-409f-b5f2-8265a9d7ff7a

📥 Commits

Reviewing files that changed from the base of the PR and between 3622815 and a7969d3.

📒 Files selected for processing (17)
  • CHANGELOG.md
  • pkg/api/adapter_status_types.go
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/mocks/node_pool.go
  • pkg/dao/node_pool.go
  • pkg/services/adapter_status.go
  • pkg/services/aggregation.go
  • pkg/services/aggregation_test.go
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_helpers.go
  • test/integration/clusters_test.go
  • test/integration/node_pools_test.go

Comment thread CHANGELOG.md Outdated
Comment thread pkg/dao/adapter_status.go
Comment thread pkg/dao/cluster.go
Comment thread pkg/dao/node_pool.go
Comment thread pkg/services/adapter_status.go
Comment thread pkg/services/aggregation.go Outdated
Comment thread pkg/services/node_pool.go
Comment thread pkg/dao/node_pool.go
Comment thread pkg/services/cluster.go Outdated
Comment thread pkg/services/node_pool.go Outdated
@mliptak0 mliptak0 force-pushed the hyperfleet-api-854 branch from 2a286ce to 66a54b3 on May 4, 2026 05:58
Comment thread pkg/services/aggregation.go Outdated
Contributor Author

mliptak0 commented May 4, 2026

/retest

@mliptak0 mliptak0 force-pushed the hyperfleet-api-854 branch from f94e8bf to ca8bade on May 4, 2026 07:29
…aw SQL

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread pkg/services/aggregation.go Outdated
Comment thread pkg/services/aggregation.go Outdated
Comment on lines 336 to 341

  allFinalizedButChildrenExist := deletedTime != nil && allAtCurrent && hasChildResources

  status := api.ConditionFalse
- if len(required) > 0 && allAtCurrent {
+ if len(required) > 0 && allAtCurrent && !allFinalizedButChildrenExist {
  	status = api.ConditionTrue
  }
Contributor


I find this a bit hard to understand because of the negative logic.
To recap, the condition for Reconciled being True is:

  • All reports at current generation
  • When deleting, there should be no children

Would something like this work?

if len(required) > 0 && allAtCurrent &&
	(deletedTime == nil || (deletedTime != nil && !hasChildResources)) {
	status = api.ConditionTrue
}

@rh-amarin
Contributor

/lgtm


openshift-ci Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rh-amarin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label May 4, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 9798ec3 into openshift-hyperfleet:main May 4, 2026
9 checks passed