
HYPERFLEET-854 - feat: implement hard deletion for clusters and nodepools #119

Merged
openshift-merge-bot[bot] merged 10 commits into openshift-hyperfleet:main from mliptak0:hyperfleet-api-854
May 4, 2026
Conversation

Contributor

@mliptak0 mliptak0 commented Apr 30, 2026

implement hard deletion for clusters and nodepools

Summary

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

Summary by CodeRabbit

  • New Features

    • Hard deletion of clusters and node pools after soft-delete once required adapters report Finalized and no child resources remain
    • New intermediate status WaitingForChildResources prevents final reconciliation while child resources still exist
  • Changed

    • Condition aggregation now accounts for child-resource presence during deletion and preserves transition timestamps reliably
    • Adapter-status handling refined to improve consistency around updates and deletions
  • Documentation

    • Changelog entries added describing these behaviors
  • Tests

    • New integration and unit tests covering hard-delete and reconciled-state scenarios

@openshift-ci openshift-ci Bot requested review from aredenba-rh and vkareh April 30, 2026 13:45

coderabbitai Bot commented Apr 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR implements hard-delete semantics for Clusters and NodePools: after a soft-deleted resource's required adapters report Finalized=True and no child resources remain, the resource and its adapter status rows are deleted. It adds AdapterStatus.IsFinalized(), refactors AdapterStatusDao.Upsert to accept a pre-fetched existing record, adds DeleteByResource, adds the Cluster/NodePool DAO methods GetForUpdate and SaveStatusConditions, moves LastTransitionTime preservation into the services, introduces a WaitingForChildResources reconciled state during deletion, and adds unit and integration tests for these behaviors.
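As a rough illustration of the new helper, a sketch of what IsFinalized() could look like follows; the stand-in types, field names, and condition name are assumptions, not the exact code in pkg/api/adapter_status_types.go.

package api

// Minimal stand-in types for the sketch; the real definitions in pkg/api may differ.
type ConditionStatus string

const ConditionTrue ConditionStatus = "True"

type Condition struct {
	Type   string
	Status ConditionStatus
}

type AdapterStatus struct {
	Conditions []Condition
}

// IsFinalized reports whether this adapter status carries a Finalized
// condition with status True, i.e. the adapter has finished its cleanup.
func (s *AdapterStatus) IsFinalized() bool {
	for _, c := range s.Conditions {
		if c.Type == "Finalized" && c.Status == ConditionTrue {
			return true
		}
	}
	return false
}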

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ClusterService
    participant ClusterDAO
    participant AdapterStatusDAO
    participant NodePoolDAO
    participant DB

    rect rgba(100, 150, 200, 0.5)
        Note over Client,DB: Cluster adapter status processing + potential hard-delete
    end

    Client->>ClusterService: ProcessAdapterStatus(clusterID, adapterStatus)
    activate ClusterService

    ClusterService->>ClusterDAO: GetForUpdate(clusterID)
    ClusterDAO->>DB: SELECT ... FOR UPDATE
    DB-->>ClusterDAO: cluster row
    ClusterDAO-->>ClusterService: *Cluster

    ClusterService->>AdapterStatusDAO: FindByResource(clusterID)
    AdapterStatusDAO->>DB: SELECT adapter_statuses WHERE resource_id=?
    DB-->>AdapterStatusDAO: adapter statuses
    AdapterStatusDAO-->>ClusterService: []AdapterStatus

    ClusterService->>ClusterService: validateAndClassify(adapterStatus)
    ClusterService->>AdapterStatusDAO: Upsert(adapterStatus, existing)
    AdapterStatusDAO->>DB: INSERT/UPDATE adapter_statuses
    DB-->>AdapterStatusDAO: upsert result
    AdapterStatusDAO-->>ClusterService: *AdapterStatus

    alt all required adapters finalized AND NOT NodePoolDAO.ExistsByOwner(clusterID)
        ClusterService->>AdapterStatusDAO: DeleteByResource(cluster, clusterID)
        AdapterStatusDAO->>DB: DELETE FROM adapter_statuses WHERE resource_id=?
        DB-->>AdapterStatusDAO: success

        ClusterService->>ClusterDAO: Delete(clusterID)
        ClusterDAO->>DB: DELETE FROM clusters WHERE id=?
        DB-->>ClusterDAO: success

        ClusterService-->>Client: cluster hard-deleted
    else
        ClusterService->>ClusterService: recomputeAndSaveClusterStatus(cluster, adapterStatuses)
        ClusterService->>ClusterDAO: SaveStatusConditions(clusterID, conditions)
        ClusterDAO->>DB: UPDATE clusters SET status_conditions=?
        DB-->>ClusterDAO: success

        ClusterService-->>Client: cluster still exists (soft-deleted or waiting)
    end

    deactivate ClusterService
sequenceDiagram
    participant Aggregator
    participant Caller

    rect rgba(150, 100, 150, 0.5)
        Note over Caller,Aggregator: Reconciled computation during deletion
    end

    Caller->>Aggregator: computeReconciled(adapterStatuses, deletedTime!=nil, hasChildResources)
    activate Aggregator

    Aggregator->>Aggregator: allFinalized := allAdaptersFinalized(...)
    alt deletedTime != nil AND allFinalized AND hasChildResources == true
        Aggregator-->>Caller: Reconciled=False (Reason=WaitingForChildResources)
    else deletedTime != nil AND allFinalized AND hasChildResources == false
        Aggregator-->>Caller: Reconciled=True (Reason=AllAdaptersReconciled)
    else
        Aggregator-->>Caller: Reconciled=False (Reason=MissingRequiredAdapters)
    end

    deactivate Aggregator
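Expressed as code, the decision the second diagram encodes might look roughly like the sketch below; the function name and plain-string return values are hypothetical and only mirror the diagram, not pkg/services/aggregation.go.

package services

// computeReconciledSketch mirrors the branches in the sequence diagram above.
// The real implementation builds full status conditions rather than strings.
func computeReconciledSketch(deleting, allFinalized, hasChildResources bool) (status, reason string) {
	switch {
	case deleting && allFinalized && hasChildResources:
		// Adapters are done, but child resources (e.g. node pools under a
		// cluster) still exist, so hold off on final reconciliation.
		return "False", "WaitingForChildResources"
	case deleting && allFinalized && !hasChildResources:
		return "True", "AllAdaptersReconciled"
	default:
		return "False", "MissingRequiredAdapters"
	}
}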

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 61.54%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the main change, implementing hard deletion for clusters and nodepools, which aligns with the substantial refactoring across the DAO, service, and test layers described in the changeset.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around lines 12-13: The changelog contains invalid PR links that use "/pull/HYPERFLEET-854"; these must use the numeric PR path "/pull/119". Update both occurrences in CHANGELOG.md (the two lines referencing HYPERFLEET-854, plus the additional occurrence noted at line 31) so each markdown link points to https://github.com/openshift-hyperfleet/hyperfleet-api/pull/119 instead of the non-numeric form.

In `@pkg/dao/adapter_status.go`:
- Around lines 98-100: When updateResult.RowsAffected == 0 in the adapter-status update flow, explicitly distinguish a stale report (concurrent version mismatch) from a row that was deleted between the read and the update. After detecting RowsAffected == 0, re-query the DB for the row using the same key/ID from existing (e.g., existing.ID): if the re-read returns no row, return a not-found error (or trigger recreate logic) instead of returning existing; if the re-read returns a row whose version/fields differ from existing, treat it as a stale update and return existing (or a conflict error) so callers know the update did not persist. Place this logic where updateResult.RowsAffected is checked (the function handling the update in adapter_status.go).

In `@pkg/dao/cluster.go`:
- Around line 97-105: In SaveStatusConditions
(sqlClusterDao.SaveStatusConditions) add a check after the Update to treat zero
affected rows as not-found: if result.RowsAffected == 0, call
db.MarkForRollback(ctx, gorm.ErrRecordNotFound) (or sql.ErrNoRows) and return an
appropriate not-found error instead of nil; keep the existing error handling for
result.Error unchanged so true DB errors still roll back.

In `@pkg/dao/node_pool.go`:
- Around lines 60-68: In SaveStatusConditions, don't treat a zero-row update as success. After calling g2.Model(&api.NodePool{}).Where(...).Update(...), inspect result.RowsAffected; if it is 0, return a not-found error (calling db.MarkForRollback(ctx, err) as appropriate) instead of returning nil, and keep the existing behavior when result.Error is non-nil. Checking result.RowsAffected here ensures that concurrent hard-delete/update races surface as not-found.

In `@pkg/services/adapter_status.go`:
- Around lines 125-130: The lookup via s.adapterStatusDao.FindByResourceAndAdapter currently treats any error as "not found" by setting existing = nil, which can hide real DAO failures before Upsert is called. Change the logic so existing is normalized to nil only when the DAO returns its explicit not-found sentinel (e.g., ErrNotFound or the DAO's not-found behavior), and return the lookup error immediately for any other error. Locate the call to s.adapterStatusDao.FindByResourceAndAdapter(ctx, adapterStatus.ResourceType, adapterStatus.ResourceID, adapterStatus.Adapter) and replace the unconditional existing = nil on error with a check of the error value/type, so Upsert only ever sees valid existing state.

In `@pkg/services/aggregation.go`:
- Around lines 740-754: The helper allAdaptersFinalized currently only checks adapterStatus.IsFinalized(), so it accepts finalization reported at any generation. Change it to require Finalized==true at the current resource generation: add a generation parameter (e.g., currentGeneration int64) and count only adapterStatus entries whose Generation equals that parameter and whose IsFinalized() is true. Update the callers (the cluster/nodepool hard-delete call sites) to pass the current resource generation, mirroring the generation filtering used by computeReconciled, so older finalization rows are ignored.

In `@pkg/services/node_pool.go`:
- Around lines 272-279: After a successful hard delete in the node-pool deletion path (when nodePool.DeletedTime != nil and tryHardDeleteNodePool returns true), re-check/requeue the parent cluster using nodePool.OwnerID so the cluster's hard delete and status are recomputed now that the final child is gone. Apply the same change to the other occurrence (around lines 381-403): trigger the parent-cluster reconciliation there as well instead of returning immediately, using the service's existing cluster-reconciliation/enqueue helper (the method responsible for recomputing cluster status or enqueuing by ID).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: cdeffff4-37b5-409f-b5f2-8265a9d7ff7a

📥 Commits

Reviewing files that changed from the base of the PR and between 3622815 and a7969d3.

📒 Files selected for processing (17)
  • CHANGELOG.md
  • pkg/api/adapter_status_types.go
  • pkg/dao/adapter_status.go
  • pkg/dao/cluster.go
  • pkg/dao/mocks/cluster.go
  • pkg/dao/mocks/node_pool.go
  • pkg/dao/node_pool.go
  • pkg/services/adapter_status.go
  • pkg/services/aggregation.go
  • pkg/services/aggregation_test.go
  • pkg/services/cluster.go
  • pkg/services/cluster_test.go
  • pkg/services/node_pool.go
  • pkg/services/node_pool_test.go
  • pkg/services/status_helpers.go
  • test/integration/clusters_test.go
  • test/integration/node_pools_test.go

Comment thread CHANGELOG.md Outdated
Comment thread pkg/dao/adapter_status.go
Comment thread pkg/dao/cluster.go
Comment thread pkg/dao/node_pool.go
Comment thread pkg/services/adapter_status.go
Comment thread pkg/services/aggregation.go Outdated
Comment thread pkg/services/node_pool.go
Comment thread pkg/dao/node_pool.go
Comment thread pkg/services/cluster.go Outdated
Comment thread pkg/services/node_pool.go Outdated
@mliptak0 mliptak0 force-pushed the hyperfleet-api-854 branch from 2a286ce to 66a54b3 on May 4, 2026 05:58
Comment thread pkg/services/aggregation.go Outdated
Contributor Author

mliptak0 commented May 4, 2026

/retest

@mliptak0 mliptak0 force-pushed the hyperfleet-api-854 branch from f94e8bf to ca8bade on May 4, 2026 07:29
…aw SQL

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread pkg/services/aggregation.go Outdated
Comment thread pkg/services/aggregation.go Outdated
Comment on lines 336 to 341

  allFinalizedButChildrenExist := deletedTime != nil && allAtCurrent && hasChildResources

  status := api.ConditionFalse
- if len(required) > 0 && allAtCurrent {
+ if len(required) > 0 && allAtCurrent && !allFinalizedButChildrenExist {
  	status = api.ConditionTrue
  }
Contributor


I find this a bit hard to understand because of the negative logic.
To recap, the condition for Reconciled being True is:

  • All reports at current generation
  • When deleting, there should be no children

Would something like this work?

if len(required) > 0 && allAtCurrent &&
	(deletedTime == nil || (deletedTime != nil && !hasChildResources)) {
	status = api.ConditionTrue
}

@rh-amarin
Contributor

/lgtm


openshift-ci Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rh-amarin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label May 4, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 9798ec3 into openshift-hyperfleet:main May 4, 2026
9 checks passed