HYPERFLEET-854 - feat: implement hard deletion for clusters and nodepools#119
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the review settings.
Walkthrough

This PR implements hard-delete semantics for Clusters and NodePools: after a soft-deleted resource's required adapters report Finalized=True and no child resources remain, the resource and its adapter status rows are deleted. Adds AdapterStatus.IsFinalized(), refactors AdapterStatusDao.Upsert to accept a pre-fetched existing record and adds DeleteByResource, adds Cluster/NodePool DAO methods GetForUpdate and SaveStatusConditions, moves LastTransitionTime preservation into services, introduces a WaitingForChildResources reconciled state during deletion, and adds unit and integration tests for these behaviors.

Sequence Diagram(s)

sequenceDiagram
participant Client
participant ClusterService
participant ClusterDAO
participant AdapterStatusDAO
participant NodePoolDAO
participant DB
rect rgba(100, 150, 200, 0.5)
Note over Client,DB: Cluster adapter status processing + potential hard-delete
end
Client->>ClusterService: ProcessAdapterStatus(clusterID, adapterStatus)
activate ClusterService
ClusterService->>ClusterDAO: GetForUpdate(clusterID)
ClusterDAO->>DB: SELECT ... FOR UPDATE
DB-->>ClusterDAO: cluster row
ClusterDAO-->>ClusterService: *Cluster
ClusterService->>AdapterStatusDAO: FindByResource(clusterID)
AdapterStatusDAO->>DB: SELECT adapter_statuses WHERE resource_id=?
DB-->>AdapterStatusDAO: adapter statuses
AdapterStatusDAO-->>ClusterService: []AdapterStatus
ClusterService->>ClusterService: validateAndClassify(adapterStatus)
ClusterService->>AdapterStatusDAO: Upsert(adapterStatus, existing)
AdapterStatusDAO->>DB: INSERT/UPDATE adapter_statuses
DB-->>AdapterStatusDAO: upsert result
AdapterStatusDAO-->>ClusterService: *AdapterStatus
alt all required adapters finalized AND NOT NodePoolDAO.ExistsByOwner(clusterID)
ClusterService->>AdapterStatusDAO: DeleteByResource(cluster, clusterID)
AdapterStatusDAO->>DB: DELETE FROM adapter_statuses WHERE resource_id=?
DB-->>AdapterStatusDAO: success
ClusterService->>ClusterDAO: Delete(clusterID)
ClusterDAO->>DB: DELETE FROM clusters WHERE id=?
DB-->>ClusterDAO: success
ClusterService-->>Client: cluster hard-deleted
else
ClusterService->>ClusterService: recomputeAndSaveClusterStatus(cluster, adapterStatuses)
ClusterService->>ClusterDAO: SaveStatusConditions(clusterID, conditions)
ClusterDAO->>DB: UPDATE clusters SET status_conditions=?
DB-->>ClusterDAO: success
ClusterService-->>Client: cluster still exists (soft-deleted or waiting)
end
deactivate ClusterService
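For orientation, here is a minimal Go sketch of the hard-delete gate the diagram above describes. The flow and condition come from the walkthrough; the function shape, parameter names, and the injected callbacks are illustrative assumptions, not the PR's actual code.

```go
package services

import "context"

// adapterReport is an illustrative stand-in for the real api.AdapterStatus.
type adapterReport struct {
	Adapter   string
	Finalized bool
}

func (a adapterReport) IsFinalized() bool { return a.Finalized }

// tryHardDeleteCluster sketches the gate in the diagram: a soft-deleted
// cluster is hard-deleted only when every required adapter has reported
// Finalized=True and no NodePool still references it. It returns true when
// the cluster (and its adapter-status rows) were removed.
func tryHardDeleteCluster(
	ctx context.Context,
	clusterID string,
	required []string,
	reports map[string]adapterReport,
	nodePoolExistsByOwner func(context.Context, string) (bool, error),
	deleteAdapterStatuses func(context.Context, string) error,
	deleteCluster func(context.Context, string) error,
) (bool, error) {
	for _, adapter := range required {
		r, ok := reports[adapter]
		if !ok || !r.IsFinalized() {
			return false, nil // still waiting on adapter finalization
		}
	}
	hasChildren, err := nodePoolExistsByOwner(ctx, clusterID)
	if err != nil {
		return false, err
	}
	if hasChildren {
		return false, nil // surfaced as Reconciled=False, Reason=WaitingForChildResources
	}
	if err := deleteAdapterStatuses(ctx, clusterID); err != nil {
		return false, err
	}
	if err := deleteCluster(ctx, clusterID); err != nil {
		return false, err
	}
	return true, nil
}
```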
sequenceDiagram
participant Aggregator
participant Caller
rect rgba(150, 100, 150, 0.5)
Note over Caller,Aggregator: Reconciled computation during deletion
end
Caller->>Aggregator: computeReconciled(adapterStatuses, deletedTime!=nil, hasChildResources)
activate Aggregator
Aggregator->>Aggregator: allFinalized := allAdaptersFinalized(...)
alt deletedTime != nil AND allFinalized AND hasChildResources == true
Aggregator-->>Caller: Reconciled=False (Reason=WaitingForChildResources)
else deletedTime != nil AND allFinalized AND hasChildResources == false
Aggregator-->>Caller: Reconciled=True (Reason=AllAdaptersReconciled)
else
Aggregator-->>Caller: Reconciled=False (Reason=MissingRequiredAdapters)
end
deactivate Aggregator
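The same decision expressed as code: a small Go sketch of the Reconciled computation during deletion, using the reason strings from the diagram. The constant and function names are assumptions for illustration.

```go
package services

import "time"

// Reason strings follow the diagram; the constant names are illustrative.
const (
	reasonAllAdaptersReconciled    = "AllAdaptersReconciled"
	reasonWaitingForChildResources = "WaitingForChildResources"
	reasonMissingRequiredAdapters  = "MissingRequiredAdapters"
)

// reconciledDuringDeletion mirrors the branches above: a deleting resource is
// Reconciled=True only once all required adapters have finalized and no child
// resources remain; otherwise the reason explains what it is waiting for.
func reconciledDuringDeletion(deletedTime *time.Time, allFinalized, hasChildResources bool) (bool, string) {
	switch {
	case deletedTime != nil && allFinalized && hasChildResources:
		return false, reasonWaitingForChildResources
	case deletedTime != nil && allFinalized && !hasChildResources:
		return true, reasonAllAdaptersReconciled
	default:
		return false, reasonMissingRequiredAdapters
	}
}
```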
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 12-13: The changelog contains invalid PR links using
"/pull/HYPERFLEET-854" which must be replaced with the numeric PR path
"/pull/119"; update both occurrences in CHANGELOG.md (the two lines referencing
HYPERFLEET-854 and the additional occurrence noted at line 31) so each
markdown link points to
https://github.com/openshift-hyperfleet/hyperfleet-api/pull/119 instead of the
non-numeric form.
In `@pkg/dao/adapter_status.go`:
- Around line 98-100: When updateResult.RowsAffected == 0 in the adapter status
update flow, explicitly distinguish a stale-report (concurrent version mismatch)
from a row that was deleted between read and update: after detecting
RowsAffected == 0, re-query the DB for the row using the same key/ID from
existing (e.g., existing.ID) — if the re-read returns no row, return a not-found
error (or trigger recreate logic) instead of returning existing; if the re-read
returns a row whose version/fields differ from existing, treat that as a stale
update and return existing (or a conflict error) so callers know the update
didn’t persist. Ensure this logic sits where updateResult.RowsAffected is
checked (the function handling the update in adapter_status.go).
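A hedged sketch of the suggested disambiguation, assuming a gorm-backed DAO (the `g2` handle and the `gorm.ErrRecordNotFound` sentinel); the row type and helper name are illustrative, not the repository's actual API.

```go
package dao

import (
	"errors"
	"fmt"

	"gorm.io/gorm"
)

// adapterStatusRow is an illustrative stand-in for the real api.AdapterStatus.
type adapterStatusRow struct {
	ID      string
	Version int64
}

// resolveZeroRowUpdate sketches the re-read described above: when an
// optimistic Update matches zero rows, re-query by ID to tell a concurrent
// hard delete (row gone -> not-found) apart from a stale report (row still
// present at a different version).
func resolveZeroRowUpdate(g2 *gorm.DB, existing *adapterStatusRow) (*adapterStatusRow, error) {
	var current adapterStatusRow
	err := g2.Table("adapter_statuses").Where("id = ?", existing.ID).First(&current).Error
	switch {
	case errors.Is(err, gorm.ErrRecordNotFound):
		// Deleted between read and update: surface not-found (or trigger
		// recreate logic) instead of returning the stale record.
		return nil, fmt.Errorf("adapter status %s: %w", existing.ID, gorm.ErrRecordNotFound)
	case err != nil:
		return nil, err
	default:
		// Row exists but our update didn't apply: treat the report as stale
		// and return the previously fetched record so callers see a conflict.
		return existing, nil
	}
}
```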
In `@pkg/dao/cluster.go`:
- Around line 97-105: In SaveStatusConditions
(sqlClusterDao.SaveStatusConditions) add a check after the Update to treat zero
affected rows as not-found: if result.RowsAffected == 0, call
db.MarkForRollback(ctx, gorm.ErrRecordNotFound) (or sql.ErrNoRows) and return an
appropriate not-found error instead of nil; keep the existing error handling for
result.Error unchanged so true DB errors still roll back.
In `@pkg/dao/node_pool.go`:
- Around line 60-68: In SaveStatusConditions, don't treat a zero-row update as
success: after calling g2.Model(&api.NodePool{}).Where(...).Update(...) inspect
result.RowsAffected and if it is 0 return a not-found error (and call
db.MarkForRollback(ctx, err) as appropriate) instead of returning nil; keep
existing behavior for result.Error non-nil. Use the SaveStatusConditions
function and result.RowsAffected to implement this check so concurrent
hard-delete/update races surface as not-found.
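One sketch covers the RowsAffected check proposed for both SaveStatusConditions implementations above (cluster and node-pool DAOs). `markForRollback` stands in for `db.MarkForRollback`, and the column name follows the review text; everything else is assumed.

```go
package dao

import (
	"context"
	"fmt"

	"gorm.io/gorm"
)

// saveStatusConditions sketches the zero-rows check: a vanished row is
// reported as not-found rather than a silent success.
func saveStatusConditions(ctx context.Context, g2 *gorm.DB, markForRollback func(context.Context, error), table, id string, conditions []byte) error {
	result := g2.WithContext(ctx).Table(table).
		Where("id = ?", id).
		Update("status_conditions", conditions)
	if result.Error != nil {
		markForRollback(ctx, result.Error)
		return result.Error
	}
	if result.RowsAffected == 0 {
		// Zero rows means the resource was hard-deleted (or never existed)
		// between read and update: surface not-found so concurrent
		// delete/update races are visible to the caller.
		err := fmt.Errorf("%s %s: %w", table, id, gorm.ErrRecordNotFound)
		markForRollback(ctx, err)
		return err
	}
	return nil
}
```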
In `@pkg/services/adapter_status.go`:
- Around line 125-130: The lookup code around
s.adapterStatusDao.FindByResourceAndAdapter currently treats any error as a "not
found" by setting existing = nil, which can hide real DAO failures before
calling Upsert; update the logic in the function to only normalize to nil when
the DAO returns the explicit not-found sentinel (e.g., ErrNotFound or the dao's
not-found behavior) and to return the lookup error immediately for any other
error; locate the call to s.adapterStatusDao.FindByResourceAndAdapter(ctx,
adapterStatus.ResourceType, adapterStatus.ResourceID, adapterStatus.Adapter) and
replace the unconditional existing = nil on error with a conditional that checks
the error value/type and either sets existing = nil for not-found or returns the
error so Upsert sees only valid existing state.
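A sketch of the suggested not-found handling. It assumes the DAO surfaces `gorm.ErrRecordNotFound`; if the project wraps that in its own sentinel, check for that instead.

```go
package services

import (
	"errors"

	"gorm.io/gorm"
)

// adapterStatusRecord is an illustrative stand-in for api.AdapterStatus.
type adapterStatusRecord struct {
	ResourceType, ResourceID, Adapter string
}

// lookupExisting normalizes only a genuine not-found to nil; any other DAO
// failure is propagated so Upsert never runs on hidden lookup errors.
func lookupExisting(find func() (*adapterStatusRecord, error)) (*adapterStatusRecord, error) {
	existing, err := find()
	switch {
	case err == nil:
		return existing, nil
	case errors.Is(err, gorm.ErrRecordNotFound):
		// Genuinely missing: Upsert proceeds down its insert path.
		return nil, nil
	default:
		// Connection errors, timeouts, etc. must not be masked as "not found".
		return nil, err
	}
}
```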
In `@pkg/services/aggregation.go`:
- Around line 740-754: The helper allAdaptersFinalized currently only checks
adapterStatus.IsFinalized() and thus considers finalization from any generation;
change it to require Finalized==true at the current resource generation by
adding a generation parameter (e.g., currentGeneration int64) and only count
adapterStatus entries whose Generation equals that parameter and whose
IsFinalized() is true; update callers (cluster/nodepool hard-delete call sites)
to pass the current resource generation, mirroring the generation filtering
approach used by computeReconciled, so older finalization rows are ignored.
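A sketch of the generation-aware variant described above; the report type and its fields are stand-ins mirroring the walkthrough, not the actual api.AdapterStatus.

```go
package services

// adapterStatusReport is an illustrative stand-in for api.AdapterStatus.
type adapterStatusReport struct {
	Adapter    string
	Generation int64
	Finalized  bool
}

func (a adapterStatusReport) IsFinalized() bool { return a.Finalized }

// allAdaptersFinalized only counts reports at the current resource generation,
// so finalizations recorded for an earlier generation are ignored; the
// len(required) > 0 guard mirrors the one computeReconciled already applies.
func allAdaptersFinalized(required []string, reports []adapterStatusReport, currentGeneration int64) bool {
	finalized := make(map[string]bool, len(reports))
	for _, r := range reports {
		if r.Generation == currentGeneration && r.IsFinalized() {
			finalized[r.Adapter] = true
		}
	}
	for _, name := range required {
		if !finalized[name] {
			return false
		}
	}
	return len(required) > 0
}
```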
In `@pkg/services/node_pool.go`:
- Around line 272-279: After a successful hard delete in the node-pool deletion
path (when nodePool.DeletedTime != nil and tryHardDeleteNodePool returns true),
ensure we re-check/requeue the parent cluster using nodePool.OwnerID so the
cluster's hard-delete/status is recomputed now that the final child is gone;
update the same logic in the other occurrence (around lines 381-403) to also
trigger the parent cluster reconciliation instead of returning immediately (use
the existing cluster-reconciliation/enqueue helper on the service, e.g., the
method responsible for cluster status recomputation or enqueuing by ID).
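A thin sketch of the suggested follow-up; `enqueueClusterReconcile` is a hypothetical placeholder for whatever cluster-reconciliation or enqueue helper the service already exposes.

```go
package services

import "context"

// onNodePoolHardDeleted re-checks the parent cluster after its node pool was
// hard-deleted. enqueueClusterReconcile is a hypothetical stand-in, not the
// service's real method.
func onNodePoolHardDeleted(ctx context.Context, ownerID string, enqueueClusterReconcile func(context.Context, string) error) error {
	if ownerID == "" {
		return nil
	}
	// The node pool just removed may have been the cluster's last child, so
	// re-evaluate the parent now: its WaitingForChildResources condition can
	// clear and its own hard delete can proceed without waiting for the next
	// adapter report.
	return enqueueClusterReconcile(ctx, ownerID)
}
```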
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: cdeffff4-37b5-409f-b5f2-8265a9d7ff7a
📒 Files selected for processing (17)
- CHANGELOG.md
- pkg/api/adapter_status_types.go
- pkg/dao/adapter_status.go
- pkg/dao/cluster.go
- pkg/dao/mocks/cluster.go
- pkg/dao/mocks/node_pool.go
- pkg/dao/node_pool.go
- pkg/services/adapter_status.go
- pkg/services/aggregation.go
- pkg/services/aggregation_test.go
- pkg/services/cluster.go
- pkg/services/cluster_test.go
- pkg/services/node_pool.go
- pkg/services/node_pool_test.go
- pkg/services/status_helpers.go
- test/integration/clusters_test.go
- test/integration/node_pools_test.go
Force-pushed from 2a286ce to 66a54b3 (compare)
/retest
…idental overwrites
Force-pushed from f94e8bf to ca8bade (compare)
…aw SQL Co-Authored-By: Claude <noreply@anthropic.com>
+ allFinalizedButChildrenExist := deletedTime != nil && allAtCurrent && hasChildResources

  status := api.ConditionFalse
- if len(required) > 0 && allAtCurrent {
+ if len(required) > 0 && allAtCurrent && !allFinalizedButChildrenExist {
      status = api.ConditionTrue
  }
I find this a bit hard to understand because of the negative logic
If we recap, the condition for Reconciled True is:
- All reports at current generation
- When deleting, there should be no children
Would something like
if len(required) > 0 && allAtCurrent &&
    (deletedTime == nil || (deletedTime != nil && !hasChildResources)) {
    status = api.ConditionTrue
}
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rh-amarin

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Merged 9798ec3 into openshift-hyperfleet:main
implement hard deletion for clusters and nodepools
Summary
Test Plan
- make test-all passes
- make lint passes
- make test-helm (if applicable)

Summary by CodeRabbit
New Features
Changed
Documentation
Tests