@CrystalChun
Contributor

The BMH owns the PreprovisioningImage (PPI), and foreground deletion waits for all child resources (here, the PPI) to be deleted before it removes the owner (the BMH).

With the PreprovisioningImage finalizer, there is a possibility of a deadlock when foreground deletion is used.

The BMH (with foreground deletion) waits for the PPI to be deleted; previously, however, the PPI waited for the BMH to be deleted before its finalizer could be removed.

To prevent this, we'll rely on the state of the BMH to decide when to remove the finalizer.

If the BMH has finished deprovisioning (its state is powering off before delete, or deleting), we assume we can safely remove the PPI finalizer because the image is no longer needed.
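
As a rough illustration of the state check described above, here is a minimal, self-contained Go sketch. It is not the controller code: `ProvisioningState` and the constants are local stand-ins for the metal3 API types, their string values are assumptions, and a plain `switch` is used where the real change may use a different helper.

```go
package main

import "fmt"

// ProvisioningState is a local stand-in for the metal3 provisioning state type.
type ProvisioningState string

// Stand-in constants; the string values are assumptions for illustration only.
const (
	StateProvisioned             ProvisioningState = "provisioned"
	StateDeprovisioning          ProvisioningState = "deprovisioning"
	StatePoweringOffBeforeDelete ProvisioningState = "powering off before delete"
	StateDeleting                ProvisioningState = "deleting"
)

// alreadyDeprovisioned reports whether the host is in one of the two states
// the description treats as "finished deprovisioning".
func alreadyDeprovisioned(state ProvisioningState) bool {
	switch state {
	case StatePoweringOffBeforeDelete, StateDeleting:
		return true
	default:
		return false
	}
}

func main() {
	states := []ProvisioningState{
		StateProvisioned,
		StateDeprovisioning,
		StatePoweringOffBeforeDelete,
		StateDeleting,
	}
	for _, s := range states {
		// A true result means the PPI finalizer no longer needs to block deletion.
		fmt.Printf("%-30q finalizer removable: %v\n", s, alreadyDeprovisioned(s))
	}
}
```

Keying the decision on the BMH state rather than on the BMH object being gone is what breaks the cycle: the PPI no longer has to outlive its owner's deletion.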

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested): Delete BMH w/ metadata cleaning and foreground deletion. Observe it gets removed
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

/cc @carbonin

@openshift-ci openshift-ci bot requested a review from carbonin December 19, 2025 18:34
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 19, 2025
@openshift-ci-robot

openshift-ci-robot commented Dec 19, 2025

@CrystalChun: This pull request references ACM-27859 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.


@coderabbitai

coderabbitai bot commented Dec 19, 2025

Walkthrough

The PreprovisioningImage deletion handler now checks the associated BareMetalHost's provisioning state and only requeues deletion while the host is still deprovisioning; finalizer removal proceeds only when the host is deprovisioned. Tests were added to verify deletion allowed after deprovisioning completes.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Controller logic modification: internal/controller/controllers/preprovisioningimage_controller.go | Adjusts handlePreprovisioningImageDeletion to compute alreadyDeprovisioned from the BMH provisioning state (including PoweringOffBeforeDelete and Deprovisioning), requeue when not deprovisioned, and gate finalizer removal on deprovisioning completion. Updates logs and conditional flow. |
| Test additions & updates: internal/controller/controllers/preprovisioningimage_controller_test.go | Adds two tests under "PreprovisioningImage deletion protection" verifying deletion is permitted when the BMH reaches Deprovisioning or PoweringOffBeforeDelete. Updates existing deletion-path setup to set BMH status provisioning state to StateDeprovisioning before deletion flow. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Inspect alreadyDeprovisioned logic to ensure all intended BMH states are covered (including PoweringOffBeforeDelete vs Deprovisioning).
  • Verify requeue conditions and finalizer removal ordering in handlePreprovisioningImageDeletion.
  • Run and review the two new tests for correctness and flakiness potential.
📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 5a22d90 and e4ed6fe.

📒 Files selected for processing (2)
  • internal/controller/controllers/preprovisioningimage_controller.go (1 hunks)
  • internal/controller/controllers/preprovisioningimage_controller_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/controller/controllers/preprovisioningimage_controller.go
  • internal/controller/controllers/preprovisioningimage_controller_test.go


@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 19, 2025
@openshift-ci

openshift-ci bot commented Dec 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CrystalChun

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 19, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/controllers/preprovisioningimage_controller_test.go (1)

1445-1459: Critical: This existing test will fail with the new controller logic.

The test expects PreprovisioningImage deletion to be blocked when the BareMetalHost has cleaning enabled but is not being deleted. However, the new controller logic at line 776 (in preprovisioningimage_controller.go) only requeues when:

if !bmh.DeletionTimestamp.IsZero() && !alreadyDeprovisioned {

This condition requires the BMH to be deleting (!bmh.DeletionTimestamp.IsZero()) for the requeue to occur. When the BMH is not being deleted (as in this test), the condition evaluates to false, causing the finalizer to be removed and allowing PPI deletion.

Expected test result with new code:

  • result.Requeue will be false (not true as the test expects)
  • PPI will be deleted (finalizer removed)
  • Test assertion Expect(result.Requeue).To(BeTrue()) will fail

Resolution required:
Either update this test to reflect the new intended behavior (allowing deletion when BMH is not being deleted), or adjust the controller logic if this blocking behavior should be preserved.

🧹 Nitpick comments (1)
internal/controller/controllers/preprovisioningimage_controller.go (1)

772-780: Consider including StateDeleted in the deprovisioning completion check.

The current logic checks for StatePoweringOffBeforeDelete and StateDeleting to determine if deprovisioning has finished. While these states are correct, you might also want to include StateDeleted for completeness, as it represents a terminal state where the BMH definitely no longer needs the PreprovisioningImage.

However, if the BMH reaches StateDeleted, it's likely about to be removed from the system entirely, making this check somewhat redundant. The current implementation is probably sufficient.

Optional: Include StateDeleted
-		alreadyDeprovisioned := funk.Contains([]metal3_v1alpha1.ProvisioningState{metal3_v1alpha1.StatePoweringOffBeforeDelete, metal3_v1alpha1.StateDeleting}, bmh.Status.Provisioning.State)
+		alreadyDeprovisioned := funk.Contains([]metal3_v1alpha1.ProvisioningState{metal3_v1alpha1.StatePoweringOffBeforeDelete, metal3_v1alpha1.StateDeleting, metal3_v1alpha1.StateDeleted}, bmh.Status.Provisioning.State)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between c6921fa and 5a22d90.

📒 Files selected for processing (2)
  • internal/controller/controllers/preprovisioningimage_controller.go (1 hunks)
  • internal/controller/controllers/preprovisioningimage_controller_test.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • internal/controller/controllers/preprovisioningimage_controller_test.go
  • internal/controller/controllers/preprovisioningimage_controller.go
🔇 Additional comments (3)
internal/controller/controllers/preprovisioningimage_controller_test.go (2)

405-423: LGTM! Test correctly verifies deletion when BMH finishes deprovisioning.

The test properly validates that when a BareMetalHost reaches the StateDeleting provisioning state (indicating deprovisioning has completed), the PreprovisioningImage finalizer is removed and deletion proceeds. This aligns with the controller logic changes.


425-443: LGTM! Test correctly verifies deletion when BMH enters powering-off state.

The test properly validates that when a BareMetalHost reaches the StatePoweringOffBeforeDelete provisioning state, the PreprovisioningImage finalizer is removed and deletion proceeds. This aligns with the controller logic changes.

internal/controller/controllers/preprovisioningimage_controller.go (1)

772-780: LGTM! Logic correctly addresses the foreground deletion deadlock.

The new deprovisioning-aware deletion logic properly solves the deadlock scenario described in the PR objectives:

  1. During BMH foreground deletion: PPI finalizer is retained while BMH is still deprovisioning (needs the image), but removed once BMH reaches terminal states (PoweringOffBeforeDelete or Deleting)
  2. When BMH is not being deleted: PPI finalizer is removed immediately, allowing independent PPI deletion
  3. When cleaning is disabled: PPI finalizer is removed immediately

This ensures the PreprovisioningImage can be deleted without creating a circular dependency with its owning BareMetalHost.
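
The three cases listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `host` struct, its field names, and `keepFinalizer` are hypothetical stand-ins rather than the controller's actual types, and the state strings are assumed values.

```go
package main

import "fmt"

// host is a hypothetical, trimmed-down view of the BareMetalHost fields
// that matter for the decision.
type host struct {
	beingDeleted    bool   // stands in for !bmh.DeletionTimestamp.IsZero()
	cleaningEnabled bool   // assumption: cleaning has not been disabled
	state           string // provisioning state (assumed string values)
}

// deprovisioned reports whether the host is past deprovisioning.
func deprovisioned(state string) bool {
	return state == "powering off before delete" || state == "deleting"
}

// keepFinalizer returns true when PPI deletion should be requeued and the
// finalizer left in place.
func keepFinalizer(h host) bool {
	if !h.cleaningEnabled {
		return false // case 3: cleaning disabled, remove immediately
	}
	// Cases 1 and 2: block only while a deleting BMH is still deprovisioning.
	return h.beingDeleted && !deprovisioned(h.state)
}

func main() {
	cases := []struct {
		name string
		h    host
	}{
		{"BMH deleting, still deprovisioning", host{beingDeleted: true, cleaningEnabled: true, state: "deprovisioning"}},
		{"BMH deleting, reached deleting", host{beingDeleted: true, cleaningEnabled: true, state: "deleting"}},
		{"BMH not being deleted", host{beingDeleted: false, cleaningEnabled: true, state: "provisioned"}},
		{"cleaning disabled", host{beingDeleted: true, cleaningEnabled: false, state: "deprovisioning"}},
	}
	for _, c := range cases {
		fmt.Printf("%-36s keep finalizer: %v\n", c.name, keepFinalizer(c.h))
	}
}
```

Only the first case keeps the finalizer, which matches the outside-diff comment's observation that a BMH that is not being deleted no longer blocks PPI deletion.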

…H finishes deprovisioning

The BMH owns the PreprovisioningImage (PPI), and foreground
deletion waits for all child resources (here, the PPI) to be
deleted before it removes the owner (the BMH).

With the PreprovisioningImage finalizer, there is a possibility
of a deadlock when foreground deletion is used.

The BMH (with foreground deletion) waits for the PPI to be
deleted; previously, however, the PPI waited for the BMH to
be deleted before its finalizer could be removed.

To prevent this, we'll rely on the state of the BMH to
decide when to remove the finalizer.

If the BMH has finished deprovisioning (its state is powering
off before delete, or deleting), we assume we can safely
remove the PPI finalizer because the image is no longer needed.
@codecov

codecov bot commented Dec 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.49%. Comparing base (c6921fa) to head (e4ed6fe).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #8604   +/-   ##
=======================================
  Coverage   43.49%   43.49%           
=======================================
  Files         411      411           
  Lines       71244    71250    +6     
=======================================
+ Hits        30987    30992    +5     
- Misses      37497    37498    +1     
  Partials     2760     2760           
Files with missing lines Coverage Δ
...ler/controllers/preprovisioningimage_controller.go 80.08% <100.00%> (+0.25%) ⬆️

... and 2 files with indirect coverage changes

@CrystalChun
Contributor Author

/test e2e-agent-compact-ipv4

2 similar comments
@CrystalChun
Contributor Author

/test e2e-agent-compact-ipv4

@gamli75
Contributor

gamli75 commented Dec 30, 2025

/test e2e-agent-compact-ipv4

@openshift-ci

openshift-ci bot commented Dec 30, 2025

@CrystalChun: all tests passed!

