MGMT-22278: Skip deleting spoke resources for an uninstalled agent #8608
base: master
Conversation
Skipping CI for Draft Pull Request.
@CrystalChun: This pull request references MGMT-22278, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
CodeRabbit: Estimated code review effort 🎯 4 (Complex) | ⏱️ ~45 minutes.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: CrystalChun. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Previously, when an Agent started installation but had not finished installing, and a user then tried to delete it, the Agent would get stuck deleting because spoke resource deletion would fail. This change gates the spoke cleanup process on the Agent's current status: the Agent must either be installed or have spoke resources that need to be removed. If the Agent is not installed and has no spoke resources, the spoke cleanup process is skipped.
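The gating described above can be sketched as a small predicate. This is a minimal sketch with hypothetical simplified types; the real check (`spokeResourcesExist`) lives in `internal/controller/controllers/agent_controller.go` and inspects the Agent CR's status fields:

```go
package main

import "fmt"

// Agent is a simplified stand-in for the Agent status fields involved in the
// gating decision (hypothetical names; the real types live in api/v1beta1).
type Agent struct {
	CurrentStage string // e.g. "Done" once installation completes
	ApprovedCSRs int    // count of approved CSRs for the node
	SpokeBMH     bool   // true if the BMH carries the spoke-created annotation
}

// spokeResourcesExist mirrors the idea described above: spoke cleanup only
// runs when the agent finished installing or provably created spoke
// resources (spoke BMH annotation, or approved CSRs).
func spokeResourcesExist(a Agent) bool {
	return a.CurrentStage == "Done" || a.SpokeBMH || a.ApprovedCSRs > 0
}

func main() {
	midInstall := Agent{CurrentStage: "Installing"}
	installed := Agent{CurrentStage: "Done"}
	fmt.Println(spokeResourcesExist(midInstall)) // false: cleanup is skipped
	fmt.Println(spokeResourcesExist(installed))  // true: cleanup proceeds
}
```

The key design point is that all three signals are read from the Agent's own status, so the finalizer never has to contact the spoke cluster just to decide whether cleanup is needed.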
Force-pushed 44ca0ae to 1554411.
/test ?
@CrystalChun: The following commands are available to trigger required jobs, and the following commands are available to trigger optional jobs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test edge-subsystem-kubeapi-aws
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## master #8608 +/- ##
==========================================
+ Coverage 43.49% 43.50% +0.01%
==========================================
Files 411 411
Lines 71244 71271 +27
==========================================
+ Hits 30987 31010 +23
- Misses 37497 37501 +4
Partials 2760 2760
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/controller/controllers/agent_controller.go (1)
929-949: Error handling already mitigates transient failures appropriately. The concern about getBMH errors blocking deletion is valid. However, the code already provides safeguards: client.IgnoreNotFound() filters out NotFound errors (lines 675-676), errors are logged with context (line 513), and the requeue mechanism enables automatic retry. This conservative error-handling pattern is consistent across all pre-deletion checks and intentionally prioritizes preventing incomplete cleanup over avoiding temporary deletion delays. If persistent transient errors become a production issue, metrics on error frequency would be a useful optional enhancement, but the current logging provides adequate visibility.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
internal/controller/controllers/agent_controller.go
🧰 Additional context used
📓 Path-based instructions (1)
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
internal/controller/controllers/agent_controller.go
🧬 Code graph analysis (1)
internal/controller/controllers/agent_controller.go (3)
api/v1beta1/agent_types.go (1): CSRStatus (324-330)
internal/controller/controllers/bmh_agent_controller.go (2): AGENT_BMH_LABEL (81-81), BMH_SPOKE_CREATED_ANNOTATION (99-99)
models/host_stage.go (1): HostStageDone (67-67)
🔇 Additional comments (2)
internal/controller/controllers/agent_controller.go (2)
665-687: LGTM! Clean helper extraction. The getBMH helper properly handles the case where no BMH label exists and uses client.IgnoreNotFound to treat "not found" as a non-error. Refactoring bmhExists to delegate to getBMH eliminates duplication.
511-517: LGTM! Cleanup gating aligns with PR objectives. The spoke cleanup is now correctly gated on three conditions:
- No skip annotation
- Agent bound to a cluster
- Spoke resources actually exist (per spokeResourcesExist)
This prevents the deletion flow from attempting cleanup when the agent hasn't progressed far enough to create spoke resources, addressing the stuck-deletion scenario described in the PR.
Force-pushed baac52b to 2ba0d2b.
Actionable comments posted: 0
♻️ Duplicate comments (1)
internal/controller/controllers/agent_controller.go (1)
963-974: Deferred patch error won't propagate to caller. The err = patchErr assignment inside the defer won't affect the returned error, because Go evaluates return values before executing deferred functions and the return values here are unnamed. The patch error is logged but silently discarded. If patch errors should propagate, use named return values:
🔎 Use named return values to propagate patch errors
```diff
-func (r *AgentReconciler) updateStatus(ctx context.Context, log logrus.FieldLogger, agent, origAgent *aiv1beta1.Agent, h *models.Host, clusterId *strfmt.UUID, syncErr error, internal bool) (ctrl.Result, error) {
+func (r *AgentReconciler) updateStatus(ctx context.Context, log logrus.FieldLogger, agent, origAgent *aiv1beta1.Agent, h *models.Host, clusterId *strfmt.UUID, syncErr error, internal bool) (ret ctrl.Result, err error) {
 	var (
-		err                   error
 		shouldAutoApproveCSRs bool
 		spokeClient           spoke_k8s_client.SpokeK8sClient
 		node                  *corev1.Node
-		ret                   ctrl.Result = ctrl.Result{}
 	)
```
Alternatively, if patch errors should only be logged without affecting reconciliation, remove the err = patchErr assignment to avoid confusion about intent.
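The underlying Go behavior is easy to demonstrate in isolation: a defer can only change what the caller sees when the result is named, because unnamed results are copied out before deferred functions run. This is a self-contained sketch, unrelated to the controller code:

```go
package main

import (
	"errors"
	"fmt"
)

// unnamed has an unnamed result: the assignment inside the defer mutates a
// local variable after the return value was already copied out, so the
// "patch failed" error is silently discarded.
func unnamed() error {
	var err error
	defer func() { err = errors.New("patch failed") }() // lost
	return err
}

// named has a named result: the defer mutates the actual return slot, so the
// "patch failed" error propagates to the caller.
func named() (err error) {
	defer func() { err = errors.New("patch failed") }() // propagates
	return nil
}

func main() {
	fmt.Println(unnamed()) // <nil>
	fmt.Println(named())   // patch failed
}
```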
🧹 Nitpick comments (1)
internal/controller/controllers/agent_controller.go (1)
1057-1059: Setting err = nil before return is redundant. At line 1058, err = nil is set explicitly, but line 1059 then returns ret, nil directly anyway; the assignment has no effect.
🔎 Simplify by removing the redundant assignment
```diff
 if err != nil {
 	ret = ctrl.Result{Requeue: true}
-	err = nil
 	return ret, nil
 }
```
🔇 Additional comments (3)
internal/controller/controllers/agent_controller.go (3)
511-517: LGTM! Proper gating of spoke resource cleanup. The new spokeResourcesExist check correctly gates the cleanup logic, ensuring spoke resources are only removed when they actually exist. This addresses the PR objective of preventing stuck agent deletions when installation hasn't completed.
665-687: Clean helper extraction. Good refactor extracting getBMH as a reusable helper that bmhExists and spokeResourcesExist can both leverage. The error handling with client.IgnoreNotFound is appropriate.
929-949: Logic correctly identifies when spoke resources exist. The three conditions comprehensively cover scenarios where spoke resources would exist:
- Agent completed installation (Done stage)
- BMH with spoke-created annotation indicates spoke resources were provisioned
- Approved CSRs indicate the node is joining/has joined the spoke cluster
This aligns well with the PR objective to skip cleanup for agents that haven't progressed far enough in installation.
Force-pushed 2ba0d2b to fc1268f.
Actionable comments posted: 0
🧹 Nitpick comments (1)
internal/controller/controllers/agent_controller_test.go (1)
5024-5042: Consider testing both CSR types for comprehensive coverage. The test correctly verifies that node removal proceeds during installation when CSRs are approved (indicating spoke resources exist). However, it only tests with a serving CSR (CSRTypeServing). For more comprehensive coverage, consider adding a test that verifies behavior with both client and serving CSRs, since both types are tracked in agent.Status.CSRStatus.ApprovedCSRs.
🔎 Optional: Test with both CSR types
Consider adding a similar test case that includes both CSR types:
```go
agent.Status.CSRStatus.ApprovedCSRs = []v1beta1.CSRInfo{
	{
		Name:       "csr-host-client",
		Type:       v1beta1.CSRTypeClient,
		ApprovedAt: metav1.Now(),
	},
	{
		Name:       "csr-host-serving",
		Type:       v1beta1.CSRTypeServing,
		ApprovedAt: metav1.Now(),
	},
}
```
This would verify the cleanup logic handles both CSR types correctly, matching the real-world scenario where both are typically approved during installation.
📒 Files selected for processing (2): internal/controller/controllers/agent_controller.go, internal/controller/controllers/agent_controller_test.go
🔇 Additional comments (10)
internal/controller/controllers/agent_controller.go (5)
665-679: LGTM! Clean helper extraction. The getBMH helper properly handles the lookup logic with correct error handling: it returns nil when no BMH label exists or the BMH is not found (both expected scenarios), and propagates other errors appropriately. This extraction improves code readability and reusability.
929-949: Excellent design decision to gate cleanup on resource existence. This helper correctly determines spoke resource presence by checking three indicators:
- Agent installation completed (HostStageDone)
- BMH has the spoke creation annotation
- CSRs have been approved (node joining flow initiated)
This directly addresses the PR objective: preventing stuck Agent deletions when installation started but hasn't progressed far enough to create spoke resources. The logic properly handles uninitialized states (empty stage string ≠ HostStageDone).
511-517: LGTM! Core fix correctly gates spoke cleanup. The finalizer now properly checks spokeResourcesExist before attempting cleanup, preventing the stuck-deletion scenario when an agent started installation but hasn't created spoke resources yet. The condition correctly requires all three: no skip annotation, bound to a cluster, and spoke resources exist.
Note: The spoke client initialization on lines 535-543 is wisely deferred until cleanup is confirmed necessary, avoiding unnecessary spoke cluster connections.
961-974: LGTM! Defer refactoring addresses previous review concern. The updateStatus refactoring consolidates return paths through a single ret variable and ensures status patching happens via defer. The past review concern about defer return signatures has been properly addressed: the defer now assigns to the err variable in the enclosing scope, ensuring patch errors propagate to the caller.
Minor note: if both the function body and the patch operation fail, the patch error will override the original error on line 968. However, both errors are logged, so this trade-off is acceptable for cleaner status-patching logic.
681-687: LGTM! Clean refactoring. The bmhExists method now correctly delegates to the getBMH helper, improving maintainability by centralizing the BMH lookup logic while preserving error propagation.
internal/controller/controllers/agent_controller_test.go (5)
3442-3443: LGTM: Test expectations correctly updated. The updated expectations now verify that the agent maintains its status (HostStatusInstalling) and stage (HostStageRebooting) even when encountering node retrieval errors, rather than clearing these fields. This aligns with the test setup and reflects proper error handling in the production code.
4786-4786: LGTM: Proper test setup for terminal state. Setting the agent's current stage to Done in the BeforeEach establishes the baseline that the agent has completed installation before finalization. This is appropriate for most finalizer test scenarios and allows individual tests to override it when testing mid-installation cleanup.
4921-4952: LGTM: Test correctly verifies cleanup gating. This test properly validates that spoke resource cleanup is skipped when an agent is in the middle of installation (HostStageInstalling) without any spoke resources created (no approved CSRs or BMH). The test correctly:
- Overrides the BeforeEach stage to simulate mid-installation
- Expects no spoke client initialization or node removal operations
- Verifies the finalizer completes without attempting spoke cleanup
This aligns with the PR objective to prevent stuck deletions when agents haven't completed installation.
5005-5005: LGTM: Consistent baseline setup. Setting the agent stage to Done in the nested BeforeEach establishes the baseline for tests using the fake spoke client. This is appropriate since these tests typically verify spoke resource cleanup for completed installations, with individual tests able to override it when testing edge cases.
5044-5071: LGTM: Test correctly verifies BMH cleanup logic. This test properly validates that BMH removal proceeds during installation when the BMH has the spoke-created annotation (BMH_SPOKE_CREATED_ANNOTATION: "true"). The test correctly:
- Sets up the agent with the BMH label linking to the BMH
- Mocks the BMH with the annotation indicating it was created on the spoke cluster
- Verifies the BMH is deleted during finalization
- Uses consistent verification patterns (IsNotFound check)
The presence of the annotation indicates the BMH was created on the spoke cluster and should be cleaned up, even during mid-installation deletion.
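The finalizer-gating cases these tests exercise can be summarized as a table-driven sketch (hypothetical simplified predicate; the real tests use Ginkgo/Gomega against the controller):

```go
package main

import "fmt"

// cleanupSkipped mirrors the gating rule under test: spoke cleanup is
// skipped only when the agent neither finished installing nor created any
// spoke resources (no spoke BMH annotation, no approved CSRs).
func cleanupSkipped(stage string, spokeBMH bool, approvedCSRs int) bool {
	return stage != "Done" && !spokeBMH && approvedCSRs == 0
}

func main() {
	cases := []struct {
		name     string
		stage    string
		spokeBMH bool
		csrs     int
		wantSkip bool
	}{
		{"mid-install, no spoke resources", "Installing", false, 0, true},
		{"installation completed", "Done", false, 0, false},
		{"CSRs approved during install", "Installing", false, 1, false},
		{"spoke BMH created during install", "Installing", true, 0, false},
	}
	for _, c := range cases {
		got := cleanupSkipped(c.stage, c.spokeBMH, c.csrs)
		fmt.Printf("%-34s skip=%v (want %v)\n", c.name, got, c.wantSkip)
	}
}
```

Only the first row skips cleanup; every other row has at least one signal that spoke resources exist, so the finalizer proceeds with removal.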
Force-pushed 2fa8d45 to 274af1a.
The updateStatus function could exit early without patching the agent's status, since the patch happened at the end. This change ensures the status is always patched.
Force-pushed 274af1a to 7183461.
/retest-required
/override ci/prow/okd-scos-images
@gamli75: Overrode contexts on behalf of gamli75: ci/prow/okd-scos-images.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@CrystalChun: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
List all the issues related to this PR
What environments does this code impact?
How was this code tested?
Manual Testing
Recreate customer scenario
Additional functionality testing
Regression testing
Checklist
docs, README, etc.)
Reviewers Checklist