auto node remediation fixes #514

Open
ci-penbot-01 wants to merge 2 commits into ROCm:main from ci-penbot-01:CP.O2O.pensando.gpu-operator.1303.rocm.gpu-operator.main

@ci-penbot-01
Contributor

Cherry-pick of pensando/gpu-operator#1303


Source PR Description (pensando/gpu-operator#1303):

  • GPUOP-624 - Add missing RBAC permissions for the controller-manager to manage Deployments.
  • GPUOP-625 - Set owner references on the default WorkflowTemplate created by the operator, ensuring it is automatically cleaned up when the parent DeviceConfig is deleted.
  • GPUOP-626 - Fix incorrect ConfigMap example in the auto-remediation documentation.
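
For readers less familiar with the two Kubernetes mechanisms behind GPUOP-624 and GPUOP-625, the fragments below are a minimal sketch and not the manifests changed in this PR. The role kind, the verb list, all object names, the UID placeholder, the `controller` flag, and the API versions (`amd.com/v1alpha1` for DeviceConfig, `argoproj.io/v1alpha1` for WorkflowTemplate) are assumptions made for illustration.

```yaml
# Sketch only: names, verbs, and API versions are assumptions, not taken from this PR.

# (1) GPUOP-624: an RBAC rule that lets a controller's service account manage Deployments.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole                  # assumed; the real rule may live in a namespaced Role
metadata:
  name: example-manager-role       # hypothetical name
rules:
  - apiGroups: ["apps"]            # Deployments belong to the "apps" API group
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]  # assumed verb set
---
# (2) GPUOP-625: an owner reference that ties a WorkflowTemplate to its parent DeviceConfig.
# Only the metadata is shown; the template spec is omitted from this fragment.
apiVersion: argoproj.io/v1alpha1   # assumed API version
kind: WorkflowTemplate
metadata:
  name: example-default-template   # hypothetical name
  namespace: example-namespace     # hypothetical; a namespaced owner must share this namespace
  ownerReferences:
    - apiVersion: amd.com/v1alpha1 # assumed API version
      kind: DeviceConfig
      name: example-deviceconfig   # hypothetical name
      uid: 00000000-0000-0000-0000-000000000000  # placeholder; must match the live owner's UID
      controller: true             # assumed; marks the owner as the managing controller
      blockOwnerDeletion: true
```

With an owner reference like this in place, the Kubernetes garbage collector removes the dependent object once its owner is deleted, which is the cleanup behavior GPUOP-625 describes.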

Cherry-pick triggered by: ACP-Automation

* update controller-manager serviceaccount rbac

* set deviceconfig as owner for the default workflow template

* update correct config map structure in docs

(cherry picked from commit d38ee154348a309e9eb455287c73071a00383827)
@ci-penbot-01
Contributor Author

AI-Assisted Cherry-Pick

Source PR: pensando/gpu-operator#1303
Target Branch: main

The cherry-pick encountered merge conflicts, which were resolved automatically with AI assistance.

Files with conflicts (resolved by AI):

  • bundle/manifests/amd-gpu-operator.clusterserviceversion.yaml:38-44
  • helm-charts-k8s/Chart.lock:12-16

Original conflict in bundle/manifests/amd-gpu-operator.clusterserviceversion.yaml:
<<<<<<< HEAD
    containerImage: docker.io/rocm/amd-gpu-operator:dev
    createdAt: "2026-04-02T12:26:30Z"
=======
    containerImage: registry.test.pensando.io:5000/amd-gpu-operator:dev
    createdAt: "2026-04-07T12:28:11Z"
>>>>>>> d38ee154... Fixes for ANR jiras (#1303)

Original conflict in helm-charts-k8s/Chart.lock:
<<<<<<< HEAD
generated: "2026-04-02T12:26:25.920315689Z"
=======
generated: "2026-04-07T12:28:07.188885215Z"
>>>>>>> d38ee154... Fixes for ANR jiras (#1303)

Cherry-pick triggered by: ACP-Automation

@biluriuday changed the title from "[CP 1303] Fixes for ANR jiras" to "auto node remediation fixes" on Apr 8, 2026

@yansun1996 left a comment
Member

lgtm
