
Cloud Controller Manager fails to discover new VMs after upgrading to Nutanix Cloud Provider v0.6.0 #229

@g0k8s

/kind bug

What steps did you take and what happened:

After upgrading to Nutanix Cloud Provider v0.6.0, the cloud-controller-manager no longer discovers newly added VMs. This results in nodes either being continuously recreated or remaining tainted and unschedulable.

Issue 1: VM repeatedly recreated / VM not found

After a VM is provisioned and joins the Kubernetes cluster, the node is repeatedly recreated. The following logs appear in the nutanix-cloud-controller-manager pod:

I1218 15:26:11.890746       1 node_controller.go:429] Initializing node devk8s01-md-worker-default-jz9kk-khkw5 with cloud provider
2025-12-18 15:26:11.891Z INFO - GET https://10.10.201.20:9440/api/vmm/v4.1/ahv/config/vms/3fbf7e7f-509d-4612-b58a-c28eb2f5fc39
2025-12-18 15:26:11.906Z INFO - HTTP/1.1 404 NOT FOUND
I1218 15:26:11.906873       1 node_controller.go:233] error syncing 'devk8s01-md-worker-default-jz9kk-khkw5': failed to get instance metadata for node devk8s01-md-worker-default-jz9kk-khkw5: failed to get VM: API call failed: {"data":{"error":[{"message":"Failed to perform the operation on the VM with UUID '3fbf7e7f-509d-4612-b58a-c28eb2f5fc39', because it is not found.","severity":"ERROR","code":"VMM-30100","locale":"en-US","errorGroup":"VM_NOT_FOUND","argumentsMap":{"vm_uuid":"3fbf7e7f-509d-4612-b58a-c28eb2f5fc39"},"$objectType":"vmm.v4.error.AppMessage"}],"$errorItemDiscriminator":"List<vmm.v4.error.AppMessage>","$objectType":"vmm.v4.error.ErrorResponse"},"$dataItemDiscriminator":"vmm.v4.error.ErrorResponse"}, requeuing
E1218 15:26:11.906912       1 node_controller.go:244] "Unhandled Error" err="error syncing 'devk8s01-md-worker-default-jz9kk-khkw5': failed to get instance metadata for node devk8s01-md-worker-default-jz9kk-khkw5: failed to get VM: API call failed: {...}, requeuing" logger="UnhandledError"

The VM exists and is visible in Prism, but the CCM fails to retrieve it via the API and continuously retries.
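For anyone triaging: below is a minimal diagnostic sketch in Go (plain net/http, no Nutanix SDK) that replays the exact GET shown in the log above, to check whether the 404 comes from Prism Central itself or from how v0.6.0 constructs the call. The endpoint and UUID are copied from the log; the PC_USER/PC_PASS environment variables and the TLS verification skip are purely illustrative.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Endpoint and VM UUID copied verbatim from the CCM log above.
	url := "https://10.10.201.20:9440/api/vmm/v4.1/ahv/config/vms/3fbf7e7f-509d-4612-b58a-c28eb2f5fc39"

	// Skip TLS verification only for this one-off diagnostic call.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	// PC_USER / PC_PASS are placeholder env vars, not CCM configuration.
	req.SetBasicAuth(os.Getenv("PC_USER"), os.Getenv("PC_PASS"))

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```

If this returns 200 OK while the CCM keeps seeing 404, that would suggest the regression is in how v0.6.0 builds or routes the request rather than in Prism Central itself.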

Issue 2: VM discovered but node remains tainted

On another cluster, the VM is successfully found, but the node never becomes initialized and remains tainted:

2025-12-23 15:09:46.712Z INFO - GET https://10.10.201.20:9440/api/vmm/v4.1/ahv/config/vms/74d90883-9a67-461c-5a75-f8070d9a6470
2025-12-23 15:09:46.736Z INFO - HTTP/1.1 200 OK
I1223 15:09:46.738543       1 node_controller.go:233] error syncing 'pock8s01-md-worker-default-kxh9x-wl2wt': failed to get instance metadata for node pock8s01-md-worker-default-kxh9x-wl2wt: unable to determine network interfaces from VM with UUID 74d90883-9a67-461c-5a75-f8070d9a6470, requeuing

The node remains tainted indefinitely:

Taints:
  node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
  node.cluster.x-k8s.io/uninitialized:NoSchedule
Unschedulable: false
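
For reference, here is a small client-go sketch (the kubeconfig path is an illustrative assumption) that lists all nodes still carrying the cloud provider's uninitialized taint, i.e. the state both clusters end up stuck in:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig path is illustrative; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		for _, taint := range node.Spec.Taints {
			// The CCM removes this taint once node initialization succeeds,
			// so any node listed here is still waiting on the cloud provider.
			if taint.Key == "node.cloudprovider.kubernetes.io/uninitialized" {
				fmt.Printf("%s is still uninitialized\n", node.Name)
			}
		}
	}
}
```

On a healthy cluster this prints nothing shortly after a node joins; with v0.6.0 the affected nodes stay in the list indefinitely.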

What did you expect to happen:

New nodes join the cluster, the cloud-controller-manager initializes them, and the uninitialized NoSchedule taints are removed.

Anything else you would like to add:

VMs are created from a machine template hosted on Prism Central.
Workaround: downgrading to a previous Nutanix Cloud Provider version restores normal behavior.

Environment:

  • Prism Central version: 7.3
  • cloud-provider-nutanix version: v0.6.0
  • Kubernetes version (use kubectl version): v1.33.5
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
