Deploying only the worker step fails because the cluster no longer exists #246

@SalDaniele

Description of bug:

CDA is in the middle of a worker deployment, and the cluster is currently up:

(ocp-venv) [root@silpixa00400458 cluster-deployment-automation]# aicli -U 0.0.0.0:8080 get cluster
+------------------+--------------------------------------+--------------+------------+
|     Cluster      |                  Id                  |    Status    | Dns Domain |
+------------------+--------------------------------------+--------------+------------+
| baremetalcluster | 6cb16415-aa77-4801-b98e-6c4e0e0eefee | adding-hosts | redhat.com |
+------------------+--------------------------------------+--------------+------------+
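
For reference, the cluster state in the listing above can be checked from a script before rerunning a single step. The following is a minimal sketch, not part of CDA: it assumes the ASCII table layout that `aicli get cluster` printed here, and the helper names `parse_aicli_table` and `cluster_status` are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def parse_aicli_table(text: str) -> list[dict[str, str]]:
    """Turn the ASCII table printed by `aicli ... get cluster` into row dicts.

    Assumption: data and header rows start with '|' and border rows start
    with '+', as in the output pasted in this issue.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, *body = rows  # first '|' row is the header
    return [dict(zip(header, row)) for row in body]


def cluster_status(text: str, name: str) -> Optional[str]:
    """Return the Status column for the named cluster, or None if absent."""
    for row in parse_aicli_table(text):
        if row.get("Cluster") == name:
            return row.get("Status")
    return None
```

The text passed to `cluster_status` would be the captured standard output of the `aicli -U 0.0.0.0:8080 get cluster` command; a return value of `None` means the cluster is not listed.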

The worker step fails or is stopped early and is then restarted, i.e.

python cda.py cluster.yaml deploy -s workers

The rerun now fails because the original cluster has been destroyed.

This was not the case in the past: we were able to run just the worker step without redeploying the control plane.

(ocp-venv) [root@silpixa00400458 cluster-deployment-automation]# python cda.py cluster.yaml deploy -s workers
/data/cluster-deployment-automation/ocp-venv/lib64/python3.11/site-packages/paramiko/pkey.py:82: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
  "cipher": algorithms.TripleDES,
/data/cluster-deployment-automation/ocp-venv/lib64/python3.11/site-packages/paramiko/transport.py:253: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
  "class": algorithms.TripleDES,
2024-09-06 19:35:59 INFO [th:140463459731264] (clustersConfig.py:253): range = ('192.168.122.1', '192.168.122.168')
2024-09-06 19:35:59 INFO [th:140463459731264] (libvirt.py:22): Configuring Libvirt modules
2024-09-06 19:36:13 INFO [th:140463459731264] (assistedInstallerService.py:338): assisted-installer already running with a different configmap
2024-09-06 19:36:13 INFO [th:140463459731264] (assistedInstallerService.py:394): Tearing down assisted-installer.
2024-09-06 19:36:15 INFO [th:140463459731264] (assistedInstallerService.py:346): Starting assisted-installer.
2024-09-06 19:37:05 INFO [th:140463459731264] (assistedInstallerService.py:377): Waiting for API to be ready at http://192.168.122.1:8090/api/assisted-install/v2/clusters...
2024-09-06 19:37:15 INFO [th:140463459731264] (clusterHost.py:75): Using enp65s0f0 as network API port
2024-09-06 19:37:15 INFO [th:140463459731264] (clusterDeployer.py:209): Skipping pre configuration.
2024-09-06 19:37:15 INFO [th:140463459731264] (clusterDeployer.py:164): Tearing down (some) workers on baremetalcluster
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:100): Cleaning up /var/lib/libvirt/dnsmasq/virbr0.status
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:101): removing hosts with mac in [] or name in []
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:69): Kept entry {'ip-address': '192.168.122.237', 'mac-address': 'a4:bf:01:51:37:da', 'hostname': 'silpixa00400458-oob', 'client-id': '01:a4:bf:01:51:37:da', 'expiry-time': 1725649817}
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:69): Kept entry {'ip-address': '192.168.122.100', 'mac-address': 'c4:cb:e1:a1:57:d3', 'hostname': 'silpixa00401707', 'client-id': '01:c4:cb:e1:a1:57:d3', 'expiry-time': 1725649892}
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:69): Kept entry {'ip-address': '192.168.122.3', 'mac-address': '52:54:a6:82:45:aa', 'hostname': 'baremetalcluster-master-2', 'client-id': '01:52:54:a6:82:45:aa', 'expiry-time': 1725649974}
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:69): Kept entry {'ip-address': '192.168.122.167', 'mac-address': 'c4:cb:e1:a1:50:17', 'hostname': 'silpixa00401709', 'client-id': '01:c4:cb:e1:a1:50:17', 'expiry-time': 1725649980}
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:69): Kept entry {'ip-address': '192.168.122.2', 'mac-address': '52:54:00:ea:c7:e1', 'hostname': 'baremetalcluster-master-1', 'client-id': '01:52:54:00:ea:c7:e1', 'expiry-time': 1725651009}
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:69): Kept entry {'ip-address': '192.168.122.4', 'mac-address': '52:54:16:40:81:9c', 'hostname': 'baremetalcluster-master-3', 'client-id': '01:52:54:16:40:81:9c', 'expiry-time': 1725651174}
2024-09-06 19:37:15 INFO [th:140463459731264] (virtualBridge.py:104): Delete "default" Libvirt network: (returncode: 0, error: )
2024-09-06 19:37:16 INFO [th:140463459731264] (virtualBridge.py:109): Start "default" Libvirt network: (returncode: 0, error: )
2024-09-06 19:37:18 INFO [th:140463459731264] (clusterHost.py:135): Block all DHCP replies on enp65s0f0 except the ones coming from the DHCP bridge
2024-09-06 19:37:18 INFO [th:140463459731264] (virtualBridge.py:238): Will try 3 to get the virbr0 ethernet address on localhost
2024-09-06 19:37:18 INFO [th:140463459731264] (clusterHost.py:141): Link enp65s0f0 to virbr0
2024-09-06 19:37:18 INFO [th:140463459731264] (clusterHost.py:145): No master set for interface enp65s0f0, setting it to virbr0
2024-09-06 19:37:18 INFO [th:140463459731264] (clusterHost.py:150): Setting interface enp65s0f0 as unmanaged in NetworkManager
2024-09-06 19:37:18 INFO [th:140463459731264] (clusterDeployer.py:190): Deleting worker worker-2
2024-09-06 19:37:18 INFO [th:140463459731264] (k8sClient.py:48): Deleting node worker-2
No Matching Host with name worker-2 found
2024-09-06 19:37:19 INFO [th:140463459731264] (clusterDeployer.py:219): Skipping master creation.
2024-09-06 19:37:19 INFO [th:140463459731264] (clusterDeployer.py:391): Setting up workers
Cluster baremetalcluster not found
(ocp-venv) [root@silpixa00400458 cluster-deployment-automation]# aicli -U 0.0.0.0:8080 get cluster
+---------+----+--------+------------+
| Cluster | Id | Status | Dns Domain |
+---------+----+--------+------------+
+---------+----+--------+------------+
