Description
We recently had a problem in the CI where an update to the OS we use for the Nodes made one of our preKubeadmCommands fail. This in turn caused some network issues, which were what we detected first. It took quite some time to figure out the root cause because no one suspected an error in the cloud-init commands. The Machines were provisioned, the cluster worked as expected in many ways, and all Nodes were healthy.
My suggestion is that we add a step to the integration tests (or even to the controller, if possible) to detect errors in cloud-init and report them in a more obvious way. In the CI we should simply be able to run cloud-init status on each Node and error out if it reports status: error.
To be clear, this check should be done on the workload cluster Nodes.
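As a rough illustration of the check suggested above, here is a minimal Python sketch. It assumes the test harness already has some way to run a command on a workload cluster Node and get its stdout back (for example over SSH); that is represented by the hypothetical `run_on_node` callable. The names `check_node`, `parse_cloud_init_status` and `CloudInitError` are made up for this example and are not part of the existing tests or controller.

```python
import re
from typing import Callable


class CloudInitError(RuntimeError):
    """Raised when a Node reports a failed cloud-init run."""


def parse_cloud_init_status(output: str) -> str:
    """Return the value of the `status:` line from `cloud-init status` output."""
    # Anchor at line start so only a line beginning with "status:" matches.
    match = re.search(r"^status:\s*(\S+)", output, flags=re.MULTILINE)
    if match is None:
        raise ValueError(f"no 'status:' line found in: {output!r}")
    return match.group(1)


def check_node(node: str, run_on_node: Callable[[str, str], str]) -> None:
    """Fail loudly if cloud-init on the given Node ended in an error.

    `run_on_node(node, command)` is a hypothetical helper that runs the
    command on the Node and returns its stdout as a string.
    """
    output = run_on_node(node, "cloud-init status --long")
    status = parse_cloud_init_status(output)
    if status == "error":
        # Include the full output so the failing module is visible in CI logs.
        raise CloudInitError(f"cloud-init failed on {node}:\n{output}")
```

In the integration tests this could be called once per workload cluster Node after the cluster is reported as ready, so that a failed preKubeadmCommand surfaces as a test failure instead of as a downstream symptom.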