Describe the bug
I recently used kubergrunt eks deploy to roll out a worker update, and it suddenly failed partway through (see the logs below):
```
INFO[2023-02-10T12:57:02+01:00] Successfully launched new nodes with new launch config on ASG app-workers-eks-data-asg-20210305120833070500000007 name=kubergrunt
INFO[2023-02-10T12:57:02+01:00] Waiting for 3 nodes in Kubernetes to reach ready state name=kubergrunt
INFO[2023-02-10T12:57:02+01:00] Loading Kubernetes Client name=kubergrunt
INFO[2023-02-10T12:57:02+01:00] Using config on disk and context. name=kubergrunt
INFO[2023-02-10T12:57:02+01:00] Checking if nodes ready name=kubergrunt
ERRO[2023-02-10T12:57:04+01:00] Timed out waiting for the instances to reach ready state in Kubernetes. name=kubergrunt
ERRO[2023-02-10T12:57:04+01:00] Undo by terminating all the new instances and trying again name=kubergrunt
ERRO[2023-02-10T12:57:04+01:00] Error while waiting for new nodes to be ready. name=kubergrunt
ERRO[2023-02-10T12:57:04+01:00] Either resume with the recovery file or terminate the new instances. name=kubergrunt
```
The reason for this was that my kubectl was not switched to the correct context, so neither kubectl nor kubergrunt could authenticate to the correct cluster. But, as you can see, there is no way to tell that from the log output.
To Reproduce
Run kubergrunt eks deploy against an EKS cluster while the local kubectl context points at a different cluster (or cannot authenticate at all). The roll-out launches the new instances and then fails while waiting for them to become ready in Kubernetes, with only the generic timeout errors shown above.
Expected behavior
Display a meaningful error if kubergrunt cannot authenticate to the Kubernetes cluster, instead of the generic readiness timeout shown above.
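To illustrate what I mean, here is a minimal sketch of a pre-flight check, assuming client-go is used to reach the cluster. This is not kubergrunt's actual code; verifyClusterAccess and the error wording are made up for illustration:

```go
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// verifyClusterAccess builds a client from the default kubeconfig/context
// and makes one cheap API call, so an authentication or context problem
// is reported directly instead of surfacing later as a readiness timeout.
func verifyClusterAccess() error {
	loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()
	overrides := &clientcmd.ConfigOverrides{}
	restConfig, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, overrides).ClientConfig()
	if err != nil {
		return fmt.Errorf("could not load kubeconfig: %w", err)
	}

	clientset, err := kubernetes.NewForConfig(restConfig)
	if err != nil {
		return fmt.Errorf("could not build Kubernetes client: %w", err)
	}

	// A single discovery call is enough to confirm the credentials and
	// context actually reach the intended cluster.
	if _, err := clientset.Discovery().ServerVersion(); err != nil {
		return fmt.Errorf("cannot authenticate to the Kubernetes cluster (check your kubectl context): %w", err)
	}
	return nil
}

func main() {
	if err := verifyClusterAccess(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Kubernetes cluster is reachable with the current context")
}
```

One cheap call like this before waiting for node readiness would turn the failure above into an explicit authentication error that points at the kubeconfig/context, rather than a timeout that looks like an instance problem.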