Unexpected Kubernetes Rollout Restart Deployment Behavior #12410
I floated the information below through a few other communities a few months back with no feedback. More troubleshooting happened this week, and one of the steps taken was to uninstall ArgoCD from a cluster and then attempt a rollout restart of all deployments. That resulted in a 100% successful restart. So I'm now pasting my earlier troubleshooting here, with the added data point that the problem seems to go away once ArgoCD is removed. I'm after evidence of what we may have misconfigured, or of an app bug in play.

It's also worth pointing out that two new deployments seem to have joined the misbehaving list of four below. That makes me think we're somehow influencing this ourselves, but I haven't been able to figure out where or how when comparing working apps to misbehaving apps.

Now on to my historical notes. I have 4 deployments, out of ~25 developed internally, that are not restarting properly. They all live in the same namespace in EKS and are deployed by Helm via ArgoCD. These apps are also deployed to 15+ AWS accounts, and the behavior is consistent in each cluster. When attempting a `kubectl rollout restart`, all but the four deployments restart as you'd expect: we have default settings in play, so 25% of each deployment's pods go down at a time and new ones spin up. The other four never actually restart; the command completes, but their pods are left untouched.
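For reference, the defaults I'm describing correspond to this rolling-update strategy in the Deployment spec (a minimal sketch; the name, labels, and image are placeholders, not our actual app):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # placeholder
spec:
  replicas: 8
  selector:
    matchLabels:
      app: example-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%      # default: at most a quarter of pods down at once
      maxSurge: 25%            # default: up to a quarter of extra pods during the rollout
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example/app:1.0   # placeholder
```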
I’ve investigated leads suggesting that too many ReplicaSets can confuse rollouts, as has been reported in the past, but we use the default revisionHistoryLimit of 10 and none of the deployments are over that limit. Deleting all of the old ReplicaSets so each deployment started back at one did not change the behavior.
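For completeness, this is roughly how I checked the ReplicaSet counts and the history limit (a sketch; the namespace and deployment name are hypothetical):

```bash
# List ReplicaSets alongside the Deployment that owns each one
kubectl get rs -n my-namespace \
  -o custom-columns=NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].name

# Confirm the history limit on a given Deployment (apps/v1 defaults this to 10)
kubectl get deployment example-app -n my-namespace \
  -o jsonpath='{.spec.revisionHistoryLimit}'
```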
Replies: 1 comment 2 replies
I recently stumbled across https://stackoverflow.com/questions/59050709/how-to-rollout-restart-deployment-through-the-api, which prompted me to look at our Helm configuration. It turns out the deployments having the issue all referenced a dependent chart for their annotations, and that dependent chart had no annotations defined, which resulted in the rendered manifest containing a value of `null`. In Helm, `null` means remove. So our configuration was effectively set to remove annotations, which prevented `kubectl.kubernetes.io/restartedAt` from being applied and resulted in our weird behavior.
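To illustrate (a minimal sketch; this is not our actual chart, just the shape of the problem):

```yaml
# What the rendered pod template ended up containing:
spec:
  template:
    metadata:
      annotations: null   # null = "remove this field" under apply/sync semantics
```

This matters because, as the linked question explains, `kubectl rollout restart` works by stamping a timestamp annotation onto the pod template; the changed template is what triggers the rolling update. It is roughly equivalent to the following (deployment name and namespace hypothetical):

```bash
# Approximate equivalent of "kubectl rollout restart deployment/example-app"
kubectl patch deployment example-app -n my-namespace --type merge -p \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"kubectl.kubernetes.io/restartedAt\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}}}}}"
```

With our desired state rendering `annotations: null`, that annotation couldn't stick, so the pod template never changed and the pods never restarted.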