RollingUpgrade not consistent #309

@preflightsiren

Is this a BUG REPORT or FEATURE REQUEST?: Bug

What happened: A RollingUpgrade created after instance-manager modified the launch template did not detect any nodes that needed to be recycled. To work around this, we delete the RollingUpgrade and restart instance-manager, which recreates the RollingUpgrade.

What you expected to happen: Node is detected as being out-of-sync and replaced.

How to reproduce it (as minimally and precisely as possible): I can't reproduce this consistently, but it happens often during our monthly patching cycle.

Anything else we need to know?:

Environment:

  • rolling-upgrade-controller version: 1.0.2
  • Kubernetes version: 1.19.7 | 1.20.10
$ kubectl version -o yaml

Other debugging information (if applicable):

  • RollingUpgrade status:
$ kubectl describe rollingupgrade <rollingupgrade-name>

The RollingUpgrade has already been replaced, but I did see that its status was "completed" before it was deleted.

  • controller logs:
$ kubectl logs <rolling-upgrade-controller pod>
2021-10-12T00:16:48.561Z	INFO	controllers.RollingUpgrade	admitted new rolling upgrade	{"scalingGroup": "uw2d-akp-b1-instance-manager-default-sh-m5-2xlarge-us-west-2b", "update strategy": {"type":"randomUpdate","mode":"eager","maxUnavailable":1,"drainTimeout":2147483647}, "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
--
2021-10-12T00:16:55.636Z	INFO	controllers.RollingUpgrade	scaling group details	{"scalingGroup": "uw2d-akp-b1-instance-manager-default-sh-m5-2xlarge-us-west-2b", "desiredInstances": 1, "launchConfig": "", "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:16:55.736Z	INFO	controllers.RollingUpgrade	checking if rolling upgrade is completed	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:16:55.736Z	INFO	controllers.RollingUpgrade	no drift in scaling group	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:16:55.836Z	INFO	controllers.RollingUpgrade	rolling upgrade ended	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6", "status": "completed"}
2021-10-12T00:17:25.837Z	INFO	controllers.RollingUpgrade	rolling upgrade ended	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6", "status": "completed"}
2021-10-12T00:17:47.935Z	INFO	controllers.RollingUpgrade	rolling upgrade ended	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6", "status": "completed"}

***** RollingUpgrade is deleted and instance-manager is restarted *****

2021-10-12T00:20:59.590Z	INFO	controllers.RollingUpgrade	rolling upgrade resource not found, deleted object from admission map	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:21:59.133Z	INFO	controllers.RollingUpgrade	admitted new rolling upgrade	{"scalingGroup": "uw2d-akp-b1-instance-manager-default-sh-m5-2xlarge-us-west-2b", "update strategy": {"type":"randomUpdate","mode":"eager","maxUnavailable":1,"drainTimeout":2147483647}, "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:01.833Z	INFO	controllers.RollingUpgrade	scaling group details	{"scalingGroup": "uw2d-akp-b1-instance-manager-default-sh-m5-2xlarge-us-west-2b", "desiredInstances": 1, "launchConfig": "", "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:01.833Z	INFO	controllers.RollingUpgrade	checking if rolling upgrade is completed	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:01.833Z	INFO	controllers.RollingUpgrade	drift detected in scaling group	{"driftedInstancesCount/DesiredInstancesCount": "(1/1)", "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:01.833Z	INFO	controllers.RollingUpgrade	selecting batch for rotation	{"batch size": 1, "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:01.833Z	INFO	controllers.RollingUpgrade	rotating batch	{"instances": ["i-0017fd066bbfd0e32"], "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:01.833Z	INFO	controllers.RollingUpgrade	setting instances to in-progress	{"batch": ["i-0017fd066bbfd0e32"], "instances(InService)": ["i-0017fd066bbfd0e32"], "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:02.032Z	INFO	controllers.RollingUpgrade	setting instances to stand-by	{"batch": ["i-0017fd066bbfd0e32"], "instances(InService)": ["i-0017fd066bbfd0e32"], "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:02.433Z	INFO	controllers.RollingUpgrade	operating on existing rolling upgrade	{"scalingGroup": "uw2d-akp-b1-instance-manager-default-sh-m5-2xlarge-us-west-2b", "update strategy": {"type":"randomUpdate","mode":"eager","maxUnavailable":1,"drainTimeout":2147483647}, "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:05.033Z	INFO	controllers.RollingUpgrade	scaling group details	{"scalingGroup": "uw2d-akp-b1-instance-manager-default-sh-m5-2xlarge-us-west-2b", "desiredInstances": 1, "launchConfig": "", "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:05.033Z	INFO	controllers.RollingUpgrade	checking if rolling upgrade is completed	{"name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
2021-10-12T00:22:05.033Z	INFO	controllers.RollingUpgrade	drift detected in scaling group	{"driftedInstancesCount/DesiredInstancesCount": "(1/1)", "name": "instance-manager/default-sh-m5-2xlarge-us-west-2b-20210908055655-6"}
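
In case it helps anyone triaging this, the logs above can be reduced to just the drift decisions per RollingUpgrade, which makes the flip from "no drift" to "drift detected" for the same object easier to spot. This is a rough sketch, not part of the controller: the function name `drift_decisions` is made up for illustration, and it assumes the log fields are tab-separated (timestamp, level, logger, message, JSON payload) as in the output pasted here.

```python
import json
from collections import defaultdict

# Message text (whitespace-normalized) -> short label.
# These two messages are taken from the controller logs in this issue.
DRIFT_MESSAGES = {
    "no drift in scaling group": "no-drift",
    "drift detected in scaling group": "drift",
}


def drift_decisions(lines):
    """Return {rolling upgrade name: [(timestamp, decision), ...]}.

    Assumes each log line is tab-separated into five fields:
    timestamp, level, logger, message, JSON payload.
    Lines that do not match that shape are skipped.
    """
    decisions = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t", 4)
        if len(fields) < 5:
            continue
        timestamp, _level, _logger, message, payload = fields
        # Collapse runs of spaces so copy-pasted logs still match.
        decision = DRIFT_MESSAGES.get(" ".join(message.split()))
        if decision is None:
            continue
        try:
            name = json.loads(payload).get("name", "<unknown>")
        except json.JSONDecodeError:
            continue
        decisions[name].append((timestamp, decision))
    return dict(decisions)
```

To use it, save the output of `kubectl logs <rolling-upgrade-controller pod>` to a file and pass the open file object to `drift_decisions`. A name whose list contains both "no-drift" and "drift" entries with no launch template change in between would match the behaviour described in this report.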
