strategy.drainTimeout not working as intended? #346

@jess-belliveau

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
I set strategy.drainTimeout to 1000 seconds, but the node is terminated immediately after the node drain is issued.

What you expected to happen:
I expect upgrade-manager to wait 1000 seconds after the drain is issued before terminating the instance.

How to reproduce it (as minimally and precisely as possible):

➜ cat ru-drain.yml
apiVersion: upgrademgr.keikoproj.io/v1alpha1
kind: RollingUpgrade
metadata:
  annotations:
    app.kubernetes.io/managed-by: instance-manager
    instancemgr.keikoproj.io/upgrade-scope: <snip>-instance-manager-platform-apm-us-west-2a
  name: platform-apm-us-west-2a-20220715002858-19
  namespace: instance-manager
spec:
  asgName: <snip>-instance-manager-platform-apm-us-west-2a
  forceRefresh: true
  nodeIntervalSeconds: 10
  postDrain:
    waitSeconds: 300
  postDrainDelaySeconds: 45
  strategy:
    drainTimeout: 1000      # <- this is the field I'm setting
    maxUnavailable: 1
    mode: eager

Anything else we need to know?:
Am I interpreting the spec correctly?
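
The question comes down to two possible readings of a timeout field: an upper bound on how long to wait for the drain, or a fixed pause after it. The sketch below contrasts them in Python. It is illustrative only and is not upgrade-manager code; `wait_with_timeout`, `wait_fixed` and the polling interval are hypothetical names and values. Nothing in this report confirms which reading strategy.drainTimeout follows, though the reported drain step duration of about 180 ms is consistent with the upper-bound reading.

```python
import time


def wait_with_timeout(is_done, timeout_s, poll_s=0.01):
    """Upper-bound reading: return as soon as is_done() is true,
    and give up once timeout_s seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_s)
    return is_done()


def wait_fixed(delay_s):
    """Fixed-delay reading: always block for the full delay_s seconds."""
    time.sleep(delay_s)


# A drain that finishes right away returns immediately under the
# upper-bound reading, even with a 1000 second timeout.
start = time.monotonic()
finished = wait_with_timeout(lambda: True, timeout_s=1000)
elapsed = time.monotonic() - start
print(f"finished={finished}, waited {elapsed:.3f}s")
```

Under the fixed-delay reading, `wait_fixed(1000)` would block for the full 1000 seconds regardless of how quickly the pods were evicted, which is the behaviour described under "What you expected to happen".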

Environment:

  • rolling-upgrade-controller version: v1.0.6
  • Kubernetes version :
$ kubectl version -o yaml
serverVersion:
  buildDate: "2022-10-24T20:32:54Z"
  compiler: gc
  gitCommit: b07006b2e59857b13fe5057a956e86225f0e82b7
  gitTreeState: clean
  gitVersion: v1.21.14-eks-fb459a0
  goVersion: go1.16.15
  major: "1"
  minor: 21+
  platform: linux/amd64

Other debugging information (if applicable):

  • RollingUpgrade status:
➜ kd rollingupgrades platform-apm-us-west-2a-20220715002858-20 -n instance-manager
Name:         platform-apm-us-west-2a-20220715002858-20
Namespace:    instance-manager
Labels:       <none>
Annotations:  app.kubernetes.io/managed-by: instance-manager
              instancemgr.keikoproj.io/upgrade-scope: snip-instance-manager-platform-apm-us-west-2a
API Version:  upgrademgr.keikoproj.io/v1alpha1
Kind:         RollingUpgrade
Metadata:
  Creation Timestamp:  2022-11-18T05:41:11Z
  Generation:          1
  Managed Fields:
    API Version:  upgrademgr.keikoproj.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:app.kubernetes.io/managed-by:
          f:instancemgr.keikoproj.io/upgrade-scope:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:asgName:
        f:forceRefresh:
        f:nodeIntervalSeconds:
        f:postDrain:
          .:
          f:waitSeconds:
        f:postDrainDelaySeconds:
        f:strategy:
          .:
          f:drainTimeout:
          f:maxUnavailable:
          f:mode:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-11-18T05:41:11Z
    API Version:  upgrademgr.keikoproj.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:completePercentage:
        f:currentStatus:
        f:endTime:
        f:lastDrainTime:
        f:lastTerminationTime:
        f:nodesProcessed:
        f:startTime:
        f:statistics:
        f:totalNodes:
        f:totalProcessingTime:
    Manager:         manager
    Operation:       Update
    Time:            2022-11-18T05:43:18Z
  Resource Version:  228511895
  UID:               2eebdb9d-f8d8-4688-8985-7d713d9245f2
Spec:
  Asg Name:               snip-instance-manager-platform-apm-us-west-2a
  Force Refresh:          true
  Node Interval Seconds:  10
  Post Drain:
    Wait Seconds:            300
  Post Drain Delay Seconds:  45
  Strategy:
    Drain Timeout:    1000
    Max Unavailable:  1
    Mode:             eager
Status:
  Complete Percentage:    100%
  Current Status:         completed
  End Time:               2022-11-18T05:43:18Z
  Last Drain Time:        2022-11-18T05:43:16Z
  Last Termination Time:  2022-11-18T05:43:16Z
  Nodes Processed:        1
  Start Time:             2022-11-18T05:41:11Z
  Statistics:
    Duration Count:       1
    Duration Sum:         2.545233409s
    Step Name:            kickoff
    Duration Count:       1
    Duration Sum:         2m1.447352312s
    Step Name:            desired_node_ready
    Duration Count:       1
    Duration Sum:         41.598µs
    Step Name:            predrain_script
    Duration Count:       1
    Duration Sum:         180.544516ms
    Step Name:            drain
    Duration Count:       1
    Duration Sum:         6.235µs
    Step Name:            postdrain_script
    Duration Count:       1
    Duration Sum:         54.047µs
    Step Name:            post_wait
    Duration Count:       1
    Duration Sum:         225.887155ms
    Step Name:            terminate
    Duration Count:       1
    Duration Sum:         4.774µs
    Step Name:            post_terminate
    Duration Count:       1
    Duration Sum:         9.999999708s
    Step Name:            terminated
    Duration Count:       1
    Duration Sum:         2m13.853890401s
    Step Name:            total
  Total Nodes:            1
  Total Processing Time:  2m7s
Events:                   <none>
  • controller logs:
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:14.771Z	INFO	controllers.RollingUpgrade	***Reconciling***
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:14.771Z	INFO	controllers.RollingUpgrade	operating on existing rolling upgrade	{"scalingGroup": "snip-instance-manager-platform-apm-us-west-2a", "update strategy": {"type":"randomUpdate","mode":"eager","maxUnavailable":1,"drainTimeout":1000}, "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	scaling group details	{"scalingGroup": "snip-instance-manager-platform-apm-us-west-2a", "desiredInstances": 1, "launchConfig": "", "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	checking if rolling upgrade is completed	{"name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	rolling upgrade configured for forced refresh	{"instance": "i-0bbb077b2dab36ac5", "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	drift detected in scaling group	{"driftedInstancesCount/DesiredInstancesCount": "(1/1)", "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	selecting batch for rotation	{"batch size": 1, "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	found in-progress instances	{"instances": ["i-0bbb077b2dab36ac5"]}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	rolling upgrade configured for forced refresh	{"instance": "i-0bbb077b2dab36ac5", "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	rotating batch	{"instances": ["i-0bbb077b2dab36ac5"], "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	no InService instances in the batch	{"batch": ["i-0bbb077b2dab36ac5"], "instances(InService)": [], "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	waiting for desired nodes	{"name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	desired nodes are ready	{"name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.447Z	INFO	controllers.RollingUpgrade	draining the node	{"instance": "i-0bbb077b2dab36ac5", "node name": "ip-172-29-72-153.us-west-2.compute.internal", "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager WARNING: ignoring DaemonSet-managed Pods: kube-system/cilium-9chsq, kube-system/clamav-akp-ck4p6, kube-system/ebs-csi-node-28fxf, kube-system/kiam-agent-w5lxw, kube-system/kube-proxy-s5d2n, kube-system/node-local-dns-t4z8q, monitoring/node-exporter-4qsg9, ossec/ossec-akp-x55sv
upgrade-manager-controller-manager-859c65b5db-gzfns manager evicting pod nginx-ing-utility/nginx-ingress-utility-controller-65d6447d75-rpb6t
upgrade-manager-controller-manager-859c65b5db-gzfns manager evicting pod nginx-ing-grpc/nginx-ingress-grpc-controller-7dd4c7b9f-q2mc5
upgrade-manager-controller-manager-859c65b5db-gzfns manager evicting pod nginx-ing-public/nginx-ingress-public-controller-c664fcc7c-82p9x
upgrade-manager-controller-manager-859c65b5db-gzfns manager evicting pod nginx-ing-default/nginx-ingress-default-controller-69d64b6b5-n9n5d
upgrade-manager-controller-manager-859c65b5db-gzfns manager evicting pod nginx-ing-bff/nginx-ingress-bff-controller-588cc868fc-d4vt2
### Shouldn't the 1000-second pause happen here?
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.627Z	INFO	controllers.RollingUpgrade	instances drained successfully, terminating	{"name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.628Z	INFO	controllers.RollingUpgrade	terminating instance	{"instance": "i-0bbb077b2dab36ac5", "name": "instance-manager/platform-apm-us-west-2a-20220715002858-20"}
upgrade-manager-controller-manager-859c65b5db-gzfns manager 2022-11-18T05:43:16.867Z	INFO	controllers.RollingUpgrade	***Reconciling***
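
To put a number on the gap in the logs above, the following Python sketch computes the time between the "draining the node" line and the "instances drained successfully, terminating" line. The two timestamps are copied from those log lines; the helper names `parse_ts` and `gap_seconds` are hypothetical.

```python
from datetime import datetime

# Timestamps copied from the controller log lines in this report.
DRAIN_STARTED = "2022-11-18T05:43:16.447Z"   # "draining the node"
DRAIN_FINISHED = "2022-11-18T05:43:16.627Z"  # "instances drained successfully, terminating"


def parse_ts(ts: str) -> datetime:
    """Parse the controller's UTC timestamp format (millisecond precision)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def gap_seconds(start: str, end: str) -> float:
    """Seconds elapsed between two log timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


elapsed = gap_seconds(DRAIN_STARTED, DRAIN_FINISHED)
print(f"drain took {elapsed:.3f}s of a 1000s drainTimeout")
# prints: drain took 0.180s of a 1000s drainTimeout
```

The result agrees with the `drain` step in the RollingUpgrade statistics (180.544516ms).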
