Skip to content

Update CrashLoopBackOff condition as soon as timeout expires #790

Open
@himanshu-kun

Description

@himanshu-kun

How to categorize this issue?

/area robustness
/kind bug
/priority 2

What happened:

Currently MCM doesn't turn CrashLoopBackoff(CLBF) machines to Failed as soon as creationTimeout expires , but delays have been observed which can range to 2min to any time.

There are 2 parts of the problem:

  1. timeout check for CLBF machine is done after making CreateMachine() driver call . CreateMachine() itself could take any amount of time as it provisions VM on the cloudprovider (in Azure big delays like 30min or more have been seen before)
  2. even if the 1) doesn't exist to contribute to the delay, the retry/re-push to the queue can also introduce the delay of ShortRetry(3min). This happens because the retry period is not calculated considering the time left before timeout but is just a constant value of ShortRetry(3min)

What you expected to happen:
Turn to Failed as soon as timeout expires.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

We could take inspiration from machineDeployment logic which turns Progressing condition to False with reason ProgressDeadlineExceeded as soon as the deadline exceeds.

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

Metadata

Metadata

Assignees

Labels

area/robustnessRobustness, reliability, resilience relatedkind/bugBuglifecycle/rottenNobody worked on this for 12 months (final aging stage)needs/planningNeeds (more) planning with other MCM maintainerspriority/2Priority (lower number equals higher priority)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions