Health-check Mechanisms for MachinePools #929

@hardikdr

Description

Summary

This proposes Kubernetes Node-style health-check mechanisms for MachinePools. They would address the issues that arise when a previously Ready MachinePool degrades in health.

The possible solution to such issues could involve:

  1. Heartbeats from the MachinePoollet to the APIServer/MachinePool.
  2. The machinepool-controller declaring the MachinePool Unknown or NotReady, based on pre-determined configuration, when heartbeats are missing.

This is similar to how the kubelet regularly updates the Ready condition in Node.Status.Conditions[] (of type NodeCondition); when those updates stop arriving, the node controller declares the Node Unknown/NotReady.
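A minimal sketch of the controller-side check, written in Python for brevity. It assumes the controller compares lastHeartbeatTime against a configurable grace period. The `ReadyCondition` type, the `evaluate_ready` function, the 40-second `DEFAULT_GRACE_PERIOD`, and the `MachinePoolStatusUnknown` reason are all hypothetical; the reason and message are modeled on the `NodeStatusUnknown` / "Kubelet stopped posting node status." pair that Kubernetes uses for Nodes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold; the issue leaves the value to "pre-determined configurations".
DEFAULT_GRACE_PERIOD = timedelta(seconds=40)


@dataclass
class ReadyCondition:
    """Mirrors the fields of the MachinePool Ready condition."""
    status: str  # "True", "False" or "Unknown"
    reason: str
    message: str
    last_heartbeat_time: datetime
    last_transition_time: datetime


def evaluate_ready(cond: ReadyCondition, now: datetime,
                   grace: timedelta = DEFAULT_GRACE_PERIOD) -> ReadyCondition:
    """Return the condition the machinepool-controller would persist.

    If the MachinePoollet has not posted a heartbeat within `grace`,
    the status is forced to "Unknown"; otherwise it is left untouched.
    """
    if now - cond.last_heartbeat_time <= grace:
        return cond  # heartbeat is fresh: trust what the MachinePoollet reported
    if cond.status == "Unknown":
        return cond  # already marked: keep the original transition time
    return ReadyCondition(
        status="Unknown",
        reason="MachinePoolStatusUnknown",  # hypothetical reason string
        message="MachinePoollet stopped posting MachinePool status.",
        last_heartbeat_time=cond.last_heartbeat_time,  # no new heartbeat was seen
        last_transition_time=now,  # the status changed just now
    )
```

Returning an already-Unknown condition unchanged means repeated checks by the controller do not register as new transitions.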

Possible consumers include the scheduler, which could stop scheduling further workloads onto the affected MachinePool, and eviction controllers, which could evict workloads from it if needed.
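The consumers could act on the Ready condition roughly as follows. This is a hypothetical sketch: `MachinePool`, `schedulable`, and `to_evict` are illustrative names, and the eviction timeout is an assumed configuration value that the issue does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class MachinePool:
    name: str
    ready_status: str  # status of the Ready condition: "True", "False" or "Unknown"
    last_transition_time: datetime  # when ready_status last changed


def schedulable(pools: List[MachinePool]) -> List[str]:
    """Names of pools the scheduler may still place new workloads on."""
    return [p.name for p in pools if p.ready_status == "True"]


def to_evict(pools: List[MachinePool], now: datetime,
             eviction_timeout: timedelta) -> List[str]:
    """Names of pools that have been not-Ready for at least eviction_timeout."""
    return [
        p.name
        for p in pools
        if p.ready_status != "True"
        and now - p.last_transition_time >= eviction_timeout
    ]
```

Waiting for a timeout before evicting avoids reacting to a single missed heartbeat, while the scheduler can stop placing new workloads immediately.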

Basic example

    - lastHeartbeatTime: "2023-12-05T10:58:27Z"
      lastTransitionTime: "2023-11-13T13:22:23Z"
      message: MachinePoollet is posting ready status. 
      reason: MachinePoolReady
      status: "True"
      type: Ready
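On the MachinePoollet side, the example condition suggests Kubernetes-style semantics: lastHeartbeatTime advances on every post, while lastTransitionTime changes only when the status flips. The following is a sketch under that assumption; `post_ready`, `rfc3339`, and the `MachinePoolNotReady` reason are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def rfc3339(ts: datetime) -> str:
    # Same format as the example, e.g. "2023-12-05T10:58:27Z"; expects an aware datetime.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def post_ready(previous: Optional[Dict[str, str]], healthy: bool,
               now: datetime) -> Dict[str, str]:
    """Build the Ready condition a MachinePoollet heartbeat would post."""
    status = "True" if healthy else "False"
    # A transition happens on the first post or whenever the status changes.
    transitioned = previous is None or previous["status"] != status
    return {
        "lastHeartbeatTime": rfc3339(now),  # refreshed on every heartbeat
        "lastTransitionTime": rfc3339(now) if transitioned else previous["lastTransitionTime"],
        "message": "MachinePoollet is posting ready status." if healthy
        else "MachinePoollet reports the MachinePool is not ready.",
        "reason": "MachinePoolReady" if healthy else "MachinePoolNotReady",  # latter is hypothetical
        "status": status,
        "type": "Ready",
    }
```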

Motivation

To improve the means of disaster recovery.

Note

Since this is a large epic, an enhancement proposal should be prepared first.
This may also touch on a node-problem-detector-like design for MachinePools, which is better discussed separately.

Metadata

Assignees

No one assigned

    Labels

    area/iaas (Issues related to IronCore IaaS development), enhancement (New feature or request)

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests