feat: Integrate with NPD and Descheduler for taint-based rescheduling #1

@googs1025

Description

Goal

We propose to build a remediation loop that integrates Node Problem Detector (NPD), a Node Readiness Controller, and Descheduler to enable self-healing of workloads based on node-level health signals.

This would allow the system to:

  1. Detect node-level issues via NPD.
  2. Automatically taint unhealthy nodes.
  3. Trigger pod eviction/rescheduling via Descheduler for non-tolerating workloads.

Proposed Architecture

  1. Detection (NPD)

    • NPD runs custom health checks (e.g., probing hardware or daemon status).
    • On failure, it sets a custom NodeCondition (e.g., CustomCondition/MyComponentReady=False); an example condition is shown below the diagram.
  2. Tainting (Node Readiness Controller)

    • A NodeReadinessGateRule watches the custom condition.
    • When the condition is not True, it adds a specific taint (e.g., readiness.k8s.io/my-component-ready=false:NoSchedule) to the node; the resulting taint is shown below the diagram.
    • When the condition recovers, the taint is automatically removed.
  3. Rescheduling (Descheduler)

    • Descheduler runs with the RemovePodsViolatingNodeTaints strategy.
    • It is configured with includedTaints: ["readiness.k8s.io/my-component-ready"] so that it only acts on our custom taint (an example policy is shown below the diagram).
    • Pods without a matching toleration are evicted and rescheduled by the default scheduler onto healthy nodes; a sample toleration for workloads that should stay in place is also shown below.
┌──────────────────────┐    ┌─────────────────────────────┐
│ Node Problem         │    │ Node Readiness              │
│ Detector (NPD)       │    │ Controller (NRC)            │
└──────────────────────┘    └─────────────────────────────┘
            │                              ▲
            │ Detects hardware/daemon      │ Watches NodeCondition
            │ failure & sets condition     │
            ▼                              │
┌───────────────────────────────────────────────────────────┐
│ Node Condition: CustomCondition/MyComponentReady=False    │
└───────────────────────────────────────────────────────────┘
                                            
                                            │ Triggers taint logic
                                            ▼
┌──────────────────────────────────────────────────────────────────┐
│ Node Taint: readiness.k8s.io/my-component-ready=false:NoSchedule │
└──────────────────────────────────────────────────────────────────┘
            │
            │ Node now unschedulable for non-tolerant pods
            ▼
┌──────────────────────┐    ┌────────────────────────────────────────────┐
│ Pods on this Node    │    │ Descheduler                                │
│ (without toleration) │◄───┤ • Strategy:                                │
└──────────────────────┘    │   RemovePodsViolatingNodeTaints            │
                            │ • includedTaints:                          │
                            │   ["readiness.k8s.io/my-component-ready"]  │
                            └────────────────────────────────────────────┘
                                            │
                                            │ Evicts violating pods
                                            ▼
                             ┌─────────────────────────────┐
                             │ Kubernetes Scheduler        │
                             │ • Re-schedules evicted pods │
                             └─────────────────────────────┘
                                            │
                                            ▼
                             ┌─────────────────────────────┐
                             │ Healthy Nodes               │
                             │ (no matching taint)         │
                             └─────────────────────────────┘

Benefits

  • Automated recovery: No manual intervention needed for common node-level failures.
  • Kubernetes-native: Built entirely on standard APIs (Conditions, Taints/Tolerations).
  • Modular & extensible: New health checks can be added by defining new NPD rules + NRC gates.

Request

We’d like to:

  • Confirm this integration pattern aligns with the project’s direction.
  • Discuss whether support for such workflows should be documented or facilitated (e.g., example configs, Helm values).
  • Explore if any enhancements are needed in the current NodeReadinessGateRule CRD or controller logic to better support this use case.
