feat: cordon should allow in-progress pipelines to drain #1025

@Reiers

Problem

When a node is cordoned (unschedulable=true), the Harmony scheduler stops picking up all new tasks, including the next stage for sectors already mid-pipeline on that machine.

Example: a sector finishes SDR on a cordoned node → TreeD won't be scheduled, even though the cache files are on this machine and no other node can do the work. The pipeline is stuck.

The only task with a workaround is Finalize, which has SchedulingOverrides for batch tasks.

Expected Behavior

Cordon should mean "stop accepting new work, but finish what you started." Specifically:

  • New pipelines (new SDR tasks, new CC sectors via IAmBored) → blocked ✅ (works today)
  • Running tasks → allowed to complete ✅ (works today)
  • Next stage for in-progress pipelines on this node → should be allowed ❌ (broken today)
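
The three cases above can be written down as a small decision function. This is only a sketch of the expected policy, not code from the Harmony scheduler; `WorkKind` and `allowedWhenCordoned` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// WorkKind is a hypothetical classification of the work a cordoned node may see.
type WorkKind int

const (
	NewPipeline WorkKind = iota // e.g. a new SDR task or a new CC sector
	RunningTask                 // a task already claimed and executing on this node
	NextStage                   // the next stage of a pipeline already started on this node
)

// allowedWhenCordoned encodes the expected behavior from this issue:
// a cordoned node refuses new pipelines but finishes what it started.
func allowedWhenCordoned(k WorkKind) bool {
	switch k {
	case RunningTask, NextStage:
		return true
	default:
		return false
	}
}

func main() {
	for _, k := range []WorkKind{NewPipeline, RunningTask, NextStage} {
		fmt.Printf("kind=%d allowed=%v\n", k, allowedWhenCordoned(k))
	}
}
```

Today the `NextStage` case effectively behaves like `NewPipeline`, which is the gap this issue describes.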

Current Workaround

Disable early-stage tasks individually (EnableSealSDR = false, etc.) while keeping later stages enabled. This allows in-progress pipelines to drain but is manual and error-prone.

Root Cause

In pollerTryAllWork(), when schedulable=false:

  • All handlers without SchedulingOverrides are skipped
  • followWorkInDB() is also skipped (the continue on line ~299)
  • The SealPoller still creates tasks in DB, but the Harmony poller won't claim them
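
To make the effect concrete, here is a deliberately simplified model of that filtering. It is not the actual `pollerTryAllWork()` code; the `handler` type, its `hasOverrides` field, and `eligibleHandlers` are assumptions made for this sketch.

```go
package main

import "fmt"

// handler is a simplified stand-in for a Harmony task handler.
type handler struct {
	name         string
	hasOverrides bool // true if the task type defines SchedulingOverrides
}

// eligibleHandlers models the behavior described above: when the node is
// not schedulable, every handler without overrides is skipped, regardless
// of whether the pending work belongs to a sector already on this machine.
func eligibleHandlers(hs []handler, schedulable bool) []string {
	var out []string
	for _, h := range hs {
		if !schedulable && !h.hasOverrides {
			continue // skipped, including follow-up work for local sectors
		}
		out = append(out, h.name)
	}
	return out
}

func main() {
	hs := []handler{{"SDR", false}, {"TreeD", false}, {"Finalize", true}}
	fmt.Println(eligibleHandlers(hs, true))  // [SDR TreeD Finalize]
	fmt.Println(eligibleHandlers(hs, false)) // [Finalize]
}
```

In this model TreeD is dropped on a cordoned node even when SDR for the same sector finished locally, which matches the stuck-pipeline example in the Problem section.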

Possible Approach

Extend the SchedulingOverrides pattern: when cordoned, allow scheduling a task if the sector's data already resides on this machine (i.e., an earlier pipeline stage was completed here). This could be implemented by:

  1. Checking if any related pipeline task was previously completed by this node (via harmony_task_history)
  2. Or checking if the sector's storage paths are on this machine
  3. Or adding a pipeline-aware "drain mode" flag that's distinct from full cordon
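
A minimal sketch of option 1, assuming the check can be reduced to "has this machine already completed a stage for this sector?". The `history` type is a hypothetical in-memory stand-in for a query against harmony_task_history; the real table layout and the way tasks map to sectors are not assumed here.

```go
package main

import "fmt"

// completion records that a pipeline stage for a sector finished on a machine.
type completion struct {
	sectorID  int64
	stage     string
	machineID int64
}

// history is a hypothetical in-memory stand-in for a harmony_task_history lookup.
type history []completion

// completedHere reports whether machineID finished any stage for sectorID.
func (h history) completedHere(sectorID, machineID int64) bool {
	for _, c := range h {
		if c.sectorID == sectorID && c.machineID == machineID {
			return true
		}
	}
	return false
}

// mayClaim decides whether this machine may claim a task for sectorID.
// A schedulable node claims as usual; a cordoned node claims only work
// for sectors whose earlier stages it completed itself.
func mayClaim(schedulable bool, h history, sectorID, machineID int64) bool {
	if schedulable {
		return true
	}
	return h.completedHere(sectorID, machineID)
}

func main() {
	h := history{{sectorID: 7, stage: "SDR", machineID: 1}}
	fmt.Println(mayClaim(false, h, 7, 1)) // true: SDR for sector 7 ran here
	fmt.Println(mayClaim(false, h, 8, 1)) // false: sector 8 never started here
	fmt.Println(mayClaim(false, h, 7, 2)) // false: a different machine is asking
}
```

Option 2 would keep the same `mayClaim` shape and swap the history lookup for a storage-path lookup, so the two checks could share one hook in the scheduler.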

Related: pipeline-aware scheduling / anti-starvation improvements.

Labels: enhancement
