docs(administration): add node maintenance & cordoning guide #1030

Merged: LexLuthr merged 1 commit into main from docs/cordon-maintenance-guide on Feb 19, 2026

@Reiers (Contributor) commented Feb 19, 2026

Context

Replaces the code approach in #1026 with operator documentation, per team consensus that cordon behavior should remain simple and user-controlled.

What's added

  • New page: administration/node-maintenance.md — covers what cordoning does, how it affects location-bound pipeline tasks, and recommended workflows for:
    • Quick maintenance (restart/upgrade)
    • Planned extended maintenance
    • Node decommissioning / long outage (attach storage to another node)
  • Cross-reference from troubleshooting/common-errors.md — so operators who find stuck sectors after cordoning get pointed to the guide.
  • SUMMARY.md + Administration README updated with the new page.

Why docs instead of code

The discussion on #1025 and #1026 settled on keeping cordon simple by design:

  • Cordon = planned maintenance, quick turnaround
  • Adding drain exceptions leads to scope creep (conditional drain, timeouts, per-task policies)
  • Operator architecture varies too much SP-to-SP for one-size-fits-all drain logic
  • The right fix is operator awareness, not automatic behavior

This guide gives operators the information they need to plan maintenance without losing pipeline work.

Closes #1025
Refs: #1026

Adds a dedicated guide covering cordon behavior, its effect on
location-bound pipeline tasks, and recommended workflows for short
maintenance, planned extended downtime, and node decommissioning.

Also adds a cross-reference from the troubleshooting common-errors page
for operators who find stuck sectors after cordoning.

Replaces the code-level approach in #1026 with operator documentation,
per team consensus that cordon should remain simple and user-controlled.

Refs: #1025, #1026

@LexLuthr merged commit 4918b43 into main on Feb 19, 2026 (22 of 24 checks passed)
@LexLuthr deleted the docs/cordon-maintenance-guide branch on February 19, 2026 13:39

Development

Successfully merging this pull request may close these issues.

feat: cordon should allow in-progress pipelines to drain
