feat(core): offline rebuild for unpublished degraded volumes by yugchaudhari · Pull Request #1108 · openebs/mayastor-control-plane

yugchaudhari · 2026-05-28T09:50:26Z

Iteration 2 of offline volume rebuild (openebs/openebs#4208), building on the detection-only reconciler from #1103.

When an unpublished volume goes degraded, the reconciler now creates a temporary unshared nexus (target_config with protocol: None) after a grace period. That makes the volume look published to the existing HotSpareReconciler, which rebuilds the faulted replicas. Once the volume is Online again, the reconciler tears the nexus down and the volume returns to unpublished.
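The per-volume decision the reconciler makes once the grace period has elapsed can be sketched as a small state function. All types here (VolumeStatus, Target, Action, next_action) are hypothetical simplifications, not the control plane's real types. The sketch also assumes that any unshared nexus on an otherwise unpublished volume is the temporary rebuild nexus.

```rust
// Hypothetical, simplified model of the offline-rebuild lifecycle.
// None of these types are the real mayastor-control-plane types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VolumeStatus {
    Online,
    Degraded,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    /// No nexus at all: the volume is unpublished.
    None,
    /// A nexus that is not shared (protocol: None). Assumed to be the temporary rebuild nexus.
    Unshared,
    /// A nexus exposed to an application via a share protocol.
    Shared,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Nothing to do on this reconcile pass.
    Nothing,
    /// Stand up a temporary unshared nexus so HotSpare sees the volume as published.
    CreateTempNexus,
    /// Volume is healthy again: remove the temporary nexus so it returns to unpublished.
    DestroyTempNexus,
}

/// Decide the next step, assuming the grace period has already elapsed.
/// The replica rebuild itself is never done here; that is left to HotSpare.
pub fn next_action(status: VolumeStatus, target: Target) -> Action {
    match (status, target) {
        // Degraded and unpublished: create the temporary nexus.
        (VolumeStatus::Degraded, Target::None) => Action::CreateTempNexus,
        // Healthy again while the temporary nexus is still up: tear it down.
        (VolumeStatus::Online, Target::Unshared) => Action::DestroyTempNexus,
        // Degraded with the temp nexus up means HotSpare is rebuilding, so wait.
        // Shared (application-published) volumes are left entirely to HotSpare.
        _ => Action::Nothing,
    }
}
```

Note that a degraded volume which already has the temporary nexus maps to Nothing: the reconciler only waits for HotSpare to finish, matching the "no rebuild logic is reimplemented" design.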

Key point: no rebuild logic is reimplemented; we lean on HotSpare and only stand up and tear down the nexus.

New config: --offline-rebuild-grace-period (default 10m), gated behind the existing --offline-rebuild-enabled flag.
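The flag gating and grace-period check might look roughly like this. The struct, enum, and field names are assumptions that simply mirror the two flags, and defaulting the feature to disabled is also an assumption; only the 10m default grace period comes from this PR. The Waiting value plays the role of the "remaining" figure in the reconciler's debug logs.

```rust
use std::time::{Duration, Instant};

/// Hypothetical config mirroring the two CLI flags (names are assumptions).
#[derive(Debug, Clone, Copy)]
pub struct OfflineRebuildConfig {
    /// Mirrors --offline-rebuild-enabled.
    pub enabled: bool,
    /// Mirrors --offline-rebuild-grace-period (default 10m).
    pub grace_period: Duration,
}

impl Default for OfflineRebuildConfig {
    fn default() -> Self {
        Self {
            enabled: false, // assumption: the feature is opt-in
            grace_period: Duration::from_secs(10 * 60),
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum GraceCheck {
    /// Feature flag is off: never start an offline rebuild.
    Disabled,
    /// Still inside the grace period; carries the time remaining.
    Waiting(Duration),
    /// Grace period is over; the reconciler may create the temporary nexus.
    Elapsed,
}

impl OfflineRebuildConfig {
    /// Check how long a volume degraded since `degraded_since` must still wait.
    pub fn check(&self, degraded_since: Instant, now: Instant) -> GraceCheck {
        if !self.enabled {
            return GraceCheck::Disabled;
        }
        // Clamp to zero if the clock reading is earlier than degraded_since.
        let waited = now.saturating_duration_since(degraded_since);
        match self.grace_period.checked_sub(waited) {
            Some(remaining) if !remaining.is_zero() => GraceCheck::Waiting(remaining),
            // Exactly at or past the grace period.
            _ => GraceCheck::Elapsed,
        }
    }
}
```

Passing `now` in explicitly, rather than calling Instant::now() inside check, keeps the logic deterministic and easy to unit test.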

BDD tests cover happy path, feature-disabled, and never-published precondition.
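The three BDD scenarios roughly correspond to an eligibility check like the one below. This is a hypothetical sketch (VolumeView, Skip, and eligible are invented names, not the reconciler's code). The never-published rule reflects the manual-run note that publishing a volume is what establishes its health info.

```rust
/// Hypothetical snapshot of what the reconciler knows about a volume.
pub struct VolumeView {
    pub degraded: bool,
    /// Currently published to an application.
    pub published: bool,
    /// True once the volume has been published at least once
    /// (publishing is what establishes the health info).
    pub has_health_info: bool,
}

/// Why a volume was skipped for offline rebuild.
#[derive(Debug, PartialEq, Eq)]
pub enum Skip {
    FeatureDisabled,
    /// Published volumes are already handled by HotSpare.
    Published,
    /// Nothing to rebuild.
    Healthy,
    /// No health info yet, so the volume is not eligible.
    NeverPublished,
}

/// Ok(()) means the volume is a candidate for an offline rebuild.
pub fn eligible(enabled: bool, v: &VolumeView) -> Result<(), Skip> {
    if !enabled {
        return Err(Skip::FeatureDisabled);
    }
    if v.published {
        return Err(Skip::Published);
    }
    if !v.degraded {
        return Err(Skip::Healthy);
    }
    if !v.has_health_info {
        return Err(Skip::NeverPublished);
    }
    Ok(())
}
```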

Create a temporary unshared nexus for degraded unpublished volumes so the existing HotSpareReconciler can rebuild faulted replicas. Tear down the nexus once the volume returns to Online. Add a configurable grace period (--offline-rebuild-grace-period, default 10m) and BDD tests covering the happy path, feature-disabled, and never-published precondition.

Signed-off-by: yugchaudhari <[email protected]>

yugchaudhari · 2026-05-28T16:47:31Z

Tested this on a 3-node cluster with both the BDD suite and a manual run.

BDD tests pass (cargo test -p agents --test core offline_rebuild): happy path, feature-disabled, and the never-published precondition.

For the manual run I created a 2-replica volume, published and then unpublished it (to establish the health info), and killed the io-engine node hosting one of the replicas. The volume state over time:

```
Degraded  target=None                ← node killed, volume degraded
Degraded  target=io-engine-3/none    ← offline rebuild created the temp unshared nexus (protocol=none)
Online    target=None                ← rebuild done, nexus torn down, back to unpublished
Online    target=None                ← stays healed
```

And the reconciler's own logs line up with that:

```
16:39:39  DEBUG  Offline rebuild waiting for grace period, remaining: 7.97s
16:39:46  DEBUG  Offline rebuild waiting for grace period, remaining: 956ms
16:39:47  INFO   Initiating offline rebuild: creating non-shared nexus
16:39:47  INFO   Offline rebuild nexus created; HotSpareReconciler will handle the rebuild
16:39:50  INFO   Offline rebuild complete; tearing down temporary nexus
16:39:50  INFO   Temporary nexus destroyed; volume returned to unpublished state
```

So the full lifecycle works end to end: degraded unpublished volume → temp nexus → HotSpare rebuild → teardown → back to Online/unpublished.

yugchaudhari requested a review from a team as a code owner May 28, 2026 09:50

yugchaudhari mentioned this pull request May 28, 2026

[OEP 4208]: Offline Volume Rebuild openebs/openebs#4208 (Open, 7 tasks)

yugchaudhari force-pushed the feat/offline-rebuild-iter2 branch from a4da3e4 to d8cc119 May 28, 2026 16:25

yugchaudhari wants to merge 1 commit into openebs:develop from yugchaudhari:feat/offline-rebuild-iter2