Cross-pool snapshot restore fallback when source pool has insufficient space #1987

@yugchaudhari

Is your feature request related to a problem? Please describe.

When creating a volume from a snapshot (CSI CreateVolume with snapshot source), Mayastor requires the new volume to be placed on the same pool as the snapshot's replica (because CoW snapshots are blobstore-local). If that pool is full, the operation fails with 507 Insufficient Storage, even when other pools in the cluster have terabytes of free space.

In our production KubeVirt environment, we use golden image snapshots to clone VM disks. The golden image's replicas were on a pool with 3.5 TiB used / 3.5 TiB capacity. Every VM provisioning attempt failed indefinitely, while other pools on the same and different nodes had 2-7 TiB free. The CSI driver kept retrying the same doomed restore for 38+ minutes (144 retry events) until manual intervention.

Describe the solution you'd like

Add fallback logic in the CreateVolume handler when restoring from a snapshot:

  1. Identify snapshot replica pools
  2. For each pool, check available capacity vs requested volume size
  3. If sufficient space → proceed with local CoW restore (existing fast path, instant)
  4. If insufficient space on ALL snapshot replica pools → fall back to "full copy restore":
    • Allocate the new volume on a different pool that matches the volume's topology constraints and has sufficient capacity
    • Copy data from the snapshot replica to the new volume's replica over the network
    • Complete the CreateVolume call successfully
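The decision in steps 1 to 4 can be sketched roughly as follows. This is only an illustration of the proposed placement logic, not Mayastor's actual control-plane code: `Pool`, `RestorePlan`, `plan_restore`, and the `topology_ok` callback are hypothetical names introduced here, and choosing the fallback target with the most free space is an assumption, not something the proposal requires.

```rust
/// Hypothetical view of a pool; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Pool {
    id: String,
    node: String,
    free_bytes: u64,
}

/// Outcome of the placement decision for a restore from snapshot.
#[derive(Debug, PartialEq)]
enum RestorePlan {
    /// Fast path: CoW restore on a pool that already holds a snapshot replica.
    LocalCow { pool: String },
    /// Fallback: allocate on another pool and copy the data over the network.
    FullCopy { source_pool: String, target_pool: String },
    /// No eligible pool can hold the requested volume.
    NoCapacity,
}

fn plan_restore(
    snapshot_pools: &[Pool],
    all_pools: &[Pool],
    requested_bytes: u64,
    topology_ok: impl Fn(&Pool) -> bool,
) -> RestorePlan {
    // Steps 1-3: prefer any snapshot replica pool with enough free space.
    if let Some(pool) = snapshot_pools
        .iter()
        .find(|p| p.free_bytes >= requested_bytes)
    {
        return RestorePlan::LocalCow { pool: pool.id.clone() };
    }

    // Step 4: no snapshot pool fits, so pick a copy source and a new target.
    let source = match snapshot_pools.first() {
        Some(pool) => pool,
        None => return RestorePlan::NoCapacity,
    };
    let target = all_pools
        .iter()
        // Skip pools that hold a snapshot replica; they were rejected above.
        .filter(|p| snapshot_pools.iter().all(|s| s.id != p.id))
        .filter(|p| p.free_bytes >= requested_bytes && topology_ok(*p))
        // Assumption: take the candidate with the most free space.
        .max_by_key(|p| p.free_bytes);

    match target {
        Some(t) => RestorePlan::FullCopy {
            source_pool: source.id.clone(),
            target_pool: t.id.clone(),
        },
        None => RestorePlan::NoCapacity,
    }
}
```

With a full snapshot pool and free space elsewhere, `plan_restore` returns `FullCopy` instead of failing; the existing instant path is untouched whenever a snapshot replica pool has room.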

This is architecturally feasible because Mayastor already has the building blocks:

  • Cross-node/cross-pool data copy: used during replica rebuilds (self-heal)
  • Network-based replication: replicas are routinely copied between pools on different nodes
  • HotSpareReconciler infrastructure: detects a missing replica and copies data from a healthy replica to a new one on a different pool

What's missing is the logic to use that same mechanism during snapshot restore. The CreateVolume-from-snapshot handler currently hardcodes the assumption that the restore must happen on the same pool as the snapshot.

Describe alternatives you've considered

  1. Return a non-retriable error code: If cross-pool restore is too complex, return FailedPrecondition instead of ResourceExhausted so the external-provisioner stops retrying and the orchestrator (CDI) can take alternative action. Currently ResourceExhausted is treated as retriable, causing infinite retry loops.

  2. CDI-level fallback: I've filed a comment on kubevirt/containerized-data-importer#4068 requesting CDI to fall back to host-assisted copy when snapshot clone fails. This works but is slower (involves a copy pod) and doesn't leverage Mayastor's native replication.

  3. Operational workaround: Ensure source image pools always have free space. This is what we do today, but it's fragile and requires constant capacity monitoring.
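For alternative 1, the status-code choice could look something like the sketch below. The numeric values are the standard gRPC codes; everything else (`RestoreError`, its variants, and `status_code`) is hypothetical and not taken from Mayastor's CSI driver. Splitting "snapshot pool full" from "whole cluster full" is an assumption made here for illustration, on the reasoning that only the latter is likely to clear up by itself.

```rust
/// Subset of gRPC status codes with their canonical numeric values.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GrpcCode {
    ResourceExhausted = 8,
    FailedPrecondition = 9,
}

/// Hypothetical reasons a restore from snapshot can fail for lack of space.
#[derive(Debug)]
enum RestoreError {
    /// The pool holding the snapshot replica is full; retrying the same
    /// restore cannot succeed until space is freed on that pool.
    SnapshotPoolFull { pool: String },
    /// No pool in the cluster can hold the volume right now.
    ClusterFull,
}

/// Map a restore failure to the status code returned from CreateVolume.
fn status_code(err: &RestoreError) -> GrpcCode {
    match err {
        // Proposed: signal a condition that retrying alone will not fix.
        RestoreError::SnapshotPoolFull { .. } => GrpcCode::FailedPrecondition,
        // Assumption: keep the retriable code for a cluster-wide shortage.
        RestoreError::ClusterFull => GrpcCode::ResourceExhausted,
    }
}
```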

Additional context

| Approach | Speed | Space requirement |
| --- | --- | --- |
| Same-pool CoW restore (current) | Instant (metadata only) | Same pool must have space |
| Cross-pool full copy (proposed fallback) | Slower (network copy) | Any pool with space works |

The fallback is slower but succeeds, which is better than indefinite failure.

Environment:

  • Mayastor: v2.10.0
  • Kubernetes: v1.34.3
  • CDI: v1.64.0
  • Pool configuration: 14 pools across 6 nodes (3.5 TiB and 7 TiB pools)

Related: #1895 (snapshot rebuild when pool is offline; a similar problem with a different trigger)
