Is your feature request related to a problem? Please describe.
When creating a volume from a snapshot (CSI CreateVolume with snapshot source), Mayastor requires the new volume to be placed on the same pool as the snapshot's replica (because CoW snapshots are blobstore-local). If that pool is full, the operation fails with 507 Insufficient Storage, even when other pools in the cluster have terabytes of free space.
In our production KubeVirt environment, we use golden image snapshots to clone VM disks. The golden image's replicas were on a pool with 3.5 TiB used / 3.5 TiB capacity. Every VM provisioning attempt failed indefinitely, while other pools on the same and different nodes had 2-7 TiB free. The CSI driver kept retrying the same doomed restore for 38+ minutes (144 retry events) until manual intervention.
Describe the solution you'd like
Add fallback logic in the CreateVolume handler when restoring from a snapshot:
- Identify snapshot replica pools
- For each pool, check available capacity vs requested volume size
- If sufficient space → proceed with local CoW restore (existing fast path, instant)
- If insufficient space on ALL snapshot replica pools → fall back to "full copy restore":
  - Allocate the new volume on a different pool that matches the volume's topology constraints and has sufficient capacity
  - Copy data from the snapshot replica to the new volume's replica over the network
  - Complete the CreateVolume call successfully
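The decision flow above can be sketched roughly as follows. This is an illustrative Python model, not Mayastor code: `Pool`, `RestorePlan`, `plan_restore`, and the `allowed_nodes` topology filter are hypothetical names invented for this sketch, and picking the emptiest candidate pool is just one possible placement policy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass(frozen=True)
class Pool:
    """Hypothetical view of a pool: name, hosting node, free capacity."""
    name: str
    node: str
    free_bytes: int


@dataclass(frozen=True)
class RestorePlan:
    mode: str  # "cow" (same-pool, metadata only) or "full_copy" (network copy)
    pool: Pool


def plan_restore(
    snapshot_pools: Iterable[Pool],
    all_pools: Iterable[Pool],
    size_bytes: int,
    allowed_nodes: Optional[Set[str]] = None,
) -> Optional[RestorePlan]:
    """Return where and how to restore, or None if no pool can host the volume."""
    snapshot_pools = list(snapshot_pools)

    # Fast path: a pool that already holds a snapshot replica has enough room.
    for pool in snapshot_pools:
        if pool.free_bytes >= size_bytes:
            return RestorePlan("cow", pool)

    # Fallback: any other pool that satisfies topology and capacity.
    snapshot_names = {p.name for p in snapshot_pools}
    candidates = [
        p for p in all_pools
        if p.name not in snapshot_names
        and p.free_bytes >= size_bytes
        and (allowed_nodes is None or p.node in allowed_nodes)
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the pool with the most free space.
    return RestorePlan("full_copy", max(candidates, key=lambda p: p.free_bytes))


if __name__ == "__main__":
    golden = Pool("pool-a", "node-1", 0)  # full pool holding the golden image
    others = [Pool("pool-b", "node-1", 2 * TIB), Pool("pool-c", "node-2", 7 * TIB)]
    print(plan_restore([golden], [golden, *others], 100 * GIB))
```

In the scenario described earlier (snapshot pool full, other pools with 2-7 TiB free), this flow selects the full-copy path instead of failing.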
This is architecturally feasible because Mayastor already has the building blocks:
- Cross-node/cross-pool data copy: used during replica rebuilds (self-heal)
- Network-based replication: replicas are routinely copied between pools on different nodes
- HotSpareReconciler infrastructure: detects a missing replica and copies data from a healthy replica to a new one on a different pool
What's missing is the logic to use that same mechanism during snapshot restore. The CreateVolume from snapshot handler currently has a hardcoded assumption: "restore must happen on the same pool as the snapshot."
Describe alternatives you've considered
- Return a non-retriable error code: If cross-pool restore is too complex, return FailedPrecondition instead of ResourceExhausted so the external-provisioner stops retrying and the orchestrator (CDI) can take alternative action. Currently ResourceExhausted is treated as retriable, causing infinite retry loops.
- CDI-level fallback: I've filed a comment on kubevirt/containerized-data-importer#4068 requesting CDI to fall back to host-assisted copy when snapshot clone fails. This works but is slower (involves a copy pod) and doesn't leverage Mayastor's native replication.
- Operational workaround: Ensure source image pools always have free space. This is what we do today, but it's fragile and requires constant capacity monitoring.
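As a rough illustration of the first alternative, the error mapping could look like the sketch below. The function name and its two boolean inputs are hypothetical; only the numeric values are the standard gRPC status codes. How the external-provisioner reacts to each code is taken from the behaviour described in this issue, not verified here.

```python
from enum import IntEnum
from typing import Optional


class GrpcCode(IntEnum):
    """Subset of the standard gRPC status codes relevant to this proposal."""
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9


def create_volume_error_code(from_snapshot: bool, has_capacity: bool) -> Optional[GrpcCode]:
    """Pick the status code for a CreateVolume attempt (hypothetical helper).

    has_capacity means: a pool eligible for this request has enough free space.
    For a snapshot restore, only the snapshot's replica pools are eligible today.
    """
    if has_capacity:
        return None  # no error, the request can proceed
    if from_snapshot:
        # Retrying cannot succeed until an operator frees space on the
        # snapshot's pool, so signal a precondition failure.
        return GrpcCode.FAILED_PRECONDITION
    # Ordinary volume with no capacity anywhere: keep the existing code.
    return GrpcCode.RESOURCE_EXHAUSTED
```

Keeping ResourceExhausted for ordinary volumes limits the behaviour change to the snapshot-restore path.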
Additional context
| Approach | Speed | Space requirement |
| --- | --- | --- |
| Same-pool CoW restore (current) | Instant (metadata only) | Same pool must have space |
| Cross-pool full copy (proposed fallback) | Slower (network copy) | Any pool with space works |
The fallback is slower but succeeds, which is better than indefinite failure.
Environment:
- Mayastor: v2.10.0
- Kubernetes: v1.34.3
- CDI: v1.64.0
- Pool configuration: 14 pools across 6 nodes (3.5 TiB and 7 TiB pools)
Related: #1895 (snapshot rebuild when pool is offline, similar problem, different trigger)