fix: do not restore PV verbatim when volume data was skipped via VolumePolicy #9588
mateenali66 wants to merge 5 commits into velero-io:main
|
From a first glance at #9318, I regard it as expected behavior:
So the point is, |
|
Thanks @Lyndon-Li — that's a helpful clarification. If Closing this. The original issue (#9318) may be worth revisiting from a documentation or UX angle — it seems the reporters expected Thanks for the quick review. |
|
@mateenali66 I reopened your PR. |
|
Hi, wanted to follow up on this. @Lyndon-Li confirmed in the comments that the current behavior is a bug and that action=skip should skip all volume backup methods including native snapshot. The fix addresses exactly that. Would appreciate a review when you get a chance. Happy to add tests or adjust the approach if needed. |
|
Hi, fyi that some maintainers are at Kubecon EU this week. |
|
@mateenali66 |
…og entry

- Add TestRestorePVWithVolumeInfo case: NativeSnapshot PV with Delete reclaim policy and no snapshot data triggers dynamic re-provisioning
- Add TestHandlePVHasNativeSnapshot_PropagatesErrPVNeedsReprovisioning to cover the errPVNeedsReprovisioning propagation in handlePVHasNativeSnapshot
- Add changelog entry for PR velero-io#9588

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
|
@mateenali66 |
|
@kaovilai @blackpiglet gentle reminder, CI is green now, this can be approved/merged whenever it's convenient for you |
…mePolicy

When a VolumePolicy action=skip is used during backup, the volume data is intentionally not backed up. During restore, Velero was restoring the original PV identity (same VolumeHandle), which is dangerous:
- With Delete reclaim policy: underlying storage may no longer exist
- In cross-cluster restore: two clusters would share the same storage

This adds defense-in-depth in pv_restorer.go: when no snapshot is found and the PV has a Delete reclaim policy, return errPVNeedsReprovisioning to trigger dynamic re-provisioning instead of restoring the PV as-is. The callers in restore.go (both new and legacy paths) handle this error by inserting the PV into pvsToProvision.

For Retain reclaim policy PVs without snapshots, a warning is now logged about the risks of cross-cluster restore scenarios.

Fixes velero-io#9318

Signed-off-by: Mateen Ali Anjum <mateenali66@gmail.com>
Signed-off-by: Mateen Anjum <mateenali66@gmail.com>

…og entry

- Add TestRestorePVWithVolumeInfo case: NativeSnapshot PV with Delete reclaim policy and no snapshot data triggers dynamic re-provisioning
- Add TestHandlePVHasNativeSnapshot_PropagatesErrPVNeedsReprovisioning to cover the errPVNeedsReprovisioning propagation in handlePVHasNativeSnapshot
- Add changelog entry for PR velero-io#9588

Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
Signed-off-by: Mateen Anjum <mateenali66@gmail.com>
|
@kaovilai when you get a moment, this one's been sitting since @blackpiglet approved on 4/1. would appreciate a second review if you're happy with the approach. CI green, coverage 77.8%, changelog in. fix addresses the PV restore-verbatim bug that @Lyndon-Li confirmed in #9588 (comment). |
Summary
Fixes #9318
When a VolumePolicy action=skip is used during backup, the volume data is intentionally not backed up. This PR adds defense-in-depth to prevent restoring the original PV identity (VolumeHandle) when there's no snapshot data to back it:
- pv_restorer.go: When snapshotInfo == nil and the PV has a Delete reclaim policy, returns errPVNeedsReprovisioning instead of the PV object. This prevents restoring a PV whose underlying storage may no longer exist.
- restore.go: Both the new (BackupVolumeInfo) and legacy snapshot paths in handlePVHasNativeSnapshot now catch errPVNeedsReprovisioning and trigger dynamic re-provisioning via pvsToProvision.
- handleSkippedPVHasRetainPolicy: Added a warning log about cross-cluster restore risks when restoring a PV with its original volume identity. The Retain policy case preserves existing behavior (restoring as-is) since changing it would be a breaking change.

Why this is safe
- Delete reclaim policy + no snapshot case was already handled correctly in the main restore flow (restoreItem). This change adds a safety net in executePVAction for defense-in-depth.
- Retain policy behavior is unchanged - only a warning is added. Per maintainer feedback, this is acceptable for now.
- Delete and Retain reclaim policy scenarios.

Test plan
- no snapshot and Delete reclaim policy: return errPVNeedsReprovisioning
- no snapshot and Retain reclaim policy: return PV as-is
- TestExecutePVAction_NoSnapshotRestores tests pass
- TestExecutePVAction_SnapshotRestores tests pass