Add delay to avoid race conditions during VolumeSnapshotContent deletion #9700
kaovilai merged 4 commits into velero-io:main
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9700      +/-   ##
==========================================
- Coverage   60.96%   60.94%   -0.02%
==========================================
  Files         384      384
  Lines       36595    36596       +1
==========================================
- Hits        22310    22305       -5
- Misses      12676    12681       +5
- Partials     1609     1610       +1
shubham-pampattiwar left a comment
Thanks for the fix @priyansh17.
The blind time.Sleep(2s) concerns me -- it's not guaranteed sufficient under load, and it adds unnecessary delay on clusters where the upstream fix is already present (external-snapshotter >= v9.2.0). For backups with many CSI snapshots the cumulative cost adds up.
Your issue actually suggests polling for snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection finalizer instead. Could we use wait.PollUntilContextTimeout to check for it? That way we return as soon as the sidecar has actually processed the object rather than guessing how long to wait.
Thoughts?
cc @blackpiglet
Hi, thanks @shubham-pampattiwar for reviewing. I agree that the delay is not required on new K8s clusters, but it also guards against unwanted behaviour if the sidecar's handling of concurrent CRUD operations changes in the future. Earlier we had a poll on readyToUse, which was unnecessary but happened to solve this problem; since we removed it, I intend to add this gap.
3e9da9b to c5247f5
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
c5247f5 to 49953ad
Instead of sleeping, should it watch the VSC for status updates (bound/available phase, etc.), which would be more indicative of the storage driver recognizing the operation?
I guess that discussion was already had. Alas, this is better than what we had before.
Yes, thanks. Shubham raised the same point, which was discussed above.
…ion (velero-io#9700)
* Add delay to avoid race conditions during VolumeSnapshotContent deletion
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* updated changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* Updated Changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>

…ion (velero-io#9700)
* Add delay to avoid race conditions during VolumeSnapshotContent deletion
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* updated changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* Updated Changelog
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: emirot <emirot.nolan@gmail.com>
* perf: better string concatenation Signed-off-by: emirot <emirot.nolan@gmail.com> Signed-off-by: nolanemirot <nolan.emirot@broadcom.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * fix: backup deletion silently succeeds when tarball download fails (#9693) * Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor error handling in backup deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * prevent backup deletion when errors occur Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added logger Signed-off-by: Priyansh Choudhary <im1706@gmail.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * perf: better string concatenation Signed-off-by: emirot <emirot.nolan@gmail.com> * Add delay to avoid race conditions during VolumeSnapshotContent deletion (#9700) * Add delay to avoid race conditions during VolumeSnapshotContent deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * updated changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Updated Changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * block data mover design Signed-off-by: Lyndon-Li <lyonghui@vmware.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * block data mover design Signed-off-by: Lyndon-Li <lyonghui@vmware.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * irregular volume size Signed-off-by: Lyndon-Li <lyonghui@vmware.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * block data mover design Signed-off-by: Lyndon-Li <lyonghui@vmware.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * Update the "community" page of website (#9722) Update the community page 
to add the correct links to community meeting and meeting notes. I also removed the reference of google group as I confirmed the last message was sent 2 years ago. Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com> Signed-off-by: emirot <emirot.nolan@gmail.com> * perf: better string concatenation Signed-off-by: emirot <emirot.nolan@gmail.com> --------- Signed-off-by: emirot <emirot.nolan@gmail.com> Signed-off-by: nolanemirot <nolan.emirot@broadcom.com> Signed-off-by: Priyansh Choudhary <im1706@gmail.com> Signed-off-by: Lyndon-Li <lyonghui@vmware.com> Signed-off-by: Daniel Jiang <daniel.jiang@broadcom.com> Co-authored-by: Priyansh Choudhary <im1706@gmail.com> Co-authored-by: nolanemirot <nolan.emirot@broadcom.com> Co-authored-by: Lyndon-Li <lyonghui@vmware.com> Co-authored-by: Daniel Jiang <daniel.jiang@broadcom.com>
* fix: backup deletion silently succeeds when tarball download fails (#9693) * Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor error handling in backup deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * prevent backup deletion when errors occur Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added logger Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Add delay to avoid race conditions during VolumeSnapshotContent deletion (#9700) * Add delay to avoid race conditions during VolumeSnapshotContent deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * updated changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Updated Changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Replace t.context() with context.TODO() for older go versions Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Updated changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> --------- Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* fix: backup deletion silently succeeds when tarball download fails (#9693) * Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor error handling in backup deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * prevent backup deletion when errors occur Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added logger Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Add delay to avoid race conditions during VolumeSnapshotContent deletion (#9700) * Add delay to avoid race conditions during VolumeSnapshotContent deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * updated changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Updated Changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Updated changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> --------- Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
* fix: backup deletion silently succeeds when tarball download fails (#9693) * Enhance backup deletion logic to handle tarball download failures and clean up associated CSI VolumeSnapshotContents Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor error handling in backup deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Refactor backup deletion logic to skip CSI snapshot cleanup on tarball download failure Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * prevent backup deletion when errors occur Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * added logger Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Add delay to avoid race conditions during VolumeSnapshotContent deletion (#9700) * Add delay to avoid race conditions during VolumeSnapshotContent deletion Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * updated changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Updated Changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Update changelog Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * fix: correct typo in comment regarding excluded volumes in TestGetVolumesByPod Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * Pin the sigs.k8s.io/controller-runtime to v0.23.2 The tag used to latest. Due to latest tag v0.23.3 already used Golang v1.26, Velero main still uses v1.25. Build failed. To fix this, pin the controller-runtime to v0.23.2 Signed-off-by: Xun Jiang <xun.jiang@broadcom.com> Signed-off-by: Priyansh Choudhary <im1706@gmail.com> * fix: update setup-envtest version in Dockerfile Signed-off-by: Priyansh Choudhary <im1706@gmail.com> --------- Signed-off-by: Priyansh Choudhary <im1706@gmail.com> Signed-off-by: Xun Jiang <xun.jiang@broadcom.com> Co-authored-by: Xun Jiang <xun.jiang@broadcom.com>
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Adds a delay to avoid race conditions caused by a CSI snapshotter bug in the K8s sidecar controller.
Details added in the issue linked below.
Does your change fix a particular issue?
Fixes #9699
Please indicate you've done the following:
make new-changelog) or comment /kind changelog-not-required on this PR.
site/content/docs/main.