-
Notifications
You must be signed in to change notification settings - Fork 69
fix: Added timeout for snapshot delete action to complete #1072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
| echo "Waiting for snapshots to be removed" | ||
| sleep 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do better by waiting until the snapshot is deleted? using an API call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no way to validate result of execution.
We can check filesystem, but that is a bit over-engineering.
For me it's a bit weird this issue didn't happen before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| infrastructure/backups/backup.sh | 0/5 | Added timeout for snapshot deletion but includes critical bugs: exit 1 terminates script, jq not available in image, async deletions may fail |
1 file reviewed, 4 comments
4a1f8c9 to
51c7a24
Compare
Description
We are experiencing issues with backups on out staging v19 beta environment, the error doesn't effect production.

error log example:
Root cause of the issue is async call to elasticsearch endpoint:
/_snapshot/ocrvs/snapshot_${LABEL:-$BACKUP_DATE}?wait_for_completion=true&prettyBy adding timeout between snapshot delete and create we are going to address the issue with lower resources on staging.
Testing
Script was patched on staging environment and executed:
Checklist