-
Notifications
You must be signed in to change notification settings - Fork 59
Increase backup wait time and delete timeouts in node-backup #3662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
| break | ||
| else | ||
| (( i++ ))&& (( i > 300 )) &&_error "Timed out waiting for backup of $description db" | ||
| (( i++ ))&& (( i > WAIT_FOR_BACKUP_RETRIES )) && { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this time out too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, but I'd rather mirror the behavior between both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the consistency but sentences that contain both delete and backup always make scared and look more closely...
cluster/scripts/node-backup.sh
Outdated
| else | ||
| (( i++ )) && (( i > 300 )) && _error "Timed out waiting for backup of $description PVC" | ||
| (( i++ )) && (( i > WAIT_FOR_BACKUP_RETRIES )) && { | ||
| kubectl delete volumesnapshot -n "$namespace" "$backupName"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know we're allowed to delete here at all? If an operation is running then this will just fail too, won't it? But I guess that's fine because then we'll still retry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we'll still retry
Hm or will this script just die, because we're not capturing the error code of the kubectl delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hopefully it works and it gets deleted, but otherwise you're correct, it will exit 1 and retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm and assuming the delete never works... will we loop forever here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_error has an exit 1 inside, so either kubectl delete fails or _error exits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it does... so fine then, not veto-ing if you feel confident about this.
martinflorian-da
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
WDYT about trying it without the deletes first? Not sure how much we're gaining for the risks here (of this not working as expected, of this deleting some other backup accidentally (low risk, I know))
Signed-off-by: Oriol Muñoz <oriol.munoz@digitalasset.com>
Attempts to fix maybe https://github.com/DACH-NY/cn-test-failures/issues/7017