Missing delete event on watch opened on same revision as compaction request #19179
Description
Bug report criteria
- This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
- This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- You have read the etcd bug reporting guidelines.
- Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
Since 9 January, presubmit tests have been failing.
Presubmit history goes back to December 31, and failures appear only from 9 January onward, implying the issue is new.
Failures are due to the resumable guarantee being broken:
logger.go:146: 2025-01-10T22:33:35.465Z ERROR Broke watch guarantee {"guarantee": "resumable", "client": 4, "request": {"Key":"/registry/pods/","Revision":409,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":true}, "got-event": {"Type":"delete-operation","Key":"/registry/pods/default/jCocA","Value":{"Value":"","Hash":0},"Revision":410,"IsCreate":false,"PrevValue":{"Value":{"Value":"143","Hash":0},"ModRevision":146}}, "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/OL767","Value":{"Value":"","Hash":0},"Revision":409,"IsCreate":false}}
validate.go:48: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
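To make the violation in the log concrete, here is a minimal sketch of the kind of check that fails. It is not the robustness test's actual validation code: the `Event` type and the `firstExpected` and `checkResumable` functions are hypothetical names introduced for illustration, and the history is assumed to be sorted by revision and already filtered to the watched prefix.

```go
package main

import "fmt"

// Event is a simplified watch event (hypothetical type, not etcd's own).
type Event struct {
	Type     string
	Key      string
	Revision int64
}

// firstExpected returns the earliest event in history whose revision is at
// or after startRev, i.e. the first event a watch opened at startRev should
// deliver. history is assumed to be sorted by revision.
func firstExpected(history []Event, startRev int64) (Event, bool) {
	for _, ev := range history {
		if ev.Revision >= startRev {
			return ev, true
		}
	}
	return Event{}, false
}

// checkResumable returns an error when the first event observed by a watch
// opened at startRev differs from the first event the history says it
// should have received.
func checkResumable(history, observed []Event, startRev int64) error {
	want, ok := firstExpected(history, startRev)
	if !ok || len(observed) == 0 {
		return nil // nothing to compare yet
	}
	if got := observed[0]; got != want {
		return fmt.Errorf("broke resumable: got %s %s@%d, want %s %s@%d",
			got.Type, got.Key, got.Revision, want.Type, want.Key, want.Revision)
	}
	return nil
}

func main() {
	// Events taken from the log line above.
	history := []Event{
		{"delete-operation", "/registry/pods/default/OL767", 409},
		{"delete-operation", "/registry/pods/default/jCocA", 410},
	}
	// The watch was requested at revision 409 but its first event was at 410.
	observed := history[1:]
	fmt.Println(checkResumable(history, observed, 409))
}
```

With the values from the log, the watch requested at revision 409 skips the delete of `/registry/pods/default/OL767` and starts at revision 410, so the check reports a violation.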
From the history visualizations I have seen, it follows this pattern:
- Delete operation on rev X
- Compact on Rev X
- Etcd crashes on Rev X
- Watch opened on Rev X
What did you expect to happen?
Resumable guarantee should not be broken.
How can we reproduce it (as minimally and precisely as possible)?
I have not yet managed to reproduce it locally.
Anything else we need to know?
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-amd64/1877585036438409216
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877364764741472256
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877466683589791744
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877575502907052032
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877585037260492800
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877678423757819904
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1877842586459181056
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878101264374435840
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878113522366287872
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/directory/pull-etcd-robustness-arm64/1878196741560340480
Etcd version (please run commands below)
I was not able to reproduce the issue outside of CI, so I haven't confirmed which other versions are affected.
Etcd configuration (command line flags or environment variables)
N/A
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
N/A
Relevant log output
No response