CNDB-17110: Fix possible deadlock if flush fails #2275
Checklist before you submit for review
A short explanation why there are no unit tests for this:
The tests that failed in CI pass locally.
test this please
`JVMStabilityInspector.inspectThrowable(t)` can throw an exception. When it does, control flow never reaches `postFlush.latch.countDown()`, so threads waiting for the flush to finish are never released and never get a chance to propagate the exception up the call chain. The result is a lockup with no error logged anywhere: the system appears to have a flush pending, but no flush is actually running. This scenario has been observed in tests when the system hit the limit of open files. This commit moves `latch.countDown()` into a `finally` block so it can never be skipped.
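The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the actual patch: `FlushLatchSketch`, `PostFlush`, and `runFlush` are hypothetical names, and the `Consumer<Throwable>` parameter only stands in for `JVMStabilityInspector.inspectThrowable(t)`. The assumption is simply that waiters block on a `CountDownLatch` and that the inspector may itself throw.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class FlushLatchSketch {
    /** Hypothetical stand-in for the post-flush state that waiting threads block on. */
    static final class PostFlush {
        final CountDownLatch latch = new CountDownLatch(1);
        volatile Throwable failure; // recorded so waiters can propagate it
    }

    /**
     * Runs the flush work. The inspector models a call such as
     * JVMStabilityInspector.inspectThrowable(t), which may throw.
     */
    static void runFlush(PostFlush postFlush, Runnable flushWork, Consumer<Throwable> inspector) {
        try {
            flushWork.run();
        } catch (Throwable t) {
            postFlush.failure = t;
            inspector.accept(t); // if this throws, the finally block below still runs
        } finally {
            // Always release waiters, whether the flush succeeded, failed,
            // or the inspector threw while handling the failure.
            postFlush.latch.countDown();
        }
    }

    public static void main(String[] args) throws Exception {
        PostFlush postFlush = new PostFlush();
        try {
            runFlush(postFlush,
                     () -> { throw new RuntimeException("simulated flush failure"); },
                     t -> { throw new IllegalStateException("simulated inspector failure"); });
        } catch (IllegalStateException e) {
            System.out.println("inspector threw: " + e.getMessage());
        }
        // A waiter would block here forever if countDown() had been skipped.
        boolean released = postFlush.latch.await(1, TimeUnit.SECONDS);
        System.out.println("waiter released: " + released);
        System.out.println("recorded failure: " + postFlush.failure.getMessage());
    }
}
```

If the `countDown()` call sat after `inspector.accept(t)` inside the `catch` block instead, the simulated inspector failure would skip it and `await` would time out, which is the lockup this change is meant to prevent.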
❌ Build ds-cassandra-pr-gate/PR-2275 rejected by Butler: 6 regressions found. Found 6 new test failures. Found 7 known test failures.


