Fix a flaky test for issue 19722 #20261
…lly. Captures any worker exception and fails the test. Adds bounded await() timeouts so it fails fast instead of hanging.
Signed-off-by: Joe Liu <[email protected]>
Walkthrough
This change refactors a concurrent test in
Changes
Estimated code review effort: 2 (Simple), ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
plugins/cache-ehcache/src/test/java/org/opensearch/cache/store/disk/EhcacheDiskCacheManagerTests.java (2)
49-50: Import AtomicReference for consistency.
AtomicReference uses a fully qualified name while CountDownLatch (line 19) and Phaser (line 20) are imported. For consistency, add the import.
Add this import at the top:
+import java.util.concurrent.atomic.AtomicReference;
Then simplify the declarations:
-final java.util.concurrent.atomic.AtomicReference<Throwable> createFailure =
-    new java.util.concurrent.atomic.AtomicReference<>();
+final AtomicReference<Throwable> createFailure = new AtomicReference<>();
-final java.util.concurrent.atomic.AtomicReference<Throwable> closeFailure = new java.util.concurrent.atomic.AtomicReference<>();
+final AtomicReference<Throwable> closeFailure = new AtomicReference<>();
Also applies to: 83-83
41-41: Path construction style differs from other tests in this file.
Line 41 uses resolve("request_cache") while line 124 uses string concatenation (+ "/request_cache"). The resolve() approach is more correct and portable, but the inconsistency within the file is worth noting.
Consider updating the other test methods (lines 124, 145) to use the same resolve() pattern for consistency, though this can be done in a separate refactoring PR.
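The difference between the two path styles can be sketched as follows. This is a minimal illustration, not code from the test file: the class name CachePathSketch, the helper requestCacheDir, and the base path data/node0 are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class CachePathSketch {
    // Hypothetical helper: appends the cache directory name using the
    // platform's own separator instead of a hard-coded "/".
    static Path requestCacheDir(Path base) {
        return base.resolve("request_cache");
    }

    public static void main(String[] args) {
        Path base = Paths.get("data", "node0"); // hypothetical base directory

        // Style used on line 41 of the test: stays a Path throughout.
        Path resolved = requestCacheDir(base);

        // Style used elsewhere in the file: produces a String with a fixed "/".
        String concatenated = base + "/request_cache";

        System.out.println(resolved);
        System.out.println(concatenated);
    }
}
```

On a platform whose separator is not "/", the concatenated string mixes separators, while the resolved Path does not.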
🔇 Additional comments (1)
plugins/cache-ehcache/src/test/java/org/opensearch/cache/store/disk/EhcacheDiskCacheManagerTests.java (1)
38-106: Excellent fix for the flaky test!
The refactored test correctly addresses all three issues from the PR description:
- ✓ Thread-safety: array-based storage (line 44) eliminates concurrent modification issues
- ✓ Timeouts: 60-second timeouts on await calls (lines 73, 99) prevent indefinite hangs
- ✓ Error propagation: AtomicReference captures worker exceptions (lines 64, 91) and fails the test explicitly (lines 74-76, 100-102)
The two-phase concurrency model with Phaser synchronization ensures threads start simultaneously, making the test more effective at detecting race conditions in EhcacheDiskCacheManager.
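The pattern described in this comment can be sketched roughly as below. This is an assumption-laden illustration, not the code from the PR: the class ConcurrentStartSketch, the method runWorkers, and the "alias-N" strings are hypothetical stand-ins, and counting down in a finally block is a choice of this sketch rather than something confirmed by the PR. Only the building blocks (per-thread array slots, Phaser, CountDownLatch with a 60-second timeout, AtomicReference) come from the review text.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class ConcurrentStartSketch {
    // Starts `threads` workers together; each writes only to its own array slot.
    // Throws AssertionError if a worker fails or the wait times out.
    static String[] runWorkers(int threads) {
        final String[] aliases = new String[threads];       // one slot per worker, no shared list
        final Phaser phaser = new Phaser(threads + 1);       // workers plus the coordinating thread
        final CountDownLatch done = new CountDownLatch(threads);
        final AtomicReference<Throwable> failure = new AtomicReference<>();

        for (int i = 0; i < threads; i++) {
            final int slot = i;
            new Thread(() -> {
                try {
                    phaser.arriveAndAwaitAdvance();          // block until every party has arrived
                    aliases[slot] = "alias-" + slot;         // stand-in for the real work
                } catch (Throwable t) {
                    failure.compareAndSet(null, t);          // remember the first failure only
                } finally {
                    done.countDown();                        // always count down, even on failure
                }
            }).start();
        }

        phaser.arriveAndAwaitAdvance();                      // last arrival releases all workers
        try {
            if (!done.await(60, TimeUnit.SECONDS)) {         // bounded wait instead of hanging
                throw new AssertionError("workers did not finish within 60 seconds");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for workers", e);
        }
        if (failure.get() != null) {
            throw new AssertionError("worker failed", failure.get());
        }
        return aliases;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4).length);
    }
}
```

Because each worker owns exactly one index, no two threads write the same element, and the latch's countDown/await pairing makes those writes visible to the coordinating thread once await returns.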
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20261      +/-   ##
============================================
+ Coverage     73.20%   73.23%   +0.02%
- Complexity    71745    71797      +52
============================================
  Files          5795     5795
  Lines        328304   328304
  Branches      47283    47283
============================================
+ Hits         240334   240427      +93
+ Misses        68663    68614      -49
+ Partials      19307    19263      -44
Description
Fixes the flaky test. The suite timeout is almost certainly caused by the test hanging, for three reasons. First, diskCacheAliases is a plain ArrayList that the test mutates from multiple threads; this is not thread-safe and can corrupt the list or leave it shorter than expected. Second, neither countDownLatch.await() call has a timeout, so if any worker thread throws before calling countDown(), the test waits until the suite timeout kills it. Third, the test does not capture worker exceptions, so a failure can silently prevent a latch from reaching zero.
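The hang mechanism can be reproduced in isolation. The sketch below is a hypothetical illustration, not the original test: the class LatchHangSketch, the method awaitWorker, and the simulated exception are invented for the example. It shows that a worker which throws before countDown() leaves the latch above zero, and that a bounded await() reports this by returning false instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchHangSketch {
    // Returns true if the latch reached zero within the timeout, false otherwise.
    static boolean awaitWorker(boolean countDownInFinally, long timeoutMillis) {
        final CountDownLatch latch = new CountDownLatch(1);
        final Runnable failingWork = () -> {
            throw new IllegalStateException("simulated worker failure");
        };

        Thread worker = new Thread(() -> {
            if (countDownInFinally) {
                try {
                    failingWork.run();
                } catch (RuntimeException e) {
                    // a real test would record this, e.g. in an AtomicReference
                } finally {
                    latch.countDown();                   // reached even though the work failed
                }
            } else {
                try {
                    failingWork.run();
                    latch.countDown();                   // never reached: the work threw first
                } catch (RuntimeException e) {
                    // swallowed here to model a failure nobody observes
                }
            }
        });
        worker.start();

        try {
            // With an unbounded await() the first case would block indefinitely.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitWorker(false, 200));
        System.out.println(awaitWorker(true, 5000));
    }
}
```

The first call models the behaviour described above and times out; the second shows that counting down in a finally block lets the waiting thread proceed and inspect the failure.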
Related Issues
Resolves #19722
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.