-
Notifications
You must be signed in to change notification settings - Fork 458
[FLUSS-1855][server] Add Null Safety for TimerTaskEntry Removal #1872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[server] Added test to reproduce NPE during TimerTaskEntry removal [server] Added test to reproduce NPE during TimerTaskEntry removal
[server] Added Safety Check for TimerTaskEntry Removal to Avoid NPE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a potential NullPointerException in the TimerTaskEntry.remove() method caused by a Time-Of-Check-Time-Of-Use (TOCTOU) race condition. The fix implements a single-read snapshot pattern to safely handle the volatile list field in concurrent scenarios.
Key Changes:
- Modified the
remove()method to read the volatilelistfield once per loop iteration using an assignment within the while condition - Added a comprehensive concurrency test (
TimerTaskEntryTest) to verify the fix prevents NPEs during concurrent add/remove operations
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
fluss-server/src/main/java/org/apache/fluss/server/utils/timer/TimerTaskEntry.java |
Updated remove() method to use single-read snapshot pattern, eliminating TOCTOU bug by assigning list directly in the while condition |
fluss-server/src/test/java/org/apache/fluss/server/utils/timer/TimerTaskEntryTest.java |
Added new test class with concurrency test that reproduces the race condition and verifies the fix prevents NullPointerException |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fluss-server/src/test/java/org/apache/fluss/server/utils/timer/TimerTaskEntryTest.java
Outdated
Show resolved
Hide resolved
fluss-server/src/test/java/org/apache/fluss/server/utils/timer/TimerTaskEntryTest.java
Outdated
Show resolved
Hide resolved
fluss-server/src/test/java/org/apache/fluss/server/utils/timer/TimerTaskEntryTest.java
Outdated
Show resolved
Hide resolved
fluss-server/src/test/java/org/apache/fluss/server/utils/timer/TimerTaskEntryTest.java
Show resolved
Hide resolved
wuchong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rionmonster for the awesome work and for providing a reproducible test case! The proposed fix looks solid to me.
Purpose
Linked Issue: Close #1855
Per Issue #1855, this pull request addresses a potential issue that could arise during the
remove()operation forTimerTaskEntrythat could result in a null pointer exception being thrown. This was possible as the underlying object to support the internal list was volatile and subject to operations from other threads.Brief change log
Update the logic within the
TimerTaskEntry.remove()operation to use a more concurrency-safe single-read snapshot pattern to avoid separate reads (e.g., reading during iteration and within the body as well). These changes were originally reproduced via a newly added unit test and retained to ensure the issue was resolved as expected.Tests
Added a new
TimerTaskEntryTestclass along with an associatedTimerTaskEntryTest.testRemoveEnsuresCurrentListNullSafetycase that originally reproduced this issue (via concurrent, oscillating add/removals across separate threads) which was eventually updated after the fix was applied to confirm the exception will no longer be thrown.API and Format
N/A
Documentation
N/A