Contributor

@rionmonster rionmonster commented Oct 27, 2025

Purpose

Linked Issue: Close #1855

Per issue #1855, this pull request fixes a race in TimerTaskEntry.remove() that could throw a NullPointerException. The entry's reference to its owning list is a volatile field that other threads can modify, so it could become null between the null check and the subsequent use.

Brief change log

Update TimerTaskEntry.remove() to use a concurrency-safe single-read snapshot pattern: the volatile field is read once per loop iteration into a local variable, and that local is used for both the loop condition and the loop body, instead of reading the field separately in each place. The issue was first reproduced with a newly added unit test, which is retained to confirm that the fix works as expected.
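To illustrate the pattern, here is a minimal, self-contained sketch. It is not the Fluss source: SketchList, SketchEntry, removeRacy, and removeSafe are hypothetical stand-ins, and the racy variant is only an assumed shape of a double-read loop of the kind described above.

```java
/** Hypothetical stand-in for the list that owns an entry (not the Fluss class). */
class SketchList {
    // Attach the entry only if it is not already owned by a list.
    synchronized void add(SketchEntry entry) {
        if (entry.list == null) {
            entry.list = this;
        }
    }

    // Detach the entry only if this list still owns it.
    synchronized void remove(SketchEntry entry) {
        if (entry.list == this) {
            entry.list = null;
        }
    }
}

/** Hypothetical stand-in for an entry whose owning list may change concurrently. */
class SketchEntry {
    // Other threads may set this to null or to another list at any time.
    volatile SketchList list;

    // Racy shape: the field is read once in the condition and again in the body.
    // Another thread can null it between the two reads, causing an NPE.
    void removeRacy() {
        while (list != null) {
            list.remove(this);
        }
    }

    // Single-read snapshot: one volatile read per iteration, stored in a local.
    // The local cannot change underneath us, so the body never dereferences null.
    void removeSafe() {
        SketchList current;
        while ((current = list) != null) {
            current.remove(this);
        }
    }
}
```

The loop is kept rather than replaced by a single call because another thread may move the entry to a different list while the removal is in progress; each pass takes a fresh snapshot and retries until the field reads as null.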

Tests

Added a new TimerTaskEntryTest class with a TimerTaskEntryTest.testRemoveEnsuresCurrentListNullSafety case. The test originally reproduced the issue by running concurrent, alternating add and remove operations on separate threads; after the fix was applied, it was updated to confirm that the exception is no longer thrown.
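The general shape of such a test can be sketched as follows. This is an assumption-laden illustration rather than the test added in this PR: DemoList, DemoEntry, and RemoveRaceDemo are hypothetical names, and the iteration count is arbitrary. One thread alternates add and remove while a second thread calls remove(), and any exception raised on either thread is captured.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical owning list (not the Fluss class). */
class DemoList {
    synchronized void add(DemoEntry e) { if (e.list == null) e.list = this; }
    synchronized void remove(DemoEntry e) { if (e.list == this) e.list = null; }
}

/** Hypothetical entry using the single-read snapshot removal. */
class DemoEntry {
    volatile DemoList list;

    void remove() {
        DemoList current;
        while ((current = list) != null) {
            current.remove(this);
        }
    }
}

class RemoveRaceDemo {
    /** Returns the first throwable seen by either thread, or null if none occurred. */
    static Throwable run(int iterations) {
        DemoList list = new DemoList();
        DemoEntry entry = new DemoEntry();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch start = new CountDownLatch(1);

        // Thread A: alternate add and remove so the field flips between set and null.
        Thread toggler = new Thread(() -> {
            try {
                start.await();
                for (int i = 0; i < iterations; i++) {
                    list.add(entry);
                    list.remove(entry);
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });

        // Thread B: repeatedly call the removal path under test.
        Thread remover = new Thread(() -> {
            try {
                start.await();
                for (int i = 0; i < iterations; i++) {
                    entry.remove();
                }
            } catch (Throwable t) {
                failure.compareAndSet(null, t);
            }
        });

        toggler.start();
        remover.start();
        start.countDown(); // release both threads together to maximize overlap
        try {
            toggler.join();
            remover.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", ie);
        }
        return failure.get();
    }
}
```

A test built this way would assert that RemoveRaceDemo.run(...) returns null. Because the race is timing dependent, a passing run shows only that no exception was observed in that run; swapping in a double-read removal is how one would expect to see the failure surface.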

API and Format

N/A

Documentation

N/A

[server] Added test to reproduce NPE during TimerTaskEntry removal

[server] Added Safety Check for TimerTaskEntry Removal to Avoid NPE
Copilot AI left a comment

Pull request overview

This PR fixes a potential NullPointerException in the TimerTaskEntry.remove() method caused by a Time-Of-Check-Time-Of-Use (TOCTOU) race condition. The fix implements a single-read snapshot pattern to safely handle the volatile list field in concurrent scenarios.

Key Changes:

  • Modified the remove() method to read the volatile list field once per loop iteration using an assignment within the while condition
  • Added a comprehensive concurrency test (TimerTaskEntryTest) to verify the fix prevents NPEs during concurrent add/remove operations

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| fluss-server/src/main/java/org/apache/fluss/server/utils/timer/TimerTaskEntry.java | Updated remove() method to use single-read snapshot pattern, eliminating TOCTOU bug by assigning list directly in the while condition |
| fluss-server/src/test/java/org/apache/fluss/server/utils/timer/TimerTaskEntryTest.java | Added new test class with concurrency test that reproduces the race condition and verifies the fix prevents NullPointerException |

Member

@wuchong wuchong left a comment

Thanks @rionmonster for the awesome work and for providing a reproducible test case! The proposed fix looks solid to me.

@wuchong wuchong merged commit 6a2586f into apache:main Dec 24, 2025
5 checks passed

Development

Successfully merging this pull request may close these issues.

Npe occur when remove entry from TimerTaskEntry

2 participants