Skip to content

Conversation

@rionmonster
Copy link
Contributor

@rionmonster rionmonster commented Dec 6, 2025

Purpose

Linked issue: close #2091

Per Issue #2091, this pull request attempts to address an issue that could result in a StackOverflowError stemming from a cycle of cascading exceptions that originally stem from Flink's SerializedThrowable handler. This fix has already been made within Flink in this commit which should prevent the behavior (see discussion here for more context).

Brief change log

This change introduces a new private functions within the ExceptionUtils class to support exception chain evaluation for suppression and cause chains (existsInExceptionChain). This function is applied during the firstOrSuppressed function call, which introduces a new exception to the existing chain, and traverses both the suppression and cause chains using a graph to evaluate for existence.

Tests

The ExceptionTestUtils.testFirstOrSuppressedCyclePrevention was initially updated to use an arbitrary recursive exception call to mimic Flink's existing SerializedThrowable to reproduce the original issue (triggering a StackOverflowError). This test was later updated after the fix was applied to ensure that these previous exceptions no longer result in a cycle being created.

API and Format

N/A

Documentation

N/A

@rionmonster rionmonster changed the title Fluss 2091 [FLUSS-2091][common] Introduced Cycle Detection During Exception Suppression Dec 6, 2025
@rionmonster rionmonster marked this pull request as ready for review December 8, 2025 18:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces cycle detection during exception suppression to prevent StackOverflowError that can occur when cascading exceptions create cycles in the suppression or cause chains. The fix mirrors a similar solution already implemented in Apache Flink and addresses issue #2091.

Key Changes:

  • Added cycle detection logic to ExceptionUtils.firstOrSuppressed() to prevent circular references in exception chains
  • Introduced a new private helper method existsInExceptionChain() that performs graph-based traversal of both suppression and cause chains to detect cycles
  • Added comprehensive test coverage to validate cycle prevention behavior

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
fluss-common/src/main/java/org/apache/fluss/utils/ExceptionUtils.java Added cycle detection logic in firstOrSuppressed() and new existsInExceptionChain() helper method with graph-based traversal using HashSet and Deque
fluss-common/src/test/java/org/apache/fluss/utils/ExceptionUtilsTest.java Fixed typo in comment and added testFirstOrSuppressedCyclePrevention() test to validate cycle prevention in suppressed exception chains

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 390 to 391
* @param exception The throwable exception to search for.
* @param previous The previous throwable exception chain to search in.
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter names in the javadoc don't accurately describe their semantics. The parameter named exception is described as "The throwable exception to search for" and previous is described as "The previous throwable exception chain to search in". However, these names suggest temporal ordering rather than the search relationship. Consider renaming them to something like searchTarget and chainToSearch or needle and haystack to better convey that this is a search operation, or update the description to clarify the purpose more explicitly.

Copilot uses AI. Check for mistakes.
[FLUSS-2091][common] Added Exception Cycle Tests for Cause-Chain Cases
… Cases

[FLUSS-2091][common] Replaced Assertion Syntax for Exception-Handling Cases
@wuchong
Copy link
Member

wuchong commented Dec 25, 2025

This has been fixed in Flink, close this PR. See comments in the issue.

@wuchong wuchong closed this Dec 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flink taskmanager failed with StackOverflowError: null

2 participants