Skip to content

fix(jsonata): prevent StackOverflowError on small JVM stacks via eval thread isolation + lower default maxDepth#80

Open
fdelbrayelle wants to merge 10 commits intomainfrom
test/stack-overflow-regression
Open

fix(jsonata): prevent StackOverflowError on small JVM stacks via eval thread isolation + lower default maxDepth#80
fdelbrayelle wants to merge 10 commits intomainfrom
test/stack-overflow-regression

Conversation

@fdelbrayelle
Copy link
Copy Markdown
Member

@fdelbrayelle fdelbrayelle commented Apr 30, 2026

Problem

Jsonata$Frame.lookup() in dashjoin/jsonata-java walks the parent scope chain via tail recursion. Java does not optimize tail calls, so every JSONata recursive function call pushes ~8 JVM frames onto the thread stack. With the old default maxDepth=200, the evaluator could push 200 × 8 = 1,600 JVM frames before the bounds check fired — far past what a 256 KB worker stack (~300 usable frames) can hold.

StackOverflowError is a JVM Error, not an Exception — it escapes all normal try/catch blocks and crashes the worker thread.

Why upstream fix wasn't enough: dashjoin/jsonata-java#107 (iterative lookup()) was reviewed by the maintainer who confirmed the iterative change removes only D≈4 frames from the top of the stack. Since overflow is driven by N × frames-per-eval-level (not D), the upstream PR was not merged. The fix must live here.

Fix

Two-layer defense:

1. Lower default maxDepth 200 → 50

50 × ~8 JVM frames = ~400 frames — safe on a 256 KB stack (~300 usable frames). setRuntimeBounds now fires and throws JException before any stack risk.

Users with legitimate deep recursion can override maxDepth, but should also increase the JVM stack size (-Xss).

2. Dedicated eval thread with 4 MB stack

evaluate() runs on new Thread(null, runnable, "jsonata-eval", 4 * 1024 * 1024). This means:

  • Normal case: worker stack size (e.g. 256 KB on Windows) cannot constrain the evaluator at all.
  • Edge case (user sets very high maxDepth): if StackOverflowError occurs in the eval thread, it is caught as Throwable in that throwaway thread. The worker thread reads the stored error and throws a clean RuntimeExceptionthe worker never crashes.

This is safe because the SOE is contained inside a thread we immediately discard. No JVM state is shared between the eval thread's stack and the worker thread.

Regression tests

Test What it verifies
shouldNeverThrowStackOverflowForCommonMaxDepthValues (parametrized: 50, 200, 500, 1000) Bounds check fires → JExceptionRuntimeException, never StackOverflowError, even on -Xss512k JVM
shouldIsolateStackOverflowInEvalThreadWhenMaxDepthExceedsStackCapacity $f(49999) with maxDepth=50000 overflows eval thread → worker gets RuntimeException(cause=StackOverflowError)

Test plan

  • All existing tests pass
  • Parametrized test: maxDepth=50/200/500/1000 all produce RuntimeException(cause=JException) on -Xss512k JVM
  • Isolation test: extreme maxDepth produces RuntimeException(cause=StackOverflowError), test JVM does not crash

Reproduces dashjoin/jsonata-java#107.
Frame.lookup() recurses once per scope frame — on 256 KB (Windows) or
512 KB (Linux default) thread stacks, a 999/1999-deep recursive JSONata
expression overflows before maxDepth fires. Tests use explicit Thread
stack sizes so the scenario is reproducible on any platform.

Passes once jsonata-java ships the iterative Frame.lookup() fix.
…ack helper

Both Windows and Linux tests now use SMALL_STACK_BYTES = 256 KB via
Thread(stackSize) so no -Xss JVM flag is needed in build config.
Extract runOnSmallStack() helper to avoid duplication.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

📦 Artifacts

Name Size Updated Expiration
jar 2.96 MB May 2, 26, 12:47:44 PM UTC May 9, 26, 12:47:43 PM UTC

🧪 Java Unit Tests

TestsPassed ✅Skipped ⚠️FailedTime ⏱
Java Tests Report141 ran140 ✅1 ⚠️0 ❌46s 261ms

🔁 Unreleased Commits

✅ No unreleased commits found.

Tests now assert isInstanceOf(StackOverflowError.class) — current behavior
without the upstream fix. Rename constant to XSS_256K and add Javadoc linking
it to -Xss256k so the stack constraint is self-documenting.
Comment marks where assertions flip once dashjoin/jsonata-java#107 ships.
Thread(stackSize) is advisory — JVM ignores it on Linux. Add -Xss256k to
test JVM args in plugin-transform-json/build.gradle via afterEvaluate so
it appends after the root build's jvmArgs= Allure assignment. Simplify
tests to call task.run() directly and assert StackOverflowError with
assertThatThrownBy.
…eplace)

afterEvaluate in the subproject was silently wiped because the root Allure
subprojects{} block used jvmArgs=[...] (full assignment). Change root to
jvmArgs append; subproject test{jvmArgs '-Xss256k'} then safely extends it
without ordering ambiguity.
…tack

-Xss256k is silently ignored by OpenJDK on Linux 64-bit (enforces a higher
minimum). Instead use depth=4999/9999 which overflows the default 512 KB-1 MB
thread stack without any JVM flag (~150 bytes/frame × 9999 ≈ 1.5 MB needed).
… in CI

Linux default thread stack is ~8 MB (OS RLIMIT_STACK), far too large to
reproduce the crash without -Xss. Pin test JVM to -Xss512k (above HotSpot
minimum, safe for Kestra framework). Use depth=49999 so 49999 JVM frames
× minimum 16 bytes/frame = 800 KB > 512k — overflow guaranteed regardless
of JIT optimization level.
…t TCO

jsonata-java TCO-optimises tail-recursive calls into a loop (O(1) stack),
so $f($n-1) never caused a stack overflow. Adding + 0 after the recursive
call puts it in non-tail position, forcing each frame to stay live and
producing the expected StackOverflowError on -Xss512k.
Tests document the current bug state: StackOverflowError is expected.
Will flip to assertThatNoException once dashjoin/jsonata-java#107 ships.
Each JSONata recursion level pushes ~8 JVM frames. On 256 KB worker
stacks (~300 usable frames), the old default maxDepth=200 allowed
200 × 8 = 1600 frames before the bounds check fired — far past overflow.

Two-layer fix:
1. Lower default maxDepth 200 → 50 (50 × 8 = 400 frames, safe on 256 KB).
   setRuntimeBounds fires at depth 50, throwing JException cleanly.
2. Run evaluate() on a dedicated thread with an explicit 4 MB stack.
   Worker thread stack size (e.g. 256 KB on Windows) can no longer
   constrain the evaluator. If a user sets a very high maxDepth and
   triggers StackOverflowError anyway, it is caught as Throwable inside
   the throwaway eval thread; the worker thread gets a clean
   RuntimeException instead of crashing.

Update regression tests:
- Parametrized test: maxDepth=50/200/500/1000 all produce JException,
  never StackOverflowError, even on -Xss512k JVM.
- Isolation test: maxDepth=50000 with $f(49999) overflows the eval
  thread but worker receives RuntimeException(cause=StackOverflowError).

Closes dashjoin/jsonata-java#107 dependency — fix is now self-contained
in plugin-transform without requiring upstream changes.
@fdelbrayelle fdelbrayelle changed the title test(jsonata): regression tests for StackOverflowError on small JVM stacks fix(jsonata): prevent StackOverflowError on small JVM stacks via eval thread isolation + lower default maxDepth May 2, 2026
@fdelbrayelle fdelbrayelle self-assigned this May 2, 2026
// 2. Edge case (user sets very high maxDepth): if a StackOverflowError occurs in the eval
// thread, it is contained there. The worker thread reads the stored error and throws a
// clean RuntimeException — the worker never crashes.
var thread = new Thread(null, () -> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so each record allocates a new Thread with a 4 MB stack, for batch workloads this could be a regression?

try {
var result = this.parsedExpression.evaluate(data, frame);
resultRef.set(result != null ? MAPPER.valueToTree(result) : NullNode.getInstance());
} catch (Throwable t) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we narrow it down?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants