Loop in closure, but only on the second parse invocation. #4923

davdres · 2026-03-04T18:01:42Z

I am parsing PL/SQL with a grammar derived from the ANTLR4 PL/SQL grammar. My parser is written in Java; it reads SQL files from a zip input file and parses them serially. Say I have two SQL files, parse1.sql and parse2.sql. If I package them in separate zip files, they parse without error. However, if I package them together, the second file loops in closure. I instantiate a new lexer and parser for every parse, and after each parse I call getInterpreter().clearDFA() on both the lexer and the parser. The symptoms seem to imply that the first parse is having some effect on the second parse.
Is there any way the first parse could affect the second parse, even when new instances of the parser and lexer are constructed?
Normally the parser can handle hundreds or even thousands of SQL files in a zip. This situation also seems to be data dependent: if I start trimming the code down in one of the files, eventually both files parse. The two files are very similar; they are different versions of the same code.

Answered by kaby76 · Mar 11, 2026

davdres (Author) · 2026-03-04T22:09:25Z

I discovered that if I add a method to PredictionContextCache that clears its cache Map, the loop goes away and both files parse fine. Would this be considered a bug?

kaby76 Mar 4, 2026

Please provide a reproducible Java example so we can debug.

davdres (Author) · 2026-03-04T23:45:16Z

I'll give it a try and see if I can create a smaller reproducible test case. Otherwise, I'll close this issue.

davdres (Author) · 2026-03-10T21:41:37Z

I have a stripped-down example. In my original post I said that I was parsing serially, but this example parses each file in a separate thread. When I reproduced the error serially, I had to rearrange the zip file to get the files to parse in a different order. With the attached example you can just run it multiple times: sometimes it will hang and sometimes it will parse successfully.
antlr-report.zip

kaby76 · 2026-03-11T13:43:28Z

clearDFA() is not safe to call from one thread while another is mid-parse, because one thread may be modifying sharedContextCache while another thread is also modifying it. You should remove the call from ParseSql.java. I don't know whether clearDFA() is supposed to be thread-safe.

The problem is your grammar. It's extraordinarily slow due to ambiguity and high max-k values. You can't work around this by parallelizing the parses.

$ !trperf
trperf ../../../test/input/slow/slow/test1.sql -h -c aFdriTkmfaet | ( head -n 1 && tail -n +2 | sort -k1 -n -r;     ) | head | column -t
Time to parse: 00:00:30.9773488
Ambiguities  File                                     Decision  Rule                         Invocations  Time       Total-k  Max-k  Fallback  Ambiguities  Errors  Transitions
375          ../../../test/input/slow/slow/test1.sql  2220      general_element_part         1222         5.299353   3485     6      375       375          0       1184
146          ../../../test/input/slow/slow/test1.sql  1930      atom                         1213         4.801374   3891     19     146       146          0       618
93           ../../../test/input/slow/slow/test1.sql  1667      table_ref_aux                93           72.568534  19234    1323   93        93           0       13216
84           ../../../test/input/slow/slow/test1.sql  1813      dml_table_expression_clause  84           2.08957    307      3      84        84           0       95
67           ../../../test/input/slow/slow/test1.sql  2129      routine_name                 148          2.340354   1312     26     67        67           0       812
52           ../../../test/input/slow/slow/test1.sql  1531      statement                    371          27.078977  14945    1671   52        52           0       7985
52           ../../../test/input/slow/slow/test1.sql  1489      declare_spec                 82           0.336624   1029     31     52        52           0       383
39           ../../../test/input/slow/slow/test1.sql  2195      type_spec                    92           0.654603   334      5      62        39           0       139
36           ../../../test/input/slow/slow/test1.sql  1659      table_ref_base               93           69.811876  19584    1323   63        36           0       13207
03/11-09:38:15 /c/Users/Kenne/Downloads/antlr-report/antlr-report/sql-antlr4-parser/grammar/Generated-CSharp
$

I can't see the .dot files because there is a bug in the ANTLR4 tool, so I can't see why table_ref_aux has a max-k of 1323. The problem is that you use EOF on the right-hand side (RHS) of a lexer rule. You really should not do that.

davdres (Author) Mar 11, 2026

I removed:

- the clearDFA() calls on both the lexer and the parser,
- the setTrimParseTree(true) call on the parser,
- the duplicate start rule compilation_unit from the grammar, and
- all references to EOF in lexer rules.

None of this changed the behavior. I understand that the grammar is not optimal, but would that account for the inconsistency between executions with the same grammar and test data? It sometimes succeeds in under 10 seconds, but when it hangs it can be for minutes. Actually, once it hangs I've never seen it complete: before I trimmed it down to this test case, I let it run for a weekend without completing. In other words, can I count on the same grammar and same test data being deterministic?

Regarding the grammar, your comment about EOF was helpful. I experimented with your analysis tools a while back but ran into an issue with this grammar; your comment and this discussion helped me understand why. So I will definitely try the tools again.

kaby76 Mar 11, 2026

> In other words can I count on the same grammar and same test data being deterministic?

I'm pretty sure the parse is deterministic when it isn't run in parallel. Let me take a closer look; it's some kind of race condition.

Sorry about the issues with Trash. For trgen, the start rule can be set with -s the-name-of-the-rule.

kaby76 · 2026-03-11T23:59:20Z

The problem is in merge(). When two threads are working on the same context, one thread can create a partially completed result that another thread reads. When this happens, the parent chains can become circular.
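
To make that failure mode concrete, here is a minimal sketch that does not use ANTLR at all. Ctx is a hypothetical stand-in for a singleton prediction context (a return state plus a parent pointer) whose equals() recurses up the parent chain, and hasCycle() is a hypothetical diagnostic helper that uses Floyd's tortoise-and-hare walk to detect a circular chain without recursing.

```java
import java.util.Objects;

/** Hypothetical stand-in for a prediction-context node: a return state plus a parent link. */
final class Ctx {
    final int returnState;
    Ctx parent; // mutable so a chain can be made circular, as a racy merge might do

    Ctx(int returnState, Ctx parent) {
        this.returnState = returnState;
        this.parent = parent;
    }

    /** Recursive equality over the parent chain; never terminates if both chains are circular. */
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Ctx)) return false;
        Ctx other = (Ctx) o;
        return returnState == other.returnState && Objects.equals(parent, other.parent);
    }

    @Override
    public int hashCode() {
        return returnState; // deliberately shallow, so hashing a circular chain still terminates
    }

    /** Floyd's tortoise-and-hare: true if following parent links ever revisits a node. */
    static boolean hasCycle(Ctx start) {
        Ctx slow = start;
        Ctx fast = start;
        while (fast != null && fast.parent != null) {
            slow = slow.parent;
            fast = fast.parent.parent;
            if (slow == fast) return true;
        }
        return false;
    }
}
```

If two distinct chains are both circular, calling equals() on them never reaches a base case; in this sketch it recurses until the JVM throws StackOverflowError. Checking hasCycle() first is one way to turn that into an immediate, reportable failure while debugging.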

Yes, it's a bug. We could fix it by single-threading merge(), but that would kill performance. So the recommended workaround is to replace the shared DFA[] and PredictionContextCache caches with fresh ones for each parse. Admittedly, it's a hack.

So, here is an update to your code that seems to work: antlr-report-fixed.zip. It changes four things.

Removed clearDFA() calls

Deleted the plsqlParser.getInterpreter().clearDFA() and lexer.getInterpreter().clearDFA() calls made after each parse.

Why? The DFA is shared static state across all parser instances. In a parallel stream, one thread could call clearDFA() while other threads were actively using that DFA to parse, corrupting their state mid-parse.

Per-instance ZipFile

Changed ParseInput to carry the zip File path instead of the shared ZipFile handle. Each parseEntry() call now opens its own ZipFile.

Why? A single ZipFile instance shared across parallel threads means concurrent calls to getInputStream() share the underlying native inflater state, which is not thread-safe. Giving each thread its own ZipFile eliminates the sharing entirely.
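
A minimal sketch of the per-task ZipFile approach, using only java.util.zip and assuming Java 9 or later (for InputStream.readAllBytes()). PerTaskZipReader, readEntry(), and readAllParallel() are hypothetical names, not the code in antlr-report-fixed.zip; in the real program the text of each entry would go to a freshly constructed lexer and parser rather than into a map.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical sketch: each task opens its own ZipFile instead of sharing one handle. */
final class PerTaskZipReader {

    /** Reads one entry as UTF-8 text through a ZipFile that this call alone opens and closes. */
    static String readEntry(File zipPath, String entryName) {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("missing entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists entry names once, then reads all entries in parallel, one ZipFile per task. */
    static Map<String, String> readAllParallel(File zipPath) throws IOException {
        List<String> names;
        try (ZipFile zip = new ZipFile(zipPath)) {
            names = zip.stream()
                    .filter(e -> !e.isDirectory())
                    .map(ZipEntry::getName)
                    .collect(Collectors.toList());
        }
        // Only the immutable path and entry name cross threads; no ZipFile is shared.
        return names.parallelStream()
                .collect(Collectors.toConcurrentMap(n -> n, n -> readEntry(zipPath, n)));
    }
}
```

Opening the archive once per entry adds an extra open per file, which is likely small compared with the cost of parsing a SQL file.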

Per-instance DFA arrays

After constructing the PlSqlLexer and PlSqlParser, we immediately replace their ATNSimulator with a fresh one backed by a newly allocated DFA[] and a new PredictionContextCache.

Why? ANTLR's generated classes store their DFA[] in a static field shared across all instances. Concurrent threads racing to build DFA state can corrupt the shared PredictionContext graph, creating cyclic parent-pointer chains (A→B→A) that cause equals() to recurse infinitely. With a private DFA[] and PredictionContextCache per parse, all mutable prediction state is thread-local, and there is nothing to corrupt.
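
The shape of this change can be modeled without the ANTLR runtime. In the sketch below, Simulator, GeneratedParser, and usePrivateCache() are hypothetical stand-ins, and SHARED_CACHE plays the role of the generated class's static DFA[] field. With the ANTLR 4 Java runtime, the corresponding step would presumably be to fill a new DFA[] with one new DFA(atn.getDecisionState(i), i) per decision, then call parser.setInterpreter(new ParserATNSimulator(parser, parser.getATN(), decisionToDFA, new PredictionContextCache())), and do the same for the lexer with LexerATNSimulator. Check those calls against the runtime version you use.

```java
/** Hypothetical stand-in for an ATN simulator: it caches into whatever array it is given. */
final class Simulator {
    final Object[] decisionCache; // plays the role of DFA[] decisionToDFA

    Simulator(Object[] decisionCache) {
        this.decisionCache = decisionCache;
    }
}

/** Hypothetical stand-in for a generated parser class. */
final class GeneratedParser {
    static final int NUM_DECISIONS = 4;

    // Like generated code: one cache array in a static field, shared by every instance.
    static final Object[] SHARED_CACHE = new Object[NUM_DECISIONS];

    Simulator interpreter = new Simulator(SHARED_CACHE);

    /** The workaround: install a simulator backed by a freshly allocated, private cache. */
    void usePrivateCache() {
        interpreter = new Simulator(new Object[NUM_DECISIONS]);
    }
}
```

The trade-off is warm-up: a private cache starts empty, so every parse rebuilds its DFA states from scratch instead of reusing states learned from earlier files.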

SLL-first prediction mode with LL fallback

Before calling sql_script(), set PredictionMode.SLL. If the parse reports syntax errors, reset the token stream and re-parse with PredictionMode.LL.

Why? This was the root cause of the hang. ANTLR's full LL mode tracks complete calling contexts using PredictionContext objects linked together in a graph via parent pointers. For deeply recursive grammar rules (common in PL/SQL), PredictionContext.merge() can create cycles in that graph even within a single-threaded parse. Any subsequent HashMap.get() that calls equals() on those cyclic contexts then recurses forever. SLL mode uses a context-free prediction algorithm that never builds these deep context graphs, so the cycle can never form. The LL fallback is only triggered for genuinely ambiguous grammar points, and in practice, those parsed correctly and terminated within the observed ~15-17 seconds.
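
The control flow of the SLL-first strategy is sketched below with a stand-in for the parser, so it runs without the ANTLR runtime. TwoStageParse, Attempt, and the Mode enum are hypothetical; the comments note which ANTLR 4 Java runtime calls each step would be expected to correspond to.

```java
import java.util.function.Function;

/** Hypothetical sketch of SLL-first parsing with a full-LL retry. */
final class TwoStageParse {
    /** Mirrors the two prediction modes used here (PredictionMode.SLL and PredictionMode.LL). */
    enum Mode { SLL, LL }

    /** Outcome of one parse attempt: the tree, the syntax-error count, and the mode used. */
    static final class Attempt<T> {
        final T tree;
        final int syntaxErrors;
        final Mode mode;

        Attempt(T tree, int syntaxErrors, Mode mode) {
            this.tree = tree;
            this.syntaxErrors = syntaxErrors;
            this.mode = mode;
        }
    }

    /**
     * Runs the parse once in SLL mode and returns it if no syntax errors were reported;
     * otherwise runs it again in LL mode and returns that result.
     * With the ANTLR 4 Java runtime, runInMode would be expected to:
     *   1. rewind: tokens.seek(0); parser.reset();
     *   2. parser.getInterpreter().setPredictionMode(PredictionMode.SLL or PredictionMode.LL);
     *   3. call the start rule (sql_script() here) and read parser.getNumberOfSyntaxErrors().
     */
    static <T> Attempt<T> parse(Function<Mode, Attempt<T>> runInMode) {
        Attempt<T> fast = runInMode.apply(Mode.SLL);
        if (fast.syntaxErrors == 0) {
            return fast; // common case: SLL succeeded, no second pass
        }
        return runInMode.apply(Mode.LL);
    }
}
```

A common variant installs BailErrorStrategy for the SLL pass, so the first syntax error aborts it immediately with a ParseCancellationException instead of running error recovery to the end of the input.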

Watchdog thread (diagnostic, output commented out)

A daemon thread that wakes after 15 seconds and then dumps all thread stack traces every 5 seconds. This works, at least on my slow-ish system.

Why? This is the code used to identify the root cause of the hang. The thread dump showed ForkJoinPool.commonPool-worker-2 burning CPU in an infinite SingletonPredictionContext.equals() recursion deep inside PredictionContext.mergeSingletons(), which pointed directly to the cyclic context graph problem. The output is commented out, but the thread remains, so it can be re-enabled if a future hang needs diagnosing.
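
A minimal version of such a watchdog can be built from Thread.getAllStackTraces() and a single-threaded ScheduledExecutorService whose thread is marked as a daemon. StackDumpWatchdog is a hypothetical name, this is not the code in antlr-report-fixed.zip, and the 15 s / 5 s timings are just the values mentioned above.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog: after an initial delay, periodically dumps every thread's stack. */
final class StackDumpWatchdog {

    /** Formats the stacks of all live threads, one block per thread. */
    static String dumpAllStacks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Starts the dumps on a daemon thread, so the watchdog never keeps the JVM alive by itself. */
    static ScheduledExecutorService start(long initialDelaySeconds, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stack-dump-watchdog");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(
                () -> System.err.print(dumpAllStacks()),
                initialDelaySeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Typical use would be StackDumpWatchdog.start(15, 5) before launching the parallel parses and shutdownNow() on the returned executor once they finish; a hung worker then shows the same frames (for example, a repeating equals() recursion) in every dump.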

davdres (Author) Mar 12, 2026

Thanks so much! This is invaluable.
