-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Description
From time to time, I peruse the web to find Antlr grammars. The grammar at https://github.com/kaby76/pkl/tree/dbf57280baf68d7f8fb45763715ebd81b406fb6d/pkl-parser/src/test/antlr was replaced a few months ago by a hand-written parser because, among other issues, the grammar was slow and hard to maintain. I ran my tools on the grammar to analyse what is going on here. I whittled the grammar down to a minimum reproducible example.
grammar Exp;
start: (expr ';')* EOF;
expr
: Identifier
| '(' expr ')'?
| expr '.' Identifier
| expr '==' expr
;
Identifier : [a-z]+;
Ws : [ \t\n\r]+ -> skip;
On input a==b.c;, there are two ways for this to be parsed.
$ trparse -a x | trtree -a
CSharp 0 x success 0.0367587
x.d=4.a=1: (start (expr (expr (Identifier "a")) (Identifier "==") (expr (expr (Identifier "b")) (T__4 ".") (Identifier "c"))) (T__1 ";") (EOF ""))
x.d=4.a=2: (start (expr (expr (expr (Identifier "a")) (Identifier "==") (expr (Identifier "b"))) (T__4 ".") (Identifier "c")) (T__1 ";") (EOF ""))
12/13-06:43:20 ~/temp/a/Generated-CSharp
$
Now, compare the grammar to this, an equivalent refactored grammar.
grammar Exp;
start: (expr ';')* EOF;
expr
: Identifier
| '(' expr
| '(' expr ')'
| expr '.' Identifier
| expr '==' expr
;
Identifier : [a-z]+;
Ws : [ \t\n\r]+ -> skip;
This grammar has no ambiguity and runs much faster on very large files (e.g., a==b.c; repeated a few hundred thousand times).
I haven't gone into analysing why the grammar results in ambiguity, and the second does not.
My advice to anyone is to never assume ambiguity (and max-ks) don't exist in your grammar. Always test it thoroughly.