Description
Some unexpected behavior was discovered while experimenting with layout in an ableC extension for Prolog-style logic programming. The following is a somewhat simplified grammar that seems to exhibit the same issue.
Grammar "host":
ignore terminal WhiteSpace_t /[\t\r\n ]+/;
terminal Plus_t '+' association=left, precedence=0;
terminal Mod_t '%' association=left, precedence=1;
terminal LParen_t '(';
terminal RParen_t ')';
terminal LCurly_t '{';
terminal RCurly_t '}';
terminal Semi_t ';';
terminal Id_t /[a-zA-Z]+/;
nonterminal Stmt;
concrete productions top::Stmt
| Expr ';' {}
| '{' Stmt '}' {}
nonterminal Expr;
concrete productions top::Expr
| '(' Expr ')' {}
| Expr '+' Expr {}
| Expr '%' Expr {}
| Id_c '(' ')' {}
| Id_c {}
nonterminal Id_c;
concrete productions top::Id_c
| Id_t {}
Grammar "ext":
terminal ExtComment_t /% .*/;
marking terminal Ext_t 'ext' dominates Id_t;
terminal Dot_t '.';
concrete production extProd
top::Stmt ::= 'ext' '{' id::Id_c '(' ')' '.' '}'
layout { ExtComment_t }
{}
Using a parser built only from "host" the string
{
a % b;
}
parses successfully. However, using a parser containing both "host" and "ext" (generated copper spec for reference: Parser_copper_features_test_layout_lookahead_parse_ext.copper) on the same string results in the following parse error:
Error at line 3, column 0 in file
(parser state: 3; real character index: 12):
Expected a token of one of the following types:
[copper_features:test_layout:lookahead:ext:ExtComment_t,
'(',
'%',
'+',
')',
';',
copper_features:test_layout:lookahead:host:WhiteSpace_t]
Input currently matches:
['}']
This is a rather unexpected result, because introducing an extension causes unrelated, existing code to suddenly break without any sort of lexical ambiguity being raised. If I understand correctly what is going on here, this happens because DFA states differing only in lookahead are merged, which results in the layout terminal ExtComment_t winning over '%' due to maximal munch?
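To make the suspected mechanism concrete, here is a minimal Python sketch of longest-match selection over a set of valid terminals. It is not Copper's scanner; the `TERMINALS` table, the `longest_match` helper, and the two valid-terminal sets are hypothetical stand-ins that only reuse the regexes from the grammars above, and it assumes the merged state really does accept ExtComment_t as layout (as the error message suggests).

```python
import re

# Hypothetical terminal table reusing the regexes from the "host" and "ext" grammars.
TERMINALS = {
    "WhiteSpace_t": re.compile(r"[\t\r\n ]+"),
    "Mod_t": re.compile(r"%"),
    "Plus_t": re.compile(r"\+"),
    "Semi_t": re.compile(r";"),
    "ExtComment_t": re.compile(r"% .*"),
}

def longest_match(text, pos, valid):
    """Return (terminal name, lexeme) for the longest match among `valid` at `pos`."""
    best = None
    for name in valid:
        m = TERMINALS[name].match(text, pos)
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

text = "a % b;"
percent_pos = text.index("%")

# Assumed valid sets for the state reached after reading `a`.
host_valid = {"WhiteSpace_t", "Mod_t", "Plus_t", "Semi_t"}
merged_valid = host_valid | {"ExtComment_t"}  # layout contributed by extProd

host_choice = longest_match(text, percent_pos, host_valid)
merged_choice = longest_match(text, percent_pos, merged_valid)
print(host_choice)
print(merged_choice)
```

With only the host terminals valid, the scanner picks Mod_t for `%`. Once ExtComment_t is also valid, it matches the longer lexeme `% b;` and swallows the rest of the line as layout, which would explain why the parser then sees `}` where it still expects an operator or `;`.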
This behavior seems rather undesirable, and should at least emit some sort of warning. Or would it even be possible to modify the LALR(1) parser construction algorithm so that it does not merge states that have different layout?
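As a rough illustration of what I mean by not merging states with different layout, here is a toy Python sketch. It is purely hypothetical and not how Copper builds its tables: each state is reduced to a `(core, lookahead, layout)` triple, and the `merge_states` function and the two example states are assumptions made up for this issue.

```python
from collections import defaultdict

def merge_states(states, respect_layout):
    """Merge toy LR(1) states given as (core, lookahead, layout) triples.

    With respect_layout=False, states are merged on core alone (LALR-style).
    With respect_layout=True, states merge only if their layout sets are equal.
    """
    groups = defaultdict(list)
    for core, lookahead, layout in states:
        key = (core, layout) if respect_layout else (core,)
        groups[key].append((lookahead, layout))
    merged = []
    for key, members in groups.items():
        lookahead = frozenset().union(*(la for la, _ in members))
        layout = frozenset().union(*(ly for _, ly in members))
        merged.append((key[0], lookahead, layout))
    return merged

# Assumed states after shifting Id_t, once in host context and once inside extProd.
host_state = ("Id_c ::= Id_t .",
              frozenset({"(", "%", "+", ")", ";"}),
              frozenset({"WhiteSpace_t"}))
ext_state = ("Id_c ::= Id_t .",
             frozenset({"("}),
             frozenset({"ExtComment_t"}))

lalr = merge_states([host_state, ext_state], respect_layout=False)
layout_aware = merge_states([host_state, ext_state], respect_layout=True)
print(len(lalr), sorted(lalr[0][2]))
print(len(layout_aware))
```

In the plain merge, the single resulting state accepts both WhiteSpace_t and ExtComment_t as layout, which matches the expected-token list in the error above. Keying on layout as well keeps the two states apart, at the cost of a larger table.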