Hi. While working on a C# implementation of this algorithm, I ran into an interesting case that sends the parser into an infinite loop. To verify that it wasn't just an error in my port, I recreated the same grammar in the reference implementation and hit the same problem.
The specific case:
List<Rule> rules = Arrays.asList(
    new Rule("A",
        new First(
            new CharSet('a'),
            new Nothing()
        )
    ),
    new Rule("CHAIN",
        new Seq(
            new RuleRef("CHAIN"),
            new RuleRef("A")
        )
    )
);
var grammar = new Grammar(rules);
var mt = grammar.parse("a");
System.out.println("done");

If I'm not mistaken, this runs into an infinite loop because the CHAIN rule may match an empty string, but is also its own (indirect) seed parent, causing it to always get added to the priority queue (which will therefore never be empty). This gives a debug trace of:
...
Matched: CHAIN <- CHAIN A : 0+1
Following seed parent clause: CHAIN <- CHAIN A
Matched: CHAIN <- CHAIN A : 0+1
Following seed parent clause: CHAIN <- CHAIN A
Matched: CHAIN <- CHAIN A : 0+1
Following seed parent clause: CHAIN <- CHAIN A
Matched: CHAIN <- CHAIN A : 0+1
Following seed parent clause: CHAIN <- CHAIN A
...
Of course, this grammar can easily be rewritten to avoid left recursion, but from what I understand the algorithm is supposed to handle such cases. Do you have any suggestions on how the implementation could be modified to handle them? My initial thought is to check whether we have already attempted to match a clause at a given input position and, if so, not add the clause to the priority queue again. However, I'm not sure whether this would have unforeseen side effects on the rest of the algorithm.
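
To make the idea concrete, here is a minimal, self-contained sketch of such a guard. It does not use the reference implementation's classes: `MemoGuardSketch`, `recordIfBetter`, and `simulate` are hypothetical names, and the "matching" step is a stand-in that always produces the `0+1` match from the trace. One assumption worth flagging: a plain "already attempted" check might also stop a left-recursive match from growing, since growth seems to rely on re-matching the same clause at the same position, so the sketch only suppresses re-queuing when the new match is not strictly longer than the memoized one.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

class MemoGuardSketch {
    // Hypothetical memo: longest match length seen so far per (clause, start position).
    private final Map<String, Integer> bestLen = new HashMap<>();

    /** Returns true only if the match is new or strictly longer than the memoized one. */
    boolean recordIfBetter(String clause, int startPos, int len) {
        String key = clause + "@" + startPos;
        Integer old = bestLen.get(key);
        if (old != null && old >= len) {
            return false; // no improvement, so seed parents should not be re-queued
        }
        bestLen.put(key, len);
        return true;
    }

    /** Replays the loop from the trace; returns how many matches were processed. */
    static int simulate(int maxSteps) {
        MemoGuardSketch memo = new MemoGuardSketch();
        ArrayDeque<String> queue = new ArrayDeque<>(); // stand-in for the priority queue
        queue.add("CHAIN");
        int steps = 0;
        while (!queue.isEmpty() && steps < maxSteps) {
            String clause = queue.poll();
            steps++;
            // Stand-in for real matching: CHAIN <- CHAIN A always yields 0+1 here.
            if (memo.recordIfBetter(clause, 0, 1)) {
                queue.add("CHAIN"); // follow the seed parent only on improvement
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000));
    }
}
```

With the guard in place the simulated queue drains after the second match attempt instead of cycling. Whether comparing match lengths is the right notion of "improvement" for every clause type in the real parser is an open question that this sketch does not settle.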