-
Notifications
You must be signed in to change notification settings - Fork 13
Open
Description
I'd expect Pika to parse both of the following
a[]()
a()[]
using the following grammar:
program
<- primary;
primary
<- call
/ subscript
/ attribute
/ name;
call
<- call:(primary '(' ')');
subscript
<- subscript:(primary '[' ']');
attribute
<- attribute:(primary '.' name);
name <- name:[A-Za-z0-9_]+;
However, only a[]() works:
Match tree for rule program:
┌─────────┐
10 : primary <- call / subscript / attribute / name │a [ ] ( )│
├─────────┤
6 : call <- call:(primary '(' ')') │a [ ] ( )│
├─────┐ │
10 : primary <- call / subscript / attribute / name │a [ ]│ │
├─────┤ │
7 : subscript <- subscript:(primary '[' ']') │a [ ]│ │
├─┐ │ │
10 : primary <- call / subscript / attribute / name │a│ │ │
├─┤ │ │
8 : name <- name:[0-9A-Z_a-z]+ │a│ │ │
├─┤ │ │
5 : [terminal] [0-9A-Z_a-z] │a│ │ │
│ │ ┌─┤ │
3 : [terminal] ']' │ │ │]│ │
│ ├─┤ │ │
2 : [terminal] '[' │ │[│ │ │
│ │ │ │ ┌─┤
1 : [terminal] ')' │ │ │ │ │)│
│ │ │ ├─┤ │
0 : [terminal] '(' │ │ │ │(│ │
0 1 2 3 4
a [ ] ( )
AST for rule "program":
└─program : 0+5 : "a[]()"
└─call : 0+5 : "a[]()"
└─subscript : 0+3 : "a[]"
└─name : 0+1 : "a"
while a()[] produces a partial match (the trailing [] is not matched):
Match tree for rule program:
┌─────┐░░░
10 : primary <- call / subscript / attribute / name │a ( )│░░░
├─────┤░░░
6 : call <- call:(primary '(' ')') │a ( )│░░░
├─┐ │░░░
10 : primary <- call / subscript / attribute / name │a│ │░░░
├─┤ │░░░
8 : name <- name:[0-9A-Z_a-z]+ │a│ │░░░
├─┤ │░░░
5 : [terminal] [0-9A-Z_a-z] │a│ │░░░
│ │ │░┌─┐
3 : [terminal] ']' │ │ │░│]│
│ │ ├─┤ │
2 : [terminal] '[' │ │ │[│ │
│ │ ┌─┤ │ │
1 : [terminal] ')' │ │ │)│ │ │
│ ├─┤ │ │ │
0 : [terminal] '(' │ │(│ │ │ │
0 1 2 3 4
a ( ) [ ]
AST for rule "program":
└─program : 0+3 : "a()"
└─call : 0+3 : "a()"
└─name : 0+1 : "a"
The Pika library is used as follows:
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import pikaparser.grammar.Grammar;
import pikaparser.grammar.MetaGrammar;
import pikaparser.parser.utils.ParserInfo;
public class Parser {
static final String TOP_RULE_NAME = "program";
public static void main(String[] args) throws IOException {
final String grammarStr = Files.readString(Path.of(args[0]), Charset.defaultCharset());
final String text = Files.readString(Path.of(args[1]), Charset.defaultCharset());
final var grammar = MetaGrammar.parse(grammarStr);
final var memoTable = grammar.parse(text);
final var topClause = grammar.allClauses.get(grammar.allClauses.size()-1);
System.out.println("Match tree for rule " + TOP_RULE_NAME + ":");
ParserInfo.printParseTreeInMemoTableForm(memoTable);
System.out.println("AST for rule \"" + TOP_RULE_NAME + "\":\n");
ParserInfo.printAST(TOP_RULE_NAME, topClause, memoTable);
}
}
I suppose this has to do with the relative order of call and subscript choices in primary.
Is this as intended? If yes, how should the grammar be rewritten to handle both a[]() and a()[] ?
For comparison, pegen, CPython's PEG parser with support for left-recursion parses both. I'm including a minimal dockerfile that reproduces pegen results:
# syntax=docker/dockerfile:1
# save this to pegen.Dockerfile then run
# $ docker buildx build -f pegen.Dockerfile .
FROM python:3.12-slim
RUN pip install pegen
COPY <<GRAMMAR grammar.pegen
start:
| p=primary NEWLINE? ENDMARKER { ['start', p] }
primary:
| call
| subscript
| attribute
| name_
call:
| p=primary '(' ')' { ['call', p] }
subscript:
| p=primary '[' ']' { ['subscript', p] }
attribute:
| p=primary '.' n=name_ { ['attribute', p, n] }
name_:
| n=NAME { ['name', n.string] }
GRAMMAR
RUN python3 <<PEGEN
import pegen.utils
parser = pegen.utils.make_parser(open('grammar.pegen').read())
def parse(text):
tree = pegen.utils.parse_string(text, parser)
print(f'{text=} {tree=}')
parse('a[]()')
parse('a()[]')
exit(1) # make this script fail so that docker shows its output
PEGEN
This outputs:
text='a[]()' tree=['start', ['call', ['subscript', ['name', 'a']]]]
text='a()[]' tree=['start', ['subscript', ['call', ['name', 'a']]]]
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels