Simple PEG grammar not working as expected #36


I'd expect Pika to parse both of the following inputs in full:

`a[]()`
`a()[]`

using the following grammar:

```
program
  <- primary;

primary
  <- call
  /  subscript
  /  attribute
  /  name;

call
  <- call:(primary '(' ')');
  
subscript
  <- subscript:(primary '[' ']');

attribute
  <- attribute:(primary '.' name);

name <- name:[A-Za-z0-9_]+;
```

However, only `a[]()` is matched in full:

```
Match tree for rule program:
                                                      ┌─────────┐  
 10 : primary <- call / subscript / attribute / name  │a [ ] ( )│  
                                                      ├─────────┤  
                  6 : call <- call:(primary '(' ')')  │a [ ] ( )│  
                                                      ├─────┐   │  
 10 : primary <- call / subscript / attribute / name  │a [ ]│   │  
                                                      ├─────┤   │  
        7 : subscript <- subscript:(primary '[' ']')  │a [ ]│   │  
                                                      ├─┐   │   │  
 10 : primary <- call / subscript / attribute / name  │a│   │   │  
                                                      ├─┤   │   │  
                      8 : name <- name:[0-9A-Z_a-z]+  │a│   │   │  
                                                      ├─┤   │   │  
                         5 : [terminal] [0-9A-Z_a-z]  │a│   │   │  
                                                      │ │ ┌─┤   │  
                                  3 : [terminal] ']'  │ │ │]│   │  
                                                      │ ├─┤ │   │  
                                  2 : [terminal] '['  │ │[│ │   │  
                                                      │ │ │ │ ┌─┤  
                                  1 : [terminal] ')'  │ │ │ │ │)│  
                                                      │ │ │ ├─┤ │  
                                  0 : [terminal] '('  │ │ │ │(│ │  
                                                       0 1 2 3 4 
                                                       a [ ] ( ) 
AST for rule "program":

└─program : 0+5 : "a[]()"
  └─call : 0+5 : "a[]()"
    └─subscript : 0+3 : "a[]"
      └─name : 0+1 : "a"
```

By contrast, `a()[]` produces only a partial match (the trailing `[]` is not matched):

```
Match tree for rule program:
                                                      ┌─────┐░░░   
 10 : primary <- call / subscript / attribute / name  │a ( )│░░░   
                                                      ├─────┤░░░   
                  6 : call <- call:(primary '(' ')')  │a ( )│░░░   
                                                      ├─┐   │░░░   
 10 : primary <- call / subscript / attribute / name  │a│   │░░░   
                                                      ├─┤   │░░░   
                      8 : name <- name:[0-9A-Z_a-z]+  │a│   │░░░   
                                                      ├─┤   │░░░   
                         5 : [terminal] [0-9A-Z_a-z]  │a│   │░░░   
                                                      │ │   │░┌─┐  
                                  3 : [terminal] ']'  │ │   │░│]│  
                                                      │ │   ├─┤ │  
                                  2 : [terminal] '['  │ │   │[│ │  
                                                      │ │ ┌─┤ │ │  
                                  1 : [terminal] ')'  │ │ │)│ │ │  
                                                      │ ├─┤ │ │ │  
                                  0 : [terminal] '('  │ │(│ │ │ │  
                                                       0 1 2 3 4 
                                                       a ( ) [ ] 
AST for rule "program":

└─program : 0+3 : "a()"
  └─call : 0+3 : "a()"
    └─name : 0+1 : "a"
```

The Pika library is used as follows:

```
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

import pikaparser.grammar.Grammar;
import pikaparser.grammar.MetaGrammar;
import pikaparser.parser.utils.ParserInfo;

public class Parser {

    // Name of the grammar rule whose AST is printed.
    static final String TOP_RULE_NAME = "program";

    public static void main(String[] args) throws IOException {
        // args[0]: path to the grammar file; args[1]: path to the input text.
        final String grammarStr = Files.readString(Path.of(args[0]), Charset.defaultCharset());
        final String text = Files.readString(Path.of(args[1]), Charset.defaultCharset());

        final var grammar = MetaGrammar.parse(grammarStr);
        final var memoTable = grammar.parse(text);
        // Take the last entry of allClauses as the clause for the top rule.
        final var topClause = grammar.allClauses.get(grammar.allClauses.size() - 1);

        System.out.println("Match tree for rule " + TOP_RULE_NAME + ":");
        ParserInfo.printParseTreeInMemoTableForm(memoTable);
        System.out.println("AST for rule \"" + TOP_RULE_NAME + "\":\n");
        ParserInfo.printAST(TOP_RULE_NAME, topClause, memoTable);
    }
}
```

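I suppose this has to do with the relative order of the `call` and `subscript` alternatives in `primary`: `call` is listed first, and the input that fails is the one where a subscript has to wrap a call.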
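
To make that hypothesis concrete, here is a toy model in Python of two ways a left-recursive `primary` could be grown at position 0. It is only a sketch: it is not Pika's or pegen's actual code, the function names are made up for illustration, the `attribute` alternative is left out for brevity, and the assumption that each rule keeps its own memo entry in the first variant is mine. The first variant lets the earliest alternative with any memoized match win; the second re-runs the whole ordered choice each round and keeps the result only if it is longer.

```python
import re

NAME = re.compile(r"[A-Za-z0-9_]+")
# Left-recursive alternatives of `primary`, in grammar order: (label, suffix).
POSTFIX = [("call", "()"), ("subscript", "[]")]


def match_name(text):
    m = NAME.match(text)
    return (m.end(), ["name", m.group()]) if m else None


def extend(seed, label, suffix, text):
    """Match `primary suffix` at position 0, using `seed` as the match for `primary`."""
    if seed is None:
        return None
    end, tree = seed
    if text.startswith(suffix, end):
        return (end + len(suffix), [label, tree])
    return None


def parse_first_alternative_wins(text):
    """Bottom-up fixpoint in which every rule keeps its own memo entry (assumption)."""
    memo = {"name": match_name(text), "call": None, "subscript": None, "primary": None}
    changed = True
    while changed:
        changed = False
        # Re-match the postfix rules against the current `primary`; keep longer matches.
        for label, suffix in POSTFIX:
            cand = extend(memo["primary"], label, suffix, text)
            if cand and (memo[label] is None or cand[0] > memo[label][0]):
                memo[label] = cand
                changed = True
        # Ordered choice: the earliest alternative with any memoized match wins.
        best = next((memo[r] for r in ("call", "subscript", "name") if memo[r]), None)
        if best != memo["primary"]:
            memo["primary"] = best
            changed = True
    return memo["primary"]


def parse_seed_growing(text):
    """Re-run the whole ordered choice each round; keep the result only if it is longer."""
    seed = None
    while True:
        result = None
        for label, suffix in POSTFIX:
            result = extend(seed, label, suffix, text)
            if result:
                break
        result = result or match_name(text)
        if result is None or (seed is not None and result[0] <= seed[0]):
            return seed
        seed = result


for text in ("a[]()", "a()[]"):
    print(text, parse_first_alternative_wins(text), parse_seed_growing(text))
```

In this toy model the first variant stops at `a()` for the input `a()[]`, because the stale `call` match keeps shadowing the longer `subscript` match, while the second variant consumes all five characters of both inputs. Whether Pika really behaves like the first variant is exactly what I'm asking.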

Is this intended? If so, how should the grammar be rewritten to handle both `a[]()` and `a()[]`?
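
For reference, the only rewrite I can think of is the usual PEG workaround of replacing the left recursion with repetition: a `name` followed by zero or more postfix suffixes, folded into a left-nested tree afterwards. Below is a hand-written Python sketch of that idea. It is not Pika grammar syntax, `parse_primary` is a hypothetical helper, and I don't know whether this is the recommended approach for Pika.

```python
import re

NAME = re.compile(r"[A-Za-z0-9_]+")


def parse_primary(text):
    """Parse `name` followed by any number of `()`, `[]`, or `.name` suffixes.

    Returns (end_position, tree), or None if the text does not start with a name.
    """
    m = NAME.match(text)
    if not m:
        return None
    pos, tree = m.end(), ["name", m.group()]
    while True:
        # Each suffix wraps the tree built so far, giving left-nested output.
        if text.startswith("()", pos):
            pos, tree = pos + 2, ["call", tree]
        elif text.startswith("[]", pos):
            pos, tree = pos + 2, ["subscript", tree]
        elif text.startswith(".", pos) and (n := NAME.match(text, pos + 1)):
            pos, tree = n.end(), ["attribute", tree, ["name", n.group()]]
        else:
            return pos, tree


for text in ("a[]()", "a()[]", "a.b()"):
    print(text, parse_primary(text))
```

This handles both inputs regardless of suffix order, but the nesting has to be rebuilt by hand rather than falling out of the grammar, which is what I was hoping the left-recursive form would give me.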


For comparison, [pegen](https://github.com/we-like-parsers/pegen), the PEG parser generator used for CPython, which [supports left recursion](https://daobook.github.io/devguide/parser.html#left-recursion), parses both inputs in full. I'm including a minimal Dockerfile that reproduces the pegen results:

```
# syntax=docker/dockerfile:1

# Save this as pegen.Dockerfile, then run:
# $ docker buildx build -f pegen.Dockerfile .

FROM python:3.12-slim


RUN pip install pegen


COPY <<GRAMMAR grammar.pegen

start: 
  | p=primary NEWLINE? ENDMARKER { ['start', p] } 

primary:
  | call
  | subscript
  | attribute
  | name_

call:
  | p=primary '(' ')' { ['call', p] } 

subscript:
  | p=primary '[' ']' { ['subscript', p] } 

attribute:
  | p=primary '.' n=name_ { ['attribute', p, n] } 

name_:
  | n=NAME { ['name', n.string] } 

GRAMMAR


RUN python3 <<PEGEN

import pegen.utils
parser = pegen.utils.make_parser(open('grammar.pegen').read())
def parse(text):
  tree = pegen.utils.parse_string(text, parser)
  print(f'{text=} {tree=}')
parse('a[]()')
parse('a()[]')
exit(1)  # make this script fail so that docker shows its output

PEGEN
```
This outputs:

```
text='a[]()' tree=['start', ['call', ['subscript', ['name', 'a']]]]
text='a()[]' tree=['start', ['subscript', ['call', ['name', 'a']]]]
```
