How to fix an exponential slowness due to backtracking?

I'm struggling to fix an exponential slowness in `fastparse 3.0.2` (Scala 2.13) used for an expression grammar. There is too much backtracking going on, and I have been adding cuts but the exponential slowness remains.

I tried to add memoization to the parsers but this failed with strange error messages about unboxing `Unit` into `Integer`.

A small working example of a grammar that shows exponential slowness:

```scala
    import fastparse.NoWhitespace._
    import fastparse._
    // Integer calculator program: 1+2*3-(4-5)*6 and so on. No spaces, for simplicity.
    def program[$: P]: P[Int] = P(expr ~ End)
    def expr[$: P]: P[Int]    = P(x_minus | x_plus)
    def x_minus[$: P]         = P(x_times ~ "-" ~ expr)
        .map { case (x, y) => x - y }
    def x_plus[$: P]          = P(x_times ~ ("+" ~ expr).rep)
        .map { case (i, is) => i + is.sum }
    def x_times[$: P]         = P(x_other ~ ("*" ~ x_other).rep)
        .map { case (i, is) => i * is.product }
    def x_other[$: P]         = P(number | ("(" ~ expr ~ ")"))
    def number[$: P]          = P(CharIn("0-9").rep(1))
        .!.map(_.toInt)
    // Verify that this works as expected.
    assert(parse("123*(1+1)", program(_)).get.value == 246)
    assert(parse("123*1+1", program(_)).get.value == 124)
    assert(parse("123*1-1", program(_)).get.value == 122)
    assert(parse("123*(1-1)", program(_)).get.value == 0)

    // Parse an expression of the form `(((((...(1)...)))))`.
     val n = 25
     assert(parse("(" * (n - 1) + "1" + ")" * (n - 1), program(_)).get.value == 1)
```

The code works but parsing takes about 10 seconds. I found that parsing this expression with `n` parentheses takes about $2^{n-20}$ seconds. The reason for the exponential slowness is the backtracking. It tries to parse `expr` after `(` and there are two possibilities (`minus` and `plus`). Each possibility will need to be fully explored before returning to parse the next `(`. To explore means again to parse `expr` recursively. So, there is a 2x increase of parsing work for each parenthesis. To visualize the slowness, consider this function:

```scala
def fibonacci(n: Int): Int = if (n <= 1) 1 else fibonacci(n-1) + fibonacci(n-2)
```
It takes $2^n$ function calls to compute `fibonacci(n)` by this program.

I tried adding cuts to the grammar at various places but the exponential slowness remains.

So, I have two questions:

1) Can we somehow mitigate this problem by adding cuts, or adding more grammar rules, or in some other way?

2) Memoization could help here: if a parser rule already failed at a certain location, it is not necessary to try again to parse with the same parser rule at the same location. If a parser rule succeeded at a certain location, and a parse is unique, it is not necessary to parse again at the same location. This may eliminate the exponential slowness if implemented correctly. Is there a way to implement that?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to fix an exponential slowness due to backtracking? #300

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

How to fix an exponential slowness due to backtracking? #300

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions