Skip to content

How to fix an exponential slowness due to backtracking? #300

@winitzki

Description

@winitzki

I'm struggling to fix an exponential slowness in fastparse 3.0.2 (Scala 2.13) used for an expression grammar. There is too much backtracking going on, and I have been adding cuts but the exponential slowness remains.

I tried to add memoization to the parsers but this failed with strange error messages about unboxing Unit into Integer.

A small working example of a grammar that shows exponential slowness:

    import fastparse.NoWhitespace._
    import fastparse._
    // Integer calculator program: 1+2*3-(4-5)*6 and so on. No spaces, for simplicity.
    def program[$: P]: P[Int] = P(expr ~ End)
    def expr[$: P]: P[Int]    = P(x_minus | x_plus)
    def x_minus[$: P]         = P(x_times ~ "-" ~ expr)
        .map { case (x, y) => x - y }
    def x_plus[$: P]          = P(x_times ~ ("+" ~ expr).rep)
        .map { case (i, is) => i + is.sum }
    def x_times[$: P]         = P(x_other ~ ("*" ~ x_other).rep)
        .map { case (i, is) => i * is.product }
    def x_other[$: P]         = P(number | ("(" ~ expr ~ ")"))
    def number[$: P]          = P(CharIn("0-9").rep(1))
        .!.map(_.toInt)
    // Verify that this works as expected.
    assert(parse("123*(1+1)", program(_)).get.value == 246)
    assert(parse("123*1+1", program(_)).get.value == 124)
    assert(parse("123*1-1", program(_)).get.value == 122)
    assert(parse("123*(1-1)", program(_)).get.value == 0)

    // Parse an expression of the form `(((((...(1)...)))))`.
     val n = 25
     assert(parse("(" * (n - 1) + "1" + ")" * (n - 1), program(_)).get.value == 1)

The code works but parsing takes about 10 seconds. I found that parsing this expression with n parentheses takes about $2^{n-20}$ seconds. The reason for the exponential slowness is the backtracking. It tries to parse expr after ( and there are two possibilities (minus and plus). Each possibility will need to be fully explored before returning to parse the next (. To explore means again to parse expr recursively. So, there is a 2x increase of parsing work for each parenthesis. To visualize the slowness, consider this function:

def fibonacci(n: Int): Int = if (n <= 1) 1 else fibonacci(n-1) + fibonacci(n-2)

It takes $2^n$ function calls to compute fibonacci(n) by this program.

I tried adding cuts to the grammar at various places but the exponential slowness remains.

So, I have two questions:

  1. Can we somehow mitigate this problem by adding cuts, or adding more grammar rules, or in some other way?

  2. Memoization could help here: if a parser rule already failed at a certain location, it is not necessary to try again to parse with the same parser rule at the same location. If a parser rule succeeded at a certain location, and a parse is unique, it is not necessary to parse again at the same location. This may eliminate the exponential slowness if implemented correctly. Is there a way to implement that?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions