I'm struggling to fix an exponential slowness in fastparse 3.0.2 (Scala 2.13) used for an expression grammar. There is too much backtracking going on, and I have been adding cuts but the exponential slowness remains.
I tried to add memoization to the parsers but this failed with strange error messages about unboxing Unit into Integer.
A small working example of a grammar that shows exponential slowness:
import fastparse.NoWhitespace._
import fastparse._
// Integer calculator program: 1+2*3-(4-5)*6 and so on. No spaces, for simplicity.
def program[$: P]: P[Int] = P(expr ~ End)
def expr[$: P]: P[Int] = P(x_minus | x_plus)
def x_minus[$: P] = P(x_times ~ "-" ~ expr)
.map { case (x, y) => x - y }
def x_plus[$: P] = P(x_times ~ ("+" ~ expr).rep)
.map { case (i, is) => i + is.sum }
def x_times[$: P] = P(x_other ~ ("*" ~ x_other).rep)
.map { case (i, is) => i * is.product }
def x_other[$: P] = P(number | ("(" ~ expr ~ ")"))
def number[$: P] = P(CharIn("0-9").rep(1))
.!.map(_.toInt)
// Verify that this works as expected.
assert(parse("123*(1+1)", program(_)).get.value == 246)
assert(parse("123*1+1", program(_)).get.value == 124)
assert(parse("123*1-1", program(_)).get.value == 122)
assert(parse("123*(1-1)", program(_)).get.value == 0)
// Parse an expression of the form `(((((...(1)...)))))`.
val n = 25
assert(parse("(" * (n - 1) + "1" + ")" * (n - 1), program(_)).get.value == 1)
The code works but parsing takes about 10 seconds. I found that parsing this expression with n parentheses takes about $2^{n-20}$ seconds. The reason for the exponential slowness is the backtracking. It tries to parse expr after ( and there are two possibilities (minus and plus). Each possibility will need to be fully explored before returning to parse the next (. To explore means again to parse expr recursively. So, there is a 2x increase of parsing work for each parenthesis. To visualize the slowness, consider this function:
def fibonacci(n: Int): Int = if (n <= 1) 1 else fibonacci(n-1) + fibonacci(n-2)
It takes $2^n$ function calls to compute fibonacci(n) by this program.
I tried adding cuts to the grammar at various places but the exponential slowness remains.
So, I have two questions:
-
Can we somehow mitigate this problem by adding cuts, or adding more grammar rules, or in some other way?
-
Memoization could help here: if a parser rule already failed at a certain location, it is not necessary to try again to parse with the same parser rule at the same location. If a parser rule succeeded at a certain location, and a parse is unique, it is not necessary to parse again at the same location. This may eliminate the exponential slowness if implemented correctly. Is there a way to implement that?
I'm struggling to fix an exponential slowness in
fastparse 3.0.2(Scala 2.13) used for an expression grammar. There is too much backtracking going on, and I have been adding cuts but the exponential slowness remains.I tried to add memoization to the parsers but this failed with strange error messages about unboxing
UnitintoInteger.A small working example of a grammar that shows exponential slowness:
The code works but parsing takes about 10 seconds. I found that parsing this expression with$2^{n-20}$ seconds. The reason for the exponential slowness is the backtracking. It tries to parse
nparentheses takes aboutexprafter(and there are two possibilities (minusandplus). Each possibility will need to be fully explored before returning to parse the next(. To explore means again to parseexprrecursively. So, there is a 2x increase of parsing work for each parenthesis. To visualize the slowness, consider this function:It takes$2^n$ function calls to compute
fibonacci(n)by this program.I tried adding cuts to the grammar at various places but the exponential slowness remains.
So, I have two questions:
Can we somehow mitigate this problem by adding cuts, or adding more grammar rules, or in some other way?
Memoization could help here: if a parser rule already failed at a certain location, it is not necessary to try again to parse with the same parser rule at the same location. If a parser rule succeeded at a certain location, and a parse is unique, it is not necessary to parse again at the same location. This may eliminate the exponential slowness if implemented correctly. Is there a way to implement that?