Skip to content

Detect lookbehind with a non-constant length or unbounded repetition #69

Open
@Aloso

Description

@Aloso

Java, Python, and PCRE all restrict what is allowed to appear in a lookbehind assertion. These restrictions are similar, but with some differences.

  • In Python, a lookbehind must have a fixed length and may only contain repetitions like {n}. \X is forbidden in a lookbehind.

  • In PCRE, a lookbehind may contain alternatives with different lengths, which can even be nested, as well as repetitions with an upper bound. As in Python, \X and unbounded repetitions are forbidden

  • Java has the most complicated rules. It allows arbitrary repetition in a lookbehind; however, every repetition must have a fixed length, and may not include alternations or ? even when they have the same length. ? / {0,1} is not treated as a repetition by Java.

    Java treats repeated groups ((a)+) differently than other kinds of repetitions (a+, [a]+). Infinite repetitions of groups must satisfy the following constraints, where $n$ is the number of code points matched by the group:

    • when $n$ is odd:

      • it must be preceded by an odd number $r$ of infinite repetitions, or
      • the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r < n$
    • when $n$ is even:

      • it must be preceded by an odd number $r$ of infinite repetitions, and
      • the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r \in [0; n)$

    These rules might not be comprehensive, but I haven't found an exception so far.

Describe the bug

Pomsky allows anything in a lookbehind.

Expected behavior

A compatibility error should be produced.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workinggood first issueGood for newcomers

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions