tree-sitter-python is overly permissive with newlines #178

Open
@agirardeau

The following code is a syntax error in Python because of the line break before the colon, but tree-sitter-python parses it as valid code:

```python
def foo(x)
:
    return x + 2
```
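CPython's built-in `compile()` can confirm that this snippet is rejected, while the same function with the colon on the `def` line is accepted. A minimal standard-library sketch; `is_valid_python` is a hypothetical helper name, not part of any library.

```python
# Compare how CPython treats two spellings of the same function header.
BROKEN = "def foo(x)\n:\n    return x + 2\n"
FIXED = "def foo(x):\n    return x + 2\n"


def is_valid_python(source: str) -> bool:
    """Return True if CPython compiles `source`, False on a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_python(BROKEN))  # False: the newline before ':' ends the logical line
print(is_valid_python(FIXED))   # True
```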

This happens because `\s` is included in the `extras` parameter[1], which tells tree-sitter to ignore whitespace, including newlines, between any two tokens.
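The effect of `\s` is easy to see with Python's `re` module, whose `\s` class (like JavaScript's) matches `\n`. The sketch below assumes the `extras` pattern from footnote [1] carries over unchanged to Python regex syntax, and compares it with a hypothetical variant restricted to horizontal whitespace (a space and a tab).

```python
import re

# The extras pattern from footnote [1], transcribed to Python's re syntax.
ORIGINAL_EXTRAS = re.compile(r"[\s\f\uFEFF\u2060\u200B]|\\\r?\n")
# Hypothetical variant: only space and tab, plus the same zero-width
# characters and backslash line continuations.
HORIZONTAL_EXTRAS = re.compile(r"[ \t\f\uFEFF\u2060\u200B]|\\\r?\n")


def skipped(pattern: re.Pattern, text: str) -> bool:
    """True if the whole of `text` matches `pattern` as a single extra."""
    return pattern.fullmatch(text) is not None


for sample in [" ", "\t", "\n", "\\\n"]:
    print(repr(sample), skipped(ORIGINAL_EXTRAS, sample), skipped(HORIZONTAL_EXTRAS, sample))
```

Only the bare newline stops being an extra in the variant; a backslash continuation is still skipped through the pattern's second alternative.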

Replacing `\s` with `\t` in `extras` makes tree-sitter-python correctly reject newlines like the one above[2]. However, the grammar then no longer ignores newlines inside brackets. Consider the following valid Python:

```python
a = (
  1 +
  2
)
```

This fails to parse because tree-sitter does not expect newlines at the end of lines 1 and 2. The `scanner.cc` logic that ignores line breaks inside bracketed expressions relies on a closing bracket being a valid token[3], which is not the case immediately after an opening parenthesis or after the `+` operator.
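For comparison, CPython handles this in its tokenizer rather than its grammar: it tracks bracket nesting and, while inside brackets, emits line breaks as the non-logical `NL` token instead of `NEWLINE`, so the parser never treats them as statement ends. The standard `tokenize` module exposes this distinction. The sketch below applies it to the valid example above and to the broken `def` header; `newline_kinds` is a hypothetical helper, and this only illustrates CPython's behavior, not how tree-sitter-python's scanner works.

```python
import io
import tokenize

BRACKETED = "a = (\n  1 +\n  2\n)\n"
DEF_SOURCE = "def foo(x)\n:\n    return x + 2\n"


def newline_kinds(source: str) -> list[str]:
    """Return the token name of every line break in `source`, in order.

    NEWLINE ends a logical line; NL is a physical line break the
    tokenizer treats as insignificant (e.g. inside brackets).
    """
    kinds = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            kinds.append(tokenize.tok_name[tok.type])
    return kinds


print(newline_kinds(BRACKETED))   # ['NL', 'NL', 'NL', 'NEWLINE']
print(newline_kinds(DEF_SOURCE))  # ['NEWLINE', 'NEWLINE', 'NEWLINE']
```

Because the break after `def foo(x)` occurs at bracket depth zero, it is a significant `NEWLINE`, which is why CPython rejects the first example.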

Is it possible, with tree-sitter, to disallow arbitrary newlines in general while still permitting them inside brackets?

[1]

```js
/[\s\f\uFEFF\u2060\u200B]|\\\r?\n/
```

[2] To avoid rejecting all empty lines, we would also have to replace `module: $ => repeat($._statement)` with something like `module: $ => repeat(choice($._statement, /\r?\n/))`.

[3]

```cpp
bool within_brackets = valid_symbols[CLOSE_BRACE] || valid_symbols[CLOSE_PAREN] || valid_symbols[CLOSE_BRACKET];
```
