Fortran scanner misses comments in some cases #4454

Open
@mwichmann

This is to document, in an issue, a limitation of the Fortran Scanner. The scanner itself does describe the limitation in comments, but not why it exists.

When scanning Fortran source for USE and INCLUDE statements, the scanner uses regexes, and these cannot handle a line that combines multiple semicolon-separated statements with comment marks. These lines from the unit test (SCons/Scanner/FortranTests.py) all give the wrong result:

!     USE modia ; use modib  # expect nothing, get modib
      USE mod14 !; USE modi  # expect mod14, get both
      USE mod15!;USE modi  # expect mod15, get both
      USE mod16  !  ;  USE  modi  # expect mod16, get both
; USE modi  # expect nothing (??), get modi

The regex treats the semicolon as the start of a new bit of text to scan, so in each case what comes before it has no effect. The regexes are applied in multiline mode, FWIW. It doesn't matter where the comment mark is or how much whitespace there is: the comment appears to "end" at the semicolon. The fifth example should (apparently) be considered a syntax error (and thus ignored?), but since scanning starts at the blank after the semicolon, there is no complaint.
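To make the behavior easy to reproduce outside the test suite, here is a minimal sketch. The pattern `USE_RE` is a simplified, hypothetical stand-in and not the scanner's actual regex (which also handles INCLUDE and more USE syntax); it only keeps the "start of line or semicolon" anchoring described above.

```python
import re

# Hypothetical, simplified stand-in for the scanner's USE pattern.
# A statement may start at the beginning of a line or right after a semicolon.
USE_RE = re.compile(r"(?:^|;)\s*USE\s+(\w+)", re.IGNORECASE | re.MULTILINE)

# The five problem lines, without the "# expect ..." annotations.
LINES = [
    "!     USE modia ; use modib",
    "      USE mod14 !; USE modi",
    "      USE mod15!;USE modi",
    "      USE mod16  !  ;  USE  modi",
    "; USE modi",
]

for line in LINES:
    # The pattern knows nothing about "!", so a semicolon inside a comment
    # still opens a new place to look for USE.
    print(repr(line), "->", USE_RE.findall(line))
```

With this stand-in, every line reports the module after the semicolon, matching the "get" column in the examples.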

Python's re module allows only fixed-width look-behind patterns, so there is no legal way to express "semicolon if not preceded by ! and possibly some other stuff"; trying to do so produces an error from the re module. There also doesn't seem to be a simple way to express "if the line begins with a comment, don't do anything more with it". It's not hard to write a bit of regex that says "ignore from a character until the line ending", but interleaving that with the already fairly complex regex in use is something else.
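The look-behind restriction can be checked directly. This is a small sketch; the two patterns are illustrative only and are not taken from the scanner.

```python
import re

# Variable-width look-behind: "a semicolon not preceded by ! plus anything".
# The stdlib re module rejects this when the pattern is compiled.
try:
    re.compile(r"(?<!!.*);")
except re.error as exc:
    variable_width_error = str(exc)
else:
    variable_width_error = None
print("variable-width look-behind:", variable_width_error)

# A fixed-width look-behind compiles, but it only sees the single character
# immediately before the semicolon.
fixed = re.compile(r"(?<!!);")
print(fixed.search("USE mod15!;USE modi"))    # "!" is adjacent: no match
print(fixed.search("USE mod16  !  ;  USE"))   # whitespace in between: matches
```

So the fixed-width form would only cover the `!;` case, and not the cases where whitespace or other text sits between the comment mark and the semicolon.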

From some digging around the internet, it seems that it's possible to encode this without a look-behind, at the cost of a considerably more complex regex pattern. That might be something to explore, but we seem to be lacking that level of expertise. The non-stdlib regex module can reportedly do look-behind with a variable-length pattern; a simple attempt to code it that way gave no error, but didn't help.
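One possible reading of "without a look-behind" is the common trick of matching the unwanted text as its own alternative, so the regex engine consumes a comment before it can find a semicolon inside it. The sketch below is only that: `USE_OR_COMMENT_RE` and `find_use_modules` are hypothetical names, the pattern is a simplified stand-in for the real one, and this is not necessarily the approach found in that digging. It also treats a `!` inside a character literal as a comment start, which is wrong for real Fortran.

```python
import re

# Hypothetical, simplified pattern. The first alternative swallows a "!"
# comment through end of line; the second captures the module name of a
# USE statement that starts a line or follows a semicolon.
USE_OR_COMMENT_RE = re.compile(
    r"![^\n]*|(?:^|;)\s*USE\s+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_use_modules(text):
    """Return module names from USE statements that are not inside a comment."""
    # Comment matches leave the capture group empty, so filter those out.
    return [name for name in USE_OR_COMMENT_RE.findall(text) if name]


SOURCE = "\n".join([
    "!     USE modia ; use modib",
    "      USE mod14 !; USE modi",
    "      USE mod15!;USE modi",
    "      USE mod16  !  ;  USE  modi",
    "; USE modi",
])

print(find_use_modules(SOURCE))
```

On these five lines the sketch yields mod14, mod15 and mod16, plus modi from the fifth line, since a leading semicolon is untouched by any comment handling.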

Probably we should find a place to document this. It's not clear there's a real-world problem here, just a broken test when a change was made in the scanner module. If it does cause problems, the easiest workaround to suggest is to "not do that" - so instead of:

      USE mod14 !; USE modi

do:

      USE mod14
!     USE modi

One suggestion from Discord was to pre-process the file being scanned to remove comments; it's not clear how easy this is. Can comment marks appear in a line inside some other construct such that they are not considered comment indicators?
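On that last question: yes, at least in one case. A `!` inside a character literal is part of the string rather than a comment, so a pre-processing pass needs to track quotes. Here is a rough sketch of such a pass, under stated assumptions: it handles only `!` comments, ignores fixed-form column-1 comment markers, continuation lines and Hollerith constants, and `strip_bang_comments` and `USE_RE` are hypothetical names, with `USE_RE` a simplified stand-in for the scanner's pattern.

```python
import re

# Hypothetical, simplified stand-in for the scanner's USE pattern.
USE_RE = re.compile(r"(?:^|;)\s*USE\s+(\w+)", re.IGNORECASE | re.MULTILINE)


def strip_bang_comments(text):
    """Drop everything from the first unquoted "!" to the end of each line."""
    out_lines = []
    for line in text.splitlines():
        quote = None          # the quote character we are inside, if any
        cut = len(line)
        for i, ch in enumerate(line):
            if quote:
                # A doubled quote ('' or "") closes and reopens the literal,
                # which is good enough for finding comment starts.
                if ch == quote:
                    quote = None
            elif ch in "'\"":
                quote = ch
            elif ch == "!":
                cut = i
                break
        out_lines.append(line[:cut])
    return "\n".join(out_lines)


SOURCE = "\n".join([
    "!     USE modia ; use modib",
    "      USE mod14 !; USE modi",
    "      USE mod15!;USE modi",
    "      USE mod16  !  ;  USE  modi",
    "; USE modi",
])

print(USE_RE.findall(strip_bang_comments(SOURCE)))
print(strip_bang_comments("x = 'a ! b' ! note"))
```

With the comments removed first, the unmodified pattern no longer sees the semicolons that were inside them. The fifth line is unaffected, since it contains no comment.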
