Fortran scanner misses comments in some cases

This issue documents a limitation of the Fortran scanner. The scanner's source describes the limitation in comments, but does not explain why it exists.

When scanning Fortran source for `USE` and `INCLUDE` statements, the scanner uses regexes, and these cannot handle a line that combines multiple semicolon-separated statements with a comment mark. These lines from the unit test (`SCons/Scanner/FortranTests.py`) all give the wrong result:

```f90
!     USE modia ; use modib  # expect nothing, get modib
      USE mod14 !; USE modi  # expect mod14, get both
      USE mod15!;USE modi  # expect mod15, get both
      USE mod16  !  ;  USE  modi  # expect mod16, get both
; USE modi  # expect nothing (??), get modi
```

The regex treats a semicolon as the start of a new piece of text to scan, so in each case whatever precedes the semicolon has no effect. FWIW, the regexes are applied in multiline mode. It doesn't matter where the comment mark is or how much whitespace surrounds it: the comment effectively "ends" at the semicolon. The fifth example should apparently be treated as a syntax error (and thus ignored?), but since scanning restarts at the blank after the semicolon, nothing complains.
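To make the failure concrete, here is a deliberately simplified, hypothetical stand-in for the scanner's `USE` pattern (it is not the actual regex in the SCons Fortran scanner): a `USE` statement may begin a line or follow a semicolon, matched with `re.MULTILINE` and `re.IGNORECASE`. Run against the five test lines, it reproduces the "get" results listed above, because nothing in the pattern ever looks at the `!`.

```python
import re

# Hypothetical stand-in for the scanner's USE regex (NOT the real SCons
# pattern): a USE statement may start a line or follow a semicolon.
NAIVE_USE = re.compile(r"(?:^|;)[ \t]*USE[ \t]+(\w+)", re.IGNORECASE | re.MULTILINE)

samples = [
    "!     USE modia ; use modib",
    "      USE mod14 !; USE modi",
    "      USE mod15!;USE modi",
    "      USE mod16  !  ;  USE  modi",
    "; USE modi",
]
for line in samples:
    # The "!" is never consulted: every ";" gives the pattern a fresh start.
    print(repr(line), "->", NAIVE_USE.findall(line))
```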

Python's `re` module supports only fixed-width look-behind patterns, so there is no legal way to express "a semicolon not preceded by `!` and possibly some other text"; attempting it makes the `re` module raise an error. There also doesn't seem to be a simple way to express "if the line begins with a comment, don't process it any further". Writing a regex fragment that ignores everything from a given character to the end of the line is easy, but interleaving it with the already fairly complex regex in use is another matter.
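A quick check of both points, assuming CPython's stdlib `re`: the variable-width look-behind is rejected when the pattern is compiled, while the "ignore to end of line" fragment on its own is trivial. The names `variable_lookbehind_supported` and `TO_EOL_COMMENT` are illustrative only.

```python
import re

# Try "a ';' not preceded by '!' plus arbitrary text" as a variable-width
# negative look-behind; the stdlib re module refuses to compile it.
try:
    re.compile(r"(?<!!.*);")
    variable_lookbehind_supported = True
except re.error as exc:
    variable_lookbehind_supported = False
    print("re.error:", exc)

# The easy half on its own: drop everything from '!' to the end of each line.
# (Naive: a '!' inside a character string would be removed as well.)
TO_EOL_COMMENT = re.compile(r"!.*$", re.MULTILINE)
print(repr(TO_EOL_COMMENT.sub("", "      USE mod14 !; USE modi")))
```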

Some digging around the internet suggests it is possible to encode this without a look-behind, at the cost of a considerably more complex regex pattern. That might be worth exploring, but we seem to lack that level of regex expertise. The non-stdlib `regex` module reportedly supports variable-length look-behind; a simple attempt to use it raised no error, but didn't fix the problem either.
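One look-behind-free technique that may be worth trying (a swapped-in idea, not something the scanner currently does) is to match comments as their own alternative and throw those matches away. Because the regex engine scans left to right, once it reaches a `!` it consumes the rest of the line, so a later `; USE` inside the comment is never seen. This sketch reuses the same hypothetical simplified `USE` pattern rather than the real SCons one, and it does not account for `!` inside character strings.

```python
import re

# Hypothetical simplified pattern, NOT the real SCons scanner regex.
# Alternative 1 swallows a "!" comment through end of line, so nothing
# inside it can match; alternative 2 captures the module name of a USE
# statement that starts a line or follows a semicolon.
USE_SKIP_COMMENTS = re.compile(
    r"!.*"
    r"|(?:^|;)[ \t]*USE[ \t]+(\w+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_uses(source):
    """Return USE'd module names, discarding the comment-only matches."""
    return [m.group(1) for m in USE_SKIP_COMMENTS.finditer(source) if m.group(1)]


source = "\n".join([
    "!     USE modia ; use modib",
    "      USE mod14 !; USE modi",
    "      USE mod15!;USE modi",
    "      USE mod16  !  ;  USE  modi",
    "; USE modi",
])
print(find_uses(source))
```

With this pattern the first four test lines give the expected result (nothing, `mod14`, `mod15`, `mod16`), but `; USE modi` still yields `modi`, so the fifth case would need separate handling.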

We should probably document this somewhere and suggest that the easiest workaround, if it causes problems (it's not clear there is a real-world problem here, just a test that broke when the scanner module was changed), is to "not do that". So instead of:

```fortran
      USE mod14 !; USE modi
```
do:
```fortran
      USE mod14
!     USE modi
```

One suggestion from Discord was to pre-process the file to remove comments before scanning; it's not clear how easy that is. Can comment marks appear in a line inside some other construct such that they are not treated as comment indicators?
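In Fortran a `!` inside a character literal does not start a comment, so a pre-processing pass has to track quotes. Below is a minimal sketch for free-form source, assuming only single-line `'...'` and `"..."` literals (with a doubled quote as the escape for a quote character). It ignores fixed-form column-1 comment characters and literals continued across lines with `&`; the function name `strip_free_form_comments` is hypothetical.

```python
def strip_free_form_comments(source):
    """Remove '!' comments from free-form Fortran, keeping '!' inside strings.

    Sketch only: does not handle fixed-form column-1 comments ('C', '*')
    or character literals continued onto the next line with '&'.
    """
    out_lines = []
    for line in source.splitlines():
        quote = None      # quote character of the currently open literal
        cut = len(line)   # where the comment starts (default: no comment)
        for i, ch in enumerate(line):
            if quote:
                if ch == quote:
                    # Closes the literal; for a doubled quote ('') the next
                    # character is the same quote, which simply reopens it.
                    quote = None
            elif ch in ("'", '"'):
                quote = ch
            elif ch == "!":
                cut = i
                break
        out_lines.append(line[:cut])
    return "\n".join(out_lines)


print(repr(strip_free_form_comments("print *, 'hi!' ! done")))
```

In principle, the stripped text could then be handed to the existing regexes unchanged; note that this alone does nothing about the bare `; USE modi` case.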


Fortran scanner misses comments in some cases #4454