Description
Even though #211 fixes #183 on a high level, the strange small offsets remain for unrelated reasons.
The PR uses location non-terminals in the grammar to capture locations at intermediate places using the following code:
Lines 335 to 338 in 7797529:

    (* More parsing support functions: line, file, char count, char count for line start *)
    let getPosition () : int * string * int * int =
      let i = !current in
      i.linenum, i.fileName, Lexing.lexeme_start i.lexbuf, i.linestart
I suspect the use of `Lexing.lexeme_start` is wrong here.
For example, in a production SEMICOLON location, in the semantic action of location, which itself matches no lexer tokens, the starting position of the most recently lexed token is returned, i.e. the starting position of SEMICOLON, even though we want the location after it.
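The offset mismatch can be reproduced outside the parser. The following is a minimal sketch, not the project's code: `simulate_token` is a hypothetical helper that sets the lexbuf position fields by hand to mimic a lexer that has just matched the `;` in a made-up input string. It only shows which offset `Lexing.lexeme_start` and `Lexing.lexeme_end` report in that state.

```ocaml
(* Pretend the lexer has just matched a token occupying [start_ofs, end_ofs).
   A real ocamllex-generated lexer updates these fields itself; setting them
   by hand here is purely for illustration (hypothetical helper). *)
let simulate_token (lexbuf : Lexing.lexbuf) ~(start_ofs : int) ~(end_ofs : int) : unit =
  let at ofs = { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = ofs } in
  lexbuf.Lexing.lex_start_p <- at start_ofs;
  lexbuf.Lexing.lex_curr_p <- at end_ofs

(* Made-up input; the first ';' sits at offset 5 and ends at offset 6. *)
let source = "x = 1; y = 2;"
let lexbuf = Lexing.from_string source
let () = simulate_token lexbuf ~start_ofs:5 ~end_ofs:6

(* What a getPosition-style function reads: the START of the last token. *)
let reported = Lexing.lexeme_start lexbuf

(* What a location placed after SEMICOLON should be: the END of that token. *)
let wanted = Lexing.lexeme_end lexbuf

let () = Printf.printf "reported = %d, wanted = %d\n" reported wanted
```

For a one-character token the two values differ by one, which would be consistent with the small offsets described above; for longer tokens the gap grows with the token length.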
I suspect this isn't straightforward to fix by just using the token end location instead, because at other times we want the location before some other token, so things would go wrong there instead.
Calling `Lexing` functions in the parser is probably wrong anyway. A proper solution might be to use Menhir, which provides much more powerful position facilities in the parser (avoiding the need for these location rules). That's what Frama-C seems to have done with their CIL as well.
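As a rough illustration of what that could look like, here is a sketch of a Menhir grammar fragment (a `.mly` file whose semantic actions are OCaml). The tokens, rule names, and result type are invented for this example and are not taken from the existing grammar; only the `$startpos(...)` and `$endpos` keywords are Menhir's own. It assumes the lexer keeps `lex_start_p` and `lex_curr_p` up to date, as ocamllex-generated lexers do.

```ocaml
(* Hypothetical Menhir grammar fragment, not the project's grammar. *)
%token <string> IDENT
%token SEMICOLON EOF
%start <(string * int * int) list> statements
%%

statements:
  | ss = list(statement) EOF { ss }

statement:
  | name = IDENT SEMICOLON
    { (* Offset before IDENT and offset after SEMICOLON, each requested
         explicitly, with no empty location rule and no Lexing calls. *)
      (name,
       $startpos(name).Lexing.pos_cnum,
       $endpos.Lexing.pos_cnum) }
```

Because each semantic action names the symbol and the end it wants, the "before" and "after" cases no longer have to share a single function that reads whatever the lexer matched last.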