Locations are offset due to lexer misuse in parser actions #213

@sim642

Even though #211 fixes #183 on a high level, the strange small offsets remain for unrelated reasons.

The PR uses location non-terminals in the grammar to capture locations at intermediate places using the following code:

    (* More parsing support functions: line, file, char count, char count for line start *)
    let getPosition () : int * string * int * int =
      let i = !current in
      i.linenum, i.fileName, Lexing.lexeme_start i.lexbuf, i.linestart

I suspect the use of Lexing.lexeme_start is wrong here.

For example, consider a production SEMICOLON location. The location non-terminal matches no lexer tokens itself, so its semantic action returns the starting position of the most recently lexed token, i.e. the starting position of SEMICOLON, even though we want the location after it.

I suspect this isn't straightforward to fix by just using the token end location instead, because at other times we want the location before something else, so things would go wrong there instead.
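To make the two cases concrete, here is a minimal sketch using only the standard Lexing module. The lex_token helper is hypothetical: it stands in for an ocamllex-generated lexer by updating just the position fields that Lexing.lexeme_start and Lexing.lexeme_end read. It also assumes that, in the "before" case, the most recently lexed token is the one that follows the location.

```ocaml
(* Hypothetical stand-in for an ocamllex-generated lexer: skips [skip]
   characters of whitespace, then "lexes" a token of [len] characters by
   updating the positions that lexeme_start / lexeme_end report. *)
let lex_token ?(skip = 0) (lexbuf : Lexing.lexbuf) (len : int) : unit =
  let open Lexing in
  let start =
    { lexbuf.lex_curr_p with pos_cnum = lexbuf.lex_curr_p.pos_cnum + skip }
  in
  lexbuf.lex_start_p <- start;
  lexbuf.lex_curr_p <- { start with pos_cnum = start.pos_cnum + len }

(* Input "a; b": ID at offsets 0..1, SEMICOLON at 1..2, ID at 3..4. *)
let lexbuf = Lexing.from_string "a; b"

let () = lex_token lexbuf 1 (* ID "a" *)
let () = lex_token lexbuf 1 (* SEMICOLON *)

(* Case 1: location wanted AFTER the semicolon, i.e. offset 2. *)
let after_semi_start = Lexing.lexeme_start lexbuf
let after_semi_end = Lexing.lexeme_end lexbuf

let () = lex_token ~skip:1 lexbuf 1 (* ID "b" *)

(* Case 2: location wanted BEFORE "b", i.e. offset 3. *)
let before_b_start = Lexing.lexeme_start lexbuf
let before_b_end = Lexing.lexeme_end lexbuf

let () =
  Printf.printf "after ';': start=%d end=%d (wanted 2)\n"
    after_semi_start after_semi_end;
  Printf.printf "before 'b': start=%d end=%d (wanted 3)\n"
    before_b_start before_b_end
```

Under these assumptions, lexeme_start is off by the width of the semicolon in the first case but correct in the second, while lexeme_end is the other way around, so neither call alone serves both kinds of location.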

Calling Lexing functions in the parser is probably wrong anyway. A proper solution might be to use Menhir, which provides much more powerful position facilities in the parser (avoiding the need for these location rules). That's what Frama-C seems to have done with their CIL as well.
