Python generator doesn't use raw strings #129

@gankoji

Per Python's regular expressions documentation, if we want Python regexes to follow the same escaping rules as in other languages, we need to prefix the string literals with 'r' (raw). Otherwise, Python processes the backslash escape sequences itself when it parses the string literal, before the pattern ever reaches the re module.
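A small illustration of the difference, using `\b` as the example since it is a recognised string escape (backspace) as well as a regex token (word boundary). The pattern and input here are made up for demonstration and are not taken from the grammar in this issue:

```python
import re

# In a normal literal, '\b' is converted to a backspace character (U+0008)
# before the re module ever sees the pattern.
plain = '\bcat\b'

# In a raw literal the backslash is kept, so re receives the two-character
# sequence \b and treats it as a word boundary.
raw = r'\bcat\b'

print(len(plain), len(raw))        # 5 7
print(re.search(plain, 'a cat'))   # None
print(re.search(raw, 'a cat'))     # a match object spanning 'cat'
```

Sequences that are not recognised string escapes, such as `\d` or `\(`, are left in the string as written, but recent Python versions warn about them, so raw literals are the safer choice for every regex.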

When I generate a parser with the following lex section:

%lex

%%

\s+             return

\"[^\"]*\"      return 'STRING'

\d+\.\d+        return 'FLOAT'

\d+             return 'INT'

[\w\-+*=<>/]+   return 'SYMBOL'

/lex

I get the following generated python:

_lex_rules = [['^\(', _lex_rule1],
['^\)', _lex_rule2],
['^\s+', _lex_rule3],
['^"[^\"]*"', _lex_rule4],
['^\d+\.\d+', _lex_rule5],
['^\d+', _lex_rule6],                      
['^[\w\-+*=<>/]+', _lex_rule7]] 

These fail with 'invalid search sequence' or a similar error, whereas manually prefixing each regex with 'r' solves the problem:

_lex_rules = [[r'^\(', _lex_rule1],
[r'^\)', _lex_rule2],
[r'^\s+', _lex_rule3],
[r'^"[^\"]*"', _lex_rule4],
[r'^\d+\.\d+', _lex_rule5],
[r'^\d+', _lex_rule6],                      
[r'^[\w\-+*=<>/]+', _lex_rule7]] 

The Python generator (likely) needs a simple update to prefix regex strings with 'r'.
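A rough sketch of what that emission rule could look like. I have not looked at the generator's actual code, so `python_regex_literal` is a hypothetical helper name, and it is written in Python only to show the logic. One assumption worth flagging: a raw literal cannot contain its own quote character unescaped or end in a lone backslash, so the sketch falls back to `repr()` in those cases.

```python
def python_regex_literal(pattern):
    """Return Python source text for a string literal holding `pattern`.

    Hypothetical helper, not part of the real generator.
    """
    # Raw literals cannot safely hold the delimiter quote, a trailing
    # backslash, or non-printable characters such as newlines, so use an
    # ordinary escaped literal for those patterns.
    if "'" in pattern or pattern.endswith("\\") or not pattern.isprintable():
        return repr(pattern)
    return "r'" + pattern + "'"


# Patterns as they appear in the generated rules above.
patterns = [r'^\(', r'^\s+', r'^"[^\"]*"', r'^\d+\.\d+', r'^[\w\-+*=<>/]+']

for p in patterns:
    print(python_regex_literal(p))
```

Each emitted literal evaluates back to the original pattern, so the regex that reaches `re` is exactly what the lex section specified.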
