Open
Description
Per Python's regular expression documentation, if we want Python regexes to follow the same escaping rules as other languages, we need to prefix the pattern strings with 'r' to make them raw strings. Otherwise, Python processes backslash escapes in the string literal itself before the pattern ever reaches the re module.
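To make the difference concrete, here is a minimal sketch using `\b`, which Python treats as a backspace character in a normal string literal but which the re module treats as a word boundary. The sample text and variable names are made up for illustration and are not taken from the generated parser.

```python
import re

# In a normal literal, Python turns '\b' into a backspace character (0x08)
# before the re module ever sees the pattern.
plain = '\bcat\b'
# In a raw literal, the backslash survives and re reads '\b' as a word boundary.
raw = r'\bcat\b'

text = 'a cat sat'  # illustrative input only

plain_match = re.search(plain, text)
raw_match = re.search(raw, text)

print(len(plain), len(raw))  # 5 7
print(plain_match)           # None
print(raw_match.group(0))    # cat
```

Not every escape is affected the same way, so which rules break depends on which backslash sequences they use and on the Python version.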
When I generate a parser with the following lex section:
%lex
%%
\s+ return
\"[^\"]*\" return 'STRING'
\d+\.\d+ return 'FLOAT'
\d+ return 'INT'
[\w\-+*=<>/]+ return 'SYMBOL'
/lex
I get the following generated Python:
_lex_rules = [['^\(', _lex_rule1],
['^\)', _lex_rule2],
['^\s+', _lex_rule3],
['^"[^\"]*"', _lex_rule4],
['^\d+\.\d+', _lex_rule5],
['^\d+', _lex_rule6],
['^[\w\-+*=<>/]+', _lex_rule7]]
These fail with 'invalid search sequence' or a similar error, whereas manually prefixing the regexes with 'r' solves the problem:
_lex_rules = [[r'^\(', _lex_rule1],
[r'^\)', _lex_rule2],
[r'^\s+', _lex_rule3],
[r'^"[^\"]*"', _lex_rule4],
[r'^\d+\.\d+', _lex_rule5],
[r'^\d+', _lex_rule6],
[r'^[\w\-+*=<>/]+', _lex_rule7]]
The Python generator (likely) needs a simple update to prepend 'r' to regex strings.
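As a rough sketch of what such an update might look like, the helper below turns a pattern into Python source text, preferring a raw-string literal. The generator's actual code is not shown in this issue, so the function name `to_python_regex_literal` and its fallback behaviour are assumptions for illustration only, written here in Python rather than in whatever language the generator uses.

```python
def to_python_regex_literal(pattern: str) -> str:
    """Return Python source text for `pattern`, preferring a raw literal.

    Hypothetical helper: not part of the actual generator.
    """
    # A raw literal cannot end in an odd number of backslashes,
    # and it cannot contain raw newlines or other control characters.
    trailing = len(pattern) - len(pattern.rstrip('\\'))
    if trailing % 2 == 0 and pattern.isprintable():
        # Pick a quote character that does not occur in the pattern.
        for quote in ("'", '"'):
            if quote not in pattern:
                return 'r' + quote + pattern + quote
    # Fall back to repr(), which escapes everything safely.
    return repr(pattern)


# Patterns as the generator would hold them in memory (illustrative subset).
patterns = [r'^\(', r'^\s+', r'^"[^\"]*"', r'^\d+\.\d+']
literals = [to_python_regex_literal(p) for p in patterns]
for literal in literals:
    print(literal)
```

The fallback to `repr()` covers the cases a raw literal cannot express, such as a pattern containing both quote characters or ending in a single backslash.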