
make_tokenizer doesn't deal with binary tokenizers #46

@gsnedders

Description

from funcparserlib.lexer import make_tokenizer

tokenize = make_tokenizer([
    (u'x', (br'\xff\n',)),
])

tokens = list(tokenize(b"\xff\n"))

throws

  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/tests/test_parsing.py", line 76, in test_tokenize_bytes
    tokens = list(tokenize(b"\xff\n"))
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 107, in f
    t = match_specs(compiled, str, i, (line, pos))
  File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 91, in match_specs
    nls = value.count(u'\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

match_specs needs to handle both unicode and bytes line-feed characters. It currently counts newlines with value.count(u'\n') unconditionally; under Python 2, counting a unicode needle in a byte string triggers an implicit ascii decode of the bytes, which blows up on the b'\xff' byte, as the traceback above shows.
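
A minimal sketch of one possible fix, following the names in the traceback (advance_position is a hypothetical helper, not the actual lexer.py code): pick the newline literal based on the type of the matched value, so that count and rfind never mix bytes and unicode.

def advance_position(value, line, pos):
    # Choose the newline literal matching the type of value, so that
    # bytes input is never implicitly decoded as ascii (the source of
    # the UnicodeDecodeError above).
    nl = b'\n' if isinstance(value, bytes) else u'\n'
    nls = value.count(nl)
    if nls == 0:
        return line, pos + len(value)
    # Column is measured from the last newline in the matched value.
    return line + nls, len(value) - value.rfind(nl) - 1

With this, advance_position(b"\xff\n", 1, 0) returns (2, 0) without touching any codec, and the unicode path behaves exactly as before.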
