Misaligned error location when input text contains full-width characters #1530

@johan456789

Describe the bug

When the input text contains full-width characters, the error location indicator (^) points at the wrong character. The indicator line is padded with half-width spaces (U+20) only; it should instead match the width of each input character and use a full-width space (U+3000) wherever the input character is full-width.

To Reproduce

Use any CJK characters, or the full-width forms of Latin letters, and intentionally create a grammar error. The indicator (^) will point at the wrong location.

Current and expected behavior shown here:

    raise UnexpectedCharacters(stream, i, text_line, text_column, {item.expect.name for item in to_scan},
lark.exceptions.UnexpectedCharacters: No terminal matches '古' in the current parser context, at line 2 col 6
1.菝葀:古代一種象徵祥瑞的草。《廣韻.入聲.末韻》:「菝:菝葀,瑞草。」
     ^  // using U+20 (current)
     ^  // using U+20 and U+3000 (expected)
Expected one of: 
	* LPAREN
	* I_LQUOTE
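
One possible way to build the indicator line is sketched below. This is not lark's actual implementation: `caret_line` is a hypothetical helper, and it assumes a monospaced font in which a full-width character takes the same horizontal space as U+3000. It uses the standard library's `unicodedata.east_asian_width` to decide, character by character, which kind of space to emit.

```python
import unicodedata


def caret_line(line: str, column: int) -> str:
    """Return a padding string ending in '^' that sits under line[column - 1].

    `column` is 1-based, like the "col" value in the error message.
    Hypothetical helper for illustration; not part of lark.
    """
    pad = []
    for ch in line[:column - 1]:
        if ch == "\t":
            # Keep tabs so the caret line expands the same way as the text.
            pad.append("\t")
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide or Fullwidth character: pad with an ideographic space.
            pad.append("\u3000")
        else:
            # Everything else is treated as half-width.
            pad.append(" ")
    return "".join(pad) + "^"


if __name__ == "__main__":
    # Two half-width characters, three full-width ones, then the error.
    text = "1.\u83dd\u8440\uff1a\u53e4"
    print(text)
    print(caret_line(text, 6))
```

The sketch ignores zero-width and combining characters, and characters of ambiguous width are treated as half-width, so it is only a starting point for discussion.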
