Possible problem when parsing UTF content #10

@pragdave

Description

Take this grammar:

require 'rattler'
parser DynagramParser < Rattler::Runtime::ExtendedPackratParser
%whitespace SPACE+
script   <- string EOF
string   <-  @('"' . '"')

Feed it the file:

"x"

and you get:

[~/.p/d/ruby] ruby lib/dynagram/bug.rb samples/boxes.dyna
"\"x\""

(which is correct).

Change the input to:

 "∆"

and when you run it, you get:

[~/.p/d/ruby] ruby lib/dynagram/bug.rb samples/boxes.dyna
"\"∆\"\n"

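It looks as if somewhere the parser is using the byte length of the match rather than its character length: ∆ is a single character but three bytes in UTF-8, which would explain why the extracted token runs past the closing quote and picks up the trailing newline. I spent an hour or so poking around but couldn't see where.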
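The suspected mismatch can be reproduced in plain Ruby. This is a minimal sketch, not Rattler's code: it assumes (without having checked Rattler's source) that positions come from the standard library's StringScanner, whose pos is a byte offset while charpos is a character offset, and that the input file ends with a newline. Slicing the source String, which indexes by character, with the byte offset runs past the closing quote and picks up the trailing newline, matching the output above.

```ruby
require 'strscan'

# Hypothetical illustration only: not Rattler's implementation, just the kind
# of byte-vs-character mix-up suspected in this report.
# Second test input, assuming the file ends with a newline.
source = "\"\u2206\"\n" # U+2206 is one character but three bytes in UTF-8

scanner = StringScanner.new(source)
scanner.scan(/"/) # opening quote
scanner.scan(/./) # any single character, here the 3-byte U+2206
scanner.scan(/"/) # closing quote

byte_end = scanner.pos     # byte offset just past the closing quote
char_end = scanner.charpos # character offset just past the closing quote

puts byte_end # => 5
puts char_end # => 3

# String#[] counts characters, so pairing it with a byte offset overshoots
# and includes the trailing newline.
wrong = source[0...byte_end]

# Either keep both sides in characters...
right_chars = source[0...char_end]
# ...or keep both sides in bytes.
right_bytes = source.byteslice(0, byte_end)

p wrong == "\"\u2206\"\n"    # => true
p right_chars == right_bytes # => true
```

Which of the two fixes fits depends on how the parser stores positions; the point is only that byte offsets and character indexes must not be mixed.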

Thanks for a great tool.
