This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Support for UTF-16 surrogate pair encoded emojis #279

@tech4him1

Description


Currently, if I try to parse YAML data containing Unicode emojis written as UTF-16 surrogate pair escapes (e.g. U+1F468 written as \uD83D\uDC68 in YAML), go-yaml returns the error "found invalid Unicode character escape code".

According to the YAML spec, parsers are supposed to support UTF-8 and UTF-16, including surrogate pairs:
http://www.yaml.org/spec/1.2/spec.html#id2770814
http://www.yaml.org/spec/1.2/spec.html#id2771184
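In a UTF-16 encoded stream the same character appears as two 16-bit units on the wire, which a decoder must join into one code point. A minimal sketch, assuming little-endian input with any byte order mark already stripped; `decodeUTF16LE` is a hypothetical name, and an odd trailing byte is simply ignored.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE converts little-endian UTF-16 bytes into runes.
// utf16.Decode joins surrogate pairs and maps unpaired halves to U+FFFD.
func decodeUTF16LE(b []byte) []rune {
	units := make([]uint16, len(b)/2) // a trailing odd byte is dropped
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(b[2*i:])
	}
	return utf16.Decode(units)
}

func main() {
	// U+1F468 in UTF-16LE: units 0xD83D, 0xDC68, each stored low byte first.
	b := []byte{0x3D, 0xD8, 0x68, 0xDC}
	fmt.Printf("%U\n", decodeUTF16LE(b))
}
```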

This looks intentional. Are you planning to support these or not?

yaml/scannerc.go

Lines 2443 to 2447 in 25c4ec8

```go
if (value >= 0xD800 && value <= 0xDFFF) || value > 0x10FFFF {
	yaml_parser_set_scanner_error(parser, "while parsing a quoted scalar",
		start_mark, "found invalid Unicode character escape code")
	return false
}
```
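The check above rejects every value from 0xD800 to 0xDFFF, so even a correctly paired escape fails. One way a scanner could accept pairs is to combine a high surrogate with an immediately following `\u` low surrogate before any range check, while still rejecting unpaired halves. The sketch below is not go-yaml's code: `decodeUEscapes` and `parseU4` are hypothetical helpers that only handle input made up entirely of `\uXXXX` escapes.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"unicode/utf16"
)

var errInvalidEscape = errors.New("found invalid Unicode character escape code")

// parseU4 (hypothetical) reads one \uXXXX escape starting at s[i].
func parseU4(s string, i int) (rune, error) {
	if i+6 > len(s) || s[i] != '\\' || s[i+1] != 'u' {
		return 0, errors.New("expected \\uXXXX escape")
	}
	v, err := strconv.ParseUint(s[i+2:i+6], 16, 32)
	if err != nil {
		return 0, err
	}
	return rune(v), nil
}

// decodeUEscapes (hypothetical) decodes a run of \uXXXX escapes,
// joining each high surrogate with the low surrogate that follows it.
func decodeUEscapes(s string) ([]rune, error) {
	var out []rune
	for i := 0; i < len(s); {
		r, err := parseU4(s, i)
		if err != nil {
			return nil, err
		}
		i += 6
		switch {
		case r >= 0xD800 && r <= 0xDBFF: // high surrogate: a low one must follow
			lo, err := parseU4(s, i)
			if err != nil || lo < 0xDC00 || lo > 0xDFFF {
				return nil, errInvalidEscape
			}
			i += 6
			out = append(out, utf16.DecodeRune(r, lo))
		case r >= 0xDC00 && r <= 0xDFFF: // low surrogate with no preceding high
			return nil, errInvalidEscape
		default:
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	rs, err := decodeUEscapes(`\uD83D\uDC68`)
	fmt.Printf("%U %v\n", rs, err)
}
```

Reusing the existing error message for unpaired surrogates keeps the behavior unchanged for genuinely invalid input. The 8-digit `\U` escape that YAML also defines (e.g. `\U0001F468`) never produces surrogates, so it would be unaffected by this kind of change.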

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests