Decode paired surrogates in unicode escapes #657

Open
@SimonSapin

Split off from #608

The draft GraphQL spec adds a new feature:

https://spec.graphql.org/draft/#sec-String-Value.Escape-Sequences

For legacy reasons, a supplementary character may be escaped by two fixed-width unicode escape sequences forming a surrogate pair. For example the input "\uD83D\uDCA9" is a valid StringValue which represents the same Unicode text as "\u{1F4A9}". While this legacy form is allowed, it should be avoided as a variable-width unicode escape sequence is a clearer way to encode such code points.
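As a worked illustration of why the two forms in the quoted example are equivalent, here is a minimal sketch of the standard UTF-16 surrogate arithmetic. The function name `combine_surrogates` is hypothetical and is not taken from this repository.

```rust
/// Combine a leading (high) and a trailing (low) UTF-16 surrogate into one `char`.
/// Returns `None` if either code unit is outside its surrogate range.
/// Hypothetical helper for illustration only.
fn combine_surrogates(leading: u16, trailing: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&leading) || !(0xDC00..=0xDFFF).contains(&trailing) {
        return None;
    }
    // Each surrogate carries 10 bits; the pair encodes an offset from U+10000.
    let code_point = 0x10000
        + ((leading as u32 - 0xD800) << 10)
        + (trailing as u32 - 0xDC00);
    char::from_u32(code_point)
}
```

For the spec's example, `0xD83D - 0xD800 = 0x3D` and `0xDCA9 - 0xDC00 = 0xA9`, so the result is `0x10000 + (0x3D << 10) + 0xA9 = 0x1F4A9`.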

(The variable-width unicode escape sequence mentioned here is another new feature, tracked at #640.)

https://spec.graphql.org/draft/#sec-String-Value.Static-Semantics specifies the precise algorithm: a leading surrogate immediately followed by a trailing surrogate is decoded as one char, while any surrogate not in such a pair is a parse error.
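The pairing rule could be sketched roughly as follows. This is an illustration only, not the parser's actual code: it handles nothing but fixed-width `\uXXXX` escapes, and the names `EscapeError`, `read_hex4`, and `decode_fixed_width_escapes` are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum EscapeError {
    /// A surrogate that is not part of a leading + trailing pair.
    LoneSurrogate(u16),
    /// Anything this sketch does not handle (bad hex, truncated input, other escapes).
    Malformed,
}

/// Read four hex digits as one UTF-16 code unit.
fn read_hex4(chars: &mut std::str::Chars<'_>) -> Result<u16, EscapeError> {
    let mut value: u16 = 0;
    for _ in 0..4 {
        let digit = chars
            .next()
            .and_then(|c| c.to_digit(16))
            .ok_or(EscapeError::Malformed)?;
        value = (value << 4) | digit as u16;
    }
    Ok(value)
}

/// Decode `\uXXXX` escapes, combining well-formed surrogate pairs into one `char`.
fn decode_fixed_width_escapes(input: &str) -> Result<String, EscapeError> {
    let mut out = String::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        if chars.next() != Some('u') {
            return Err(EscapeError::Malformed);
        }
        let unit = read_hex4(&mut chars)?;
        match unit {
            0xD800..=0xDBFF => {
                // Leading surrogate: must be followed at once by `\u` + trailing surrogate.
                if chars.next() != Some('\\') || chars.next() != Some('u') {
                    return Err(EscapeError::LoneSurrogate(unit));
                }
                let trailing = read_hex4(&mut chars)?;
                if !(0xDC00..=0xDFFF).contains(&trailing) {
                    return Err(EscapeError::LoneSurrogate(unit));
                }
                let cp = 0x10000 + ((unit as u32 - 0xD800) << 10) + (trailing as u32 - 0xDC00);
                out.push(char::from_u32(cp).ok_or(EscapeError::Malformed)?);
            }
            // Trailing surrogate with no leading surrogate before it.
            0xDC00..=0xDFFF => return Err(EscapeError::LoneSurrogate(unit)),
            _ => out.push(char::from_u32(unit as u32).ok_or(EscapeError::Malformed)?),
        }
    }
    Ok(out)
}
```

With this sketch, `\uD83D\uDCA9` decodes to the single char U+1F4A9, while `\uD83D` on its own or a reversed pair such as `\uDCA9\uD83D` is reported as an error.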

Until we implement this feature, we’ll fix the panic in #608 by making all escaped surrogates parse errors, whether in a well-formed pair or not.
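One way the interim behavior might look, assuming each escape value is converted to a `char` on its own: Rust's `char::from_u32` already returns `None` for every surrogate, so returning an error instead of unwrapping rejects them all. The function name and error type below are hypothetical and not taken from the codebase.

```rust
/// Interim rule: any escape whose value is a surrogate is rejected,
/// even when it would form a well-formed pair with its neighbour.
/// Hypothetical helper for illustration only.
fn decode_escape_value_interim(value: u32) -> Result<char, String> {
    // `char::from_u32` returns `None` for surrogates (0xD800..=0xDFFF)
    // and for values above 0x10FFFF.
    char::from_u32(value).ok_or_else(|| format!("invalid escape value: U+{value:04X}"))
}
```

Under this rule both halves of `\uD83D\uDCA9` fail individually, so the well-formed pair is rejected just like a lone surrogate.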
