
Regex character classes fail with single scalar, non-NFC range bound elements #750

Open
@kasei


Using a non-NFC, single-scalar code point like U+F900 as the start of a character class range causes an error:

1 | let r = #/[\u{F900}-\u{FDCF}]/#
  |           `- error: cannot parse regular expression: invalid bound for character class range

Tested with Swift 5.10 and 6.0 (Xcode 16b2 16A5171r):

swift-driver version: 1.90.11.1 Apple Swift version 5.10 (swiftlang-5.10.0.13 clang-1500.3.9.4)
Target: arm64-apple-macosx14.0

swift-driver version: 1.110 Apple Swift version 6.0 (swiftlang-6.0.0.4.52 clang-1600.0.21.1.3)
Target: arm64-apple-macosx14.0

This seems to be because U+F900 is not in NFC: it normalizes to U+8C48. I find this surprising. Although this code point is not in NFC, this character class range isn't ambiguous the way other non-NFC cases might be (e.g. a decomposed character sequence, or U+F900 written as a literal character instead of with the \u escape).
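A quick way to observe the normalization described above, using only the Swift standard library. It shows that a string built from the single scalar U+F900 keeps that scalar as written, yet compares equal to U+8C48 because Swift's String equality uses canonical equivalence. That this is why the parser rejects the range bound is an inference from the error, not something confirmed here. `hex(_:)` is a hypothetical helper for printing scalar values.

```swift
// U+F900 CJK COMPATIBILITY IDEOGRAPH-F900 has a singleton canonical
// decomposition to U+8C48, so its NFC form is a different scalar.
let compatString = "\u{F900}"
let unifiedString = "\u{8C48}"

// Hypothetical helper: the string's scalar values as uppercase hex.
// Swift does not normalize string literals, so each string holds
// exactly the one scalar that was written.
func hex(_ s: String) -> [String] {
    s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
}

print(hex(compatString))   // ["F900"]
print(hex(unifiedString))  // ["8C48"]

// String equality uses canonical equivalence, so the two compare equal
// even though their scalars differ.
print(compatString == unifiedString)  // true
```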

I am trying to port older code that uses NSRegularExpression, and this seems to be a blocker to moving away from the old APIs, short of expanding ranges like this into classes that list thousands of individual scalars.
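One possible stopgap, which is our assumption and not something proposed in the report, is to bypass the regex parser for this particular class and test raw scalar values directly. The sketch below checks whether a string contains any scalar in U+F900...U+FDCF. `rangeFromIssue` and `containsScalar(in:_:)` are hypothetical names. The check runs at Unicode-scalar level with no normalization, which differs from a Swift Regex's default grapheme-cluster semantics.

```swift
// Hypothetical stand-in for the class [\u{F900}-\u{FDCF}], applied to raw
// Unicode scalar values with no normalization.
let rangeFromIssue: ClosedRange<UInt32> = 0xF900...0xFDCF

// Hypothetical helper: true if any scalar in `text` falls inside `range`.
func containsScalar(in range: ClosedRange<UInt32>, _ text: String) -> Bool {
    text.unicodeScalars.contains { range.contains($0.value) }
}

print(rangeFromIssue.count)                              // 1232
print(containsScalar(in: rangeFromIssue, "a\u{F900}b"))  // true
print(containsScalar(in: rangeFromIssue, "a\u{8C48}b"))  // false
```

Because the comparison uses `Unicode.Scalar.value` directly, U+8C48 is not matched even though it is canonically equivalent to U+F900. Whether that is the desired behavior depends on what the original NSRegularExpression pattern relied on.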
