.NET: \p{LC} doesn't work, . and \w don't properly support Unicode #83

Open
@Aloso

Description

All identified problems (most have been addressed in Pomsky 0.10):

  • .NET doesn't support code points (in hexadecimal notation) outside the BMP – must be converted to two UTF-16 surrogates
    • make it work in string literals (e.g. '𐌰')
    • make it work for hexadecimal code points above U+FFFF (e.g. U+10330) instead of producing an error
  • .NET doesn't support arbitrary code points (. or C) outside the BMP #89
  • \pL as shorthand for \p{L} doesn't work
  • \p{LC} doesn't work
    • polyfill?
  • scripts and boolean properties don't work at all
  • needs investigation to see if all blocks are supported
  • check if block names are correctly normalized: underscores must be removed, but dashes preserved
  • \v and \h aren't supported
  • .NET: \w (and by extension \b and \B) don't conform to Unicode #88
  • need to check if backreferences like \80 are too high (doc)
  • any further bugs may surface during fuzzing
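
To illustrate the first item and the block-name item, here is a minimal Rust sketch, not Pomsky's actual implementation. The function names `to_surrogate_pair`, `dotnet_escape` and `dotnet_block_name` are hypothetical. The sketch assumes that .NET accepts `\uXXXX` escapes with exactly four hex digits, that .NET block names carry an `Is` prefix (as in `IsLatin-1Supplement`), and that the input block name is given without any prefix. How a surrogate pair behaves inside a character class is not covered here.

```rust
/// Split a code point above U+FFFF into its UTF-16 surrogate pair.
/// Returns `None` for code points inside the BMP, which need no conversion.
fn to_surrogate_pair(c: char) -> Option<(u16, u16)> {
    let cp = c as u32;
    if cp <= 0xFFFF {
        return None;
    }
    // Subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.
    let offset = cp - 0x1_0000;
    let high = 0xD800 + (offset >> 10) as u16;
    let low = 0xDC00 + (offset & 0x3FF) as u16;
    Some((high, low))
}

/// Render a char as a `\uXXXX` escape (hypothetical helper).
/// Code points outside the BMP become two escapes, one per surrogate.
fn dotnet_escape(c: char) -> String {
    match to_surrogate_pair(c) {
        Some((high, low)) => format!("\\u{:04X}\\u{:04X}", high, low),
        None => format!("\\u{:04X}", c as u32),
    }
}

/// Normalize a block name for .NET (hypothetical helper):
/// underscores are removed, dashes are preserved, and `Is` is prepended.
fn dotnet_block_name(name: &str) -> String {
    let stripped: String = name.chars().filter(|&c| c != '_').collect();
    format!("Is{}", stripped)
}
```

With these definitions, `dotnet_escape('\u{10330}')` yields `\uD800\uDF30`, and `dotnet_block_name("Latin-1_Supplement")` yields `IsLatin-1Supplement`.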

To Reproduce

The regex-test crate was expanded to run .NET tests, which now run in CI (currently only on Ubuntu).

Expected behavior

The .NET flavor works reliably, and using unsupported features produces an error.

Labels: C-compat (Compatibility between regex flavors), bug (Something isn't working)
