.NET: \w (and by extension \b and \B) don't conform to Unicode #88

Open
@Aloso

In .NET, \w is equivalent to [\p{L}\p{Mn}\p{Nd}\p{Pc}] rather than the Unicode-conformant [\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_Control}]:

  1. It incorrectly uses GC=Letter instead of Alphabetic=Yes; the latter includes more code points, such as letter numbers (GC=Letter_Number)
  2. It doesn't match all of GC=Mark, only GC=Nonspacing_Mark
  3. It doesn't match Join_Control=Yes

AFAIK there's nothing we can do other than emitting a warning: \p{Alpha} doesn't work in .NET, so we can't polyfill it. But a warning adds noise and doesn't help much when there isn't a straightforward fix.
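
To make the three gaps concrete, here is a minimal Python sketch that emulates the .NET class quoted above using only general categories. It does not run the .NET regex engine; the helper name `dotnet_style_word_char` and the sample code points are assumptions chosen for illustration, and the check is only as accurate as the class given in this issue.

```python
import unicodedata

# Two-letter general categories covered by the class quoted above,
# [\p{L}\p{Mn}\p{Nd}\p{Pc}]; \p{L} is handled separately as "starts with L".
DOTNET_WORD_CATEGORIES = {"Mn", "Nd", "Pc"}


def dotnet_style_word_char(ch: str) -> bool:
    """Hypothetical helper: True if ch falls in [\\p{L}\\p{Mn}\\p{Nd}\\p{Pc}]."""
    category = unicodedata.category(ch)
    return category.startswith("L") or category in DOTNET_WORD_CATEGORIES


# Sample code points, one per case discussed in the issue.
SAMPLES = {
    "a": "letter (GC=Ll)",
    "\u0301": "combining acute accent (GC=Mn)",
    "\u2160": "Roman numeral one (GC=Nl, Alphabetic=Yes)",
    "\u0903": "Devanagari sign visarga (GC=Mc, a spacing mark)",
    "\u200d": "zero width joiner (Join_Control=Yes)",
}

for ch, note in SAMPLES.items():
    print(f"U+{ord(ch):04X} {note}: {dotnet_style_word_char(ch)}")
```

Under this emulation, the last three samples are rejected even though the Unicode-conformant class would accept them, one for each of the three points in the list.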

Labels: C-compat (Compatibility between regex flavors), bug (Something isn't working)