`\w` in Python does not conform to Unicode #86

### Describe the bug

Compiling the Pomsky expression `[word]` for the Python flavor produces `\w`. However, [Python's `\w`](https://docs.python.org/3/library/re.html#index-34) does not conform to the Unicode definition of a word character:

- It matches the `Letter` general categories (`Lu`, `Ll`, `Lt`, `Lm`, `Lo`) instead of all code points with the `Alphabetic` property.

- It matches code points with a `Numeric_Type` of `Digit`, `Decimal`, or `Numeric`, but it should match only the `Decimal_Number` (`Nd`) general category.

- It doesn't match the `Mark` (`Mn`, `Mc`, `Me`) general categories, nor `Connector_Punctuation` (`Pc`), except for the underscore `_`.

- It doesn't match characters with the `Join_Control` property (U+200C, U+200D).
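The points above can be checked directly against the `re` module. The following is a minimal sketch: the sample code points are chosen here for illustration and are not part of the original report, and `python_word` is a hypothetical helper name.

```python
import re
import unicodedata

# Illustrative samples (assumptions, not taken from the issue).
SAMPLES = {
    "\u24b6": "CIRCLED LATIN CAPITAL LETTER A: Alphabetic, but category So",
    "\u00bd": "VULGAR FRACTION ONE HALF: Numeric_Type=Numeric, category No",
    "\u00b2": "SUPERSCRIPT TWO: Numeric_Type=Digit, category No",
    "\u0301": "COMBINING ACUTE ACCENT: category Mn",
    "\u203f": "UNDERTIE: category Pc",
    "\u200d": "ZERO WIDTH JOINER: Join_Control",
}


def python_word(ch: str) -> bool:
    """Return True if Python's \\w (default Unicode mode) matches ch."""
    return re.fullmatch(r"\w", ch) is not None


for ch, note in SAMPLES.items():
    # Print the code point, its general category, and whether \w matched it.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"matched={python_word(ch)}  # {note}")
```

On CPython 3, the two `No` code points should be reported as matched even though a conforming `\w` would reject them, while the other four should be rejected even though a conforming `\w` would accept them.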

### To Reproduce

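Run `pomsky -f python '[word]+'`, which outputs `\w+`.

Run `regex-test -f python '\w+' -t "\u0939\u093f\u0928\u094d\u0926\u0940"`. The test string is the Hindi word हिन्दी, which contains combining marks (`Mc`, `Mn`), so Python's `\w+` does not match it as a single word.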
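The same behavior can be reproduced without `regex-test`, using only the standard library. This is a small sketch that reuses the test string from the command above; the variable names are illustrative.

```python
import re
import unicodedata

# The test string from the reproduction step (a Hindi word).
text = "\u0939\u093f\u0928\u094d\u0926\u0940"

# A conforming \w+ would return the whole word as one match.
matches = re.findall(r"\w+", text)
print(matches)

# List the code points that Python's \w refuses to match.
skipped = [ch for ch in text if re.fullmatch(r"\w", ch) is None]
for ch in skipped:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

The word is split into three single-letter matches because every second code point is a combining mark, which Python's `\w` skips.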

### Expected behavior

Python's `re` module does not support Unicode property escapes (`\p{...}`), so it is impossible to polyfill a conforming `\w`.

Therefore, `[word]` should be forbidden in the Python regex flavor, unless Unicode is disabled; then it should produce `[a-zA-Z0-9_]`.
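For reference, the proposed ASCII output `[a-zA-Z0-9_]` is the same set that Python's own `\w` matches under the `re.ASCII` flag. The sketch below checks this by brute force over all code points; `classes_agree` is a hypothetical helper, and the full scan takes a second or two.

```python
import re

ASCII_WORD = re.compile(r"[a-zA-Z0-9_]")       # proposed output for [word]
ASCII_FLAG_WORD = re.compile(r"\w", re.ASCII)  # Python's \w in ASCII mode


def classes_agree(limit: int = 0x110000) -> bool:
    """Return True if both patterns accept exactly the same code points below limit."""
    return all(
        bool(ASCII_WORD.fullmatch(chr(cp))) == bool(ASCII_FLAG_WORD.fullmatch(chr(cp)))
        for cp in range(limit)
    )


print(classes_agree())
```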

This is not a satisfactory solution, however, because it makes it impossible to match non-ASCII word characters. Some people may find `\w` useful even though it is incorrect and matches only a subset of word characters. For that reason, another Python flavor should be added that targets the `regex` module, which has much better Unicode support.

### Alternatives

Add a `nonstandard_unicode` mode so that `\w` can be used in flavors where it matches some, but not all, non-ASCII word characters (i.e. Python and .NET).

### Related

python/cpython#44795
