Enable the RegExp u flag #631

Open
markandrus wants to merge 1 commit into kach:master from markandrus:u-flag

@markandrus

I ran into a problem very similar to the one described in #543. I'm trying to adapt some of the grammars from the ECMAScript standard to Nearley. For example, here is part of the grammar for IdentifierName:

# https://tc39.es/ecma262/#sec-identifier-names

IdentifierName  -> IdentifierStart               {% id %}
                |  IdentifierName IdentifierPart {% xs => xs.join('') %}
IdentifierStart -> IdentifierStartChar           {% id %}
IdentifierPart  -> IdentifierPartChar            {% id %}

IdentifierStartChar -> UnicodeIDStart    {% id %}
                    |  "$"               {% id %}
                    |  "_"               {% id %}
IdentifierPartChar  -> UnicodeIDContinue {% id %}
                    |  "$"               {% id %}
                    |  ZWNJ              {% id %}
                    |  ZWJ               {% id %}

ZWNJ -> "\u200C" {% id %}
ZWJ  -> "\u200D" {% id %}

UnicodeIDStart    -> [\p{ID_Start}]    {% id %}
UnicodeIDContinue -> [\p{ID_Continue}] {% id %}

Crucially, UnicodeIDStart and UnicodeIDContinue are defined in terms of the Unicode properties ID_Start and ID_Continue. For the \p{ID_Start} and \p{ID_Continue} syntax to work in the RegExp-based character classes, the RegExp has to be created with the u flag.
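To illustrate the difference, here is a minimal sketch in plain JavaScript. The `compileCharclass` helper is hypothetical and only stands in for "build a RegExp from a charclass source"; it is not Nearley's actual code.

```javascript
// Hypothetical helper (not part of Nearley): compile a charclass source
// string into a RegExp, optionally with the u flag.
function compileCharclass(source, useUnicodeFlag) {
  return new RegExp(source, useUnicodeFlag ? "u" : "");
}

const source = "[\\p{ID_Start}]";

// Without u, \p is just an escaped "p", so the class matches the literal
// characters p, {, I, D, _, S, t, a, r and }.
const withoutU = compileCharclass(source, false);
console.log(withoutU.test("é")); // false
console.log(withoutU.test("p")); // true

// With u, \p{ID_Start} is a Unicode property escape.
const withU = compileCharclass(source, true);
console.log(withU.test("é")); // true
console.log(withU.test("$")); // false ("$" is not ID_Start)
```

This assumes a JavaScript engine that supports Unicode property escapes (ES2018 or later).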

I'm very new to Nearley, so I don't know whether it's safe to turn this on for everyone, whether it should be opt-in, or whether it could cause other problems. What do you think? Is this useful?
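One compatibility concern I can sketch, again in plain JavaScript rather than Nearley's code: the u flag makes RegExp escapes stricter, so a charclass that compiles today might stop compiling. The `compilesWithFlags` helper below is hypothetical, and I haven't checked whether any existing grammars rely on such escapes.

```javascript
// Hypothetical helper: report whether a RegExp source compiles with the
// given flags, treating a SyntaxError as "does not compile".
function compilesWithFlags(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// \a is accepted as an escaped "a" without u, but is an invalid escape with u.
console.log(compilesWithFlags("[\\a]", ""));  // true
console.log(compilesWithFlags("[\\a]", "u")); // false
```

If grammars in the wild use escapes like this, that would be an argument for making the flag opt-in.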