Commit 23cdfae

Normative: add RegExp.escape (#3382)
1 parent 0b913e9 commit 23cdfae

1 file changed

+62
-0
lines changed

spec.html

+62
@@ -38532,6 +38532,64 @@ <h1>Properties of the RegExp Constructor</h1>
   <li>has the following properties:</li>
 </ul>
 
+<emu-clause id="sec-regexp.escape">
+  <h1>RegExp.escape ( _S_ )</h1>
+  <p>This function returns a copy of _S_ in which characters that are potentially special in a regular expression |Pattern| have been replaced by equivalent escape sequences.</p>
+  <p>It performs the following steps when called:</p>
+
+  <emu-alg>
+    1. If _S_ is not a String, throw a *TypeError* exception.
+    1. Let _escaped_ be the empty String.
+    1. Let _cpList_ be StringToCodePoints(_S_).
+    1. For each code point _c_ of _cpList_, do
+      1. If _escaped_ is the empty String and _c_ is matched by either |DecimalDigit| or |AsciiLetter|, then
+        1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
+        1. Let _numericValue_ be the numeric value of _c_.
+        1. Let _hex_ be Number::toString(𝔽(_numericValue_), 16).
+        1. Assert: The length of _hex_ is 2.
+        1. Set _escaped_ to the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and _hex_.
+      1. Else,
+        1. Set _escaped_ to the string-concatenation of _escaped_ and EncodeForRegExpEscape(_c_).
+    1. Return _escaped_.
+  </emu-alg>
+
+  <emu-note>
+    <p>Despite having similar names, EscapeRegExpPattern and `RegExp.escape` do not perform similar actions. The former escapes a pattern for representation as a string, while this function escapes a string for representation inside a pattern.</p>
+  </emu-note>
+
+  <emu-clause id="sec-encodeforregexpescape" type="abstract operation">
+    <h1>
+      EncodeForRegExpEscape (
+        _c_: a code point,
+      ): a String
+    </h1>
+    <dl class="header">
+      <dt>description</dt>
+      <dd>It returns a String representing a |Pattern| for matching _c_. If _c_ is white space or an ASCII punctuator, the returned value is an escape sequence. Otherwise, the returned value is a String representation of _c_ itself.</dd>
+    </dl>
+
+    <emu-alg>
+      1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then
+        1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_).
+      1. Else if _c_ is a code point listed in the “Code Point” column of <emu-xref href="#table-controlescape-code-point-values"></emu-xref>, then
+        1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and the string in the “ControlEscape” column of the row whose “Code Point” column contains _c_.
+      1. Let _otherPunctuators_ be the string-concatenation of *",-=&lt;>#&amp;!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
+      1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
+      1. If _toEscape_ contains _c_, _c_ is matched by either |WhiteSpace| or |LineTerminator|, or _c_ has the same numeric value as a leading surrogate or trailing surrogate, then
+        1. Let _cNum_ be the numeric value of _c_.
+        1. If _cNum_ ≤ 0xFF, then
+          1. Let _hex_ be Number::toString(𝔽(_cNum_), 16).
+          1. Return the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and StringPad(_hex_, 2, *"0"*, ~start~).
+        1. Let _escaped_ be the empty String.
+        1. Let _codeUnits_ be UTF16EncodeCodePoint(_c_).
+        1. For each code unit _cu_ of _codeUnits_, do
+          1. Set _escaped_ to the string-concatenation of _escaped_ and UnicodeEscape(_cu_).
+        1. Return _escaped_.
+      1. Return UTF16EncodeCodePoint(_c_).
+    </emu-alg>
+  </emu-clause>
+</emu-clause>
+
 <emu-clause id="sec-regexp.prototype">
   <h1>RegExp.prototype</h1>
   <p>The initial value of `RegExp.prototype` is the RegExp prototype object.</p>
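
The two algorithms added in this hunk can be read as the following JavaScript sketch. It is an illustration only, not the text of the commit and not a vetted polyfill. The names `regExpEscape`, `encodeForRegExpEscape`, and the three constants are hypothetical. It also assumes that the engine's `\s` class covers the same code points as |WhiteSpace| plus |LineTerminator|, which can vary slightly with the engine's Unicode version.

```javascript
// Illustrative sketch of the spec steps; all names here are hypothetical.

// Code points matched by |SyntaxCharacter|.
const SYNTAX_CHARACTERS = new Set("^$\\.*+?()[]{}|");

// Rows of the "ControlEscape Code Point Values" table: \t \n \v \f \r.
const CONTROL_ESCAPES = new Map([
  [0x09, "t"], [0x0a, "n"], [0x0b, "v"], [0x0c, "f"], [0x0d, "r"],
]);

// The _otherPunctuators_ string, including QUOTATION MARK.
const OTHER_PUNCTUATORS = new Set(",-=<>#&!%:;@~'`\"");

// EncodeForRegExpEscape(c): `ch` is a one-code-point string.
function encodeForRegExpEscape(ch) {
  const cp = ch.codePointAt(0);
  if (SYNTAX_CHARACTERS.has(ch) || ch === "/") return "\\" + ch;
  if (CONTROL_ESCAPES.has(cp)) return "\\" + CONTROL_ESCAPES.get(cp);

  // Assumption: /\s/ approximates |WhiteSpace| or |LineTerminator|.
  const isSpace = /^\s$/.test(ch);
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  if (OTHER_PUNCTUATORS.has(ch) || isSpace || isSurrogate) {
    if (cp <= 0xff) return "\\x" + cp.toString(16).padStart(2, "0");
    // UnicodeEscape for each UTF-16 code unit of the code point.
    let escaped = "";
    for (let i = 0; i < ch.length; i++) {
      escaped += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
    }
    return escaped;
  }
  return ch;
}

// RegExp.escape(S).
function regExpEscape(s) {
  if (typeof s !== "string") throw new TypeError("argument must be a string");
  let escaped = "";
  // for...of iterates by code point; lone surrogates come through as-is.
  for (const ch of s) {
    if (escaped === "" && /^[0-9A-Za-z]$/.test(ch)) {
      // A leading digit or ASCII letter always becomes a two-digit \x escape.
      escaped = "\\x" + ch.codePointAt(0).toString(16);
    } else {
      escaped += encodeForRegExpEscape(ch);
    }
  }
  return escaped;
}

console.log(regExpEscape("1+1"));     // \x31\+1
console.log(regExpEscape("foo bar")); // \x66oo\x20bar
```

Because every iteration appends at least one code unit, the `escaped === ""` check is true only for the first code point, which matches the "leading digit or letter" intent described in the NOTE step.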
@@ -38863,6 +38921,10 @@ <h1>
       1. The code points `/` or any |LineTerminator| occurring in the pattern shall be escaped in _S_ as necessary to ensure that the string-concatenation of *"/"*, _S_, *"/"*, and _F_ can be parsed (in an appropriate lexical context) as a |RegularExpressionLiteral| that behaves identically to the constructed regular expression. For example, if _P_ is *"/"*, then _S_ could be *"\\/"* or *"\\u002F"*, among other possibilities, but not *"/"*, because `///` followed by _F_ would be parsed as a |SingleLineComment| rather than a |RegularExpressionLiteral|. If _P_ is the empty String, this specification can be met by letting _S_ be *"(?:)"*.
       1. Return _S_.
     </emu-alg>
+
+    <emu-note>
+      <p>Despite having similar names, `RegExp.escape` and EscapeRegExpPattern do not perform similar actions. The former escapes a string for representation inside a pattern, while this function escapes a pattern for representation as a string.</p>
+    </emu-note>
   </emu-clause>
 </emu-clause>
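
The distinction drawn in the two notes can be seen from script. The sketch below is an illustration under stated assumptions: EscapeRegExpPattern is observed indirectly through `RegExp.prototype.source`, and `escapeForPattern` is a hypothetical helper that uses `RegExp.escape` when the engine provides it and otherwise falls back to a simplified stand-in that escapes only syntax characters and `/` (not the full algorithm from this commit).

```javascript
// Direction 1: pattern -> string (EscapeRegExpPattern, seen via `source`).
// The exact escaping in `source` is implementation-defined within the
// limits of the spec, so only the round trip is checked here.
const re = new RegExp("a/b");
const rebuilt = new RegExp(re.source, re.flags);
console.log(rebuilt.test("a/b")); // true

// Direction 2: string -> pattern (RegExp.escape).
// Hypothetical helper: prefers the built-in, else a simplified fallback.
const escapeForPattern =
  typeof RegExp.escape === "function"
    ? RegExp.escape
    : (s) => s.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");

const literal = "a.b*";
const exact = new RegExp("^" + escapeForPattern(literal) + "$");
console.log(exact.test("a.b*")); // true: matches the literal text
console.log(exact.test("axbb")); // false: "." and "*" are no longer operators
```

The first direction starts from an existing pattern and produces source text for it. The second starts from arbitrary text and produces a pattern that matches that text literally.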
