Commit e2da759

Normative: add RegExp.escape (#3382)
1 parent ed75310 commit e2da759

1 file changed, +62 -0 lines changed

spec.html

@@ -38495,6 +38495,64 @@ <h1>Properties of the RegExp Constructor</h1>
 <li>has the following properties:</li>
 </ul>
 
+<emu-clause id="sec-regexp.escape">
+  <h1>RegExp.escape ( _S_ )</h1>
+  <p>This function returns a copy of _S_ in which characters that are potentially special in a regular expression |Pattern| have been replaced by equivalent escape sequences.</p>
+  <p>It performs the following steps when called:</p>
+
+  <emu-alg>
+    1. If _S_ is not a String, throw a *TypeError* exception.
+    1. Let _escaped_ be the empty String.
+    1. Let _cpList_ be StringToCodePoints(_S_).
+    1. For each code point _cp_ of _cpList_, do
+      1. If _escaped_ is the empty String and _cp_ is matched by either |DecimalDigit| or |AsciiLetter|, then
+        1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
+        1. Let _numericValue_ be the numeric value of _cp_.
+        1. Let _hex_ be Number::toString(𝔽(_numericValue_), 16).
+        1. Assert: The length of _hex_ is 2.
+        1. Set _escaped_ to the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and _hex_.
+      1. Else,
+        1. Set _escaped_ to the string-concatenation of _escaped_ and EncodeForRegExpEscape(_cp_).
+    1. Return _escaped_.
+  </emu-alg>
+
+  <emu-note>
+    <p>Despite having similar names, EscapeRegExpPattern and `RegExp.escape` do not perform similar actions. The former escapes a pattern for representation as a string, while this function escapes a string for representation inside a pattern.</p>
+  </emu-note>
+
+  <emu-clause id="sec-encodeforregexpescape" type="abstract operation">
+    <h1>
+      EncodeForRegExpEscape (
+        _cp_: a code point,
+      ): a String
+    </h1>
+    <dl class="header">
+      <dt>description</dt>
+      <dd>It returns a String representing a |Pattern| for matching _cp_. If _cp_ is white space or an ASCII punctuator, the returned value is an escape sequence. Otherwise, the returned value is a String representation of _cp_ itself.</dd>
+    </dl>
+
+    <emu-alg>
+      1. If _cp_ is matched by |SyntaxCharacter| or _cp_ is U+002F (SOLIDUS), then
+        1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_cp_).
+      1. Else if _cp_ is a code point listed in the “Code Point” column of <emu-xref href="#table-controlescape-code-point-values"></emu-xref>, then
+        1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and the string in the “ControlEscape” column of the row whose “Code Point” column contains _cp_.
+      1. Let _otherPunctuators_ be the string-concatenation of *",-=&lt;>#&amp;!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
+      1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
+      1. If _toEscape_ contains _cp_, _cp_ is matched by either |WhiteSpace| or |LineTerminator|, or _cp_ has the same numeric value as a leading surrogate or trailing surrogate, then
+        1. Let _cpNum_ be the numeric value of _cp_.
+        1. If _cpNum_ ≤ 0xFF, then
+          1. Let _hex_ be Number::toString(𝔽(_cpNum_), 16).
+          1. Return the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and StringPad(_hex_, 2, *"0"*, ~start~).
+        1. Let _escaped_ be the empty String.
+        1. Let _codeUnits_ be UTF16EncodeCodePoint(_cp_).
+        1. For each code unit _cu_ of _codeUnits_, do
+          1. Set _escaped_ to the string-concatenation of _escaped_ and UnicodeEscape(_cu_).
+        1. Return _escaped_.
+      1. Return UTF16EncodeCodePoint(_cp_).
+    </emu-alg>
+  </emu-clause>
+</emu-clause>
+
 <emu-clause id="sec-regexp.prototype">
   <h1>RegExp.prototype</h1>
   <p>The initial value of `RegExp.prototype` is the RegExp prototype object.</p>
@@ -38826,6 +38884,10 @@ <h1>
       1. The code points `/` or any |LineTerminator| occurring in the pattern shall be escaped in _S_ as necessary to ensure that the string-concatenation of *"/"*, _S_, *"/"*, and _F_ can be parsed (in an appropriate lexical context) as a |RegularExpressionLiteral| that behaves identically to the constructed regular expression. For example, if _P_ is *"/"*, then _S_ could be *"\\/"* or *"\\u002F"*, among other possibilities, but not *"/"*, because `///` followed by _F_ would be parsed as a |SingleLineComment| rather than a |RegularExpressionLiteral|. If _P_ is the empty String, this specification can be met by letting _S_ be *"(?:)"*.
       1. Return _S_.
     </emu-alg>
+
+    <emu-note>
+      <p>Despite having similar names, `RegExp.escape` and EscapeRegExpPattern do not perform similar actions. The former escapes a string for representation inside a pattern, while this function escapes a pattern for representation as a string.</p>
+    </emu-note>
   </emu-clause>
 </emu-clause>
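
For readers who want to trace the two algorithms added above, the following is a minimal JavaScript sketch of the steps in RegExp.escape and EncodeForRegExpEscape. It is not part of the commit and is not a conforming polyfill. The names `regExpEscapeSketch` and `encodeForRegExpEscape` and the three lookup constants are hypothetical, chosen only for illustration. The sketch also assumes that the `\s` character class covers the same code points as |WhiteSpace| plus |LineTerminator|.

```javascript
// Illustrative sketch only: all names below are hypothetical, not spec-defined.

// |SyntaxCharacter| plus U+002F (SOLIDUS).
const SYNTAX_CHARACTERS = new Set("^$\\.*+?()[]{}|/");

// ControlEscape table: character -> letter that follows the backslash.
const CONTROL_ESCAPES = new Map([
  ["\t", "t"], ["\n", "n"], ["\v", "v"], ["\f", "f"], ["\r", "r"],
]);

// The _otherPunctuators_ string, including U+0022 (QUOTATION MARK).
const OTHER_PUNCTUATORS = new Set(",-=<>#&!%:;@~'`\"");

// Mirrors EncodeForRegExpEscape; `ch` is a string holding one code point.
function encodeForRegExpEscape(ch) {
  if (SYNTAX_CHARACTERS.has(ch)) return "\\" + ch;
  if (CONTROL_ESCAPES.has(ch)) return "\\" + CONTROL_ESCAPES.get(ch);

  const cp = ch.codePointAt(0);
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  // Assumption: \s matches exactly |WhiteSpace| and |LineTerminator|.
  if (OTHER_PUNCTUATORS.has(ch) || /^\s$/u.test(ch) || isSurrogate) {
    if (cp <= 0xff) return "\\x" + cp.toString(16).padStart(2, "0");
    // Emit one \uXXXX escape per UTF-16 code unit.
    let escaped = "";
    for (let i = 0; i < ch.length; i++) {
      escaped += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
    }
    return escaped;
  }
  return ch;
}

// Mirrors the RegExp.escape steps.
function regExpEscapeSketch(s) {
  if (typeof s !== "string") throw new TypeError("argument must be a string");
  let escaped = "";
  for (const ch of s) { // iterates by code point, like StringToCodePoints
    if (escaped === "" && /^[0-9A-Za-z]$/.test(ch)) {
      // A leading digit or ASCII letter always becomes a two-digit \x escape.
      escaped = "\\x" + ch.codePointAt(0).toString(16);
    } else {
      escaped += encodeForRegExpEscape(ch);
    }
  }
  return escaped;
}
```

Under this sketch, `regExpEscapeSketch("1+1")` yields the pattern text `\x31\+1`. The leading-digit rule is what lets `new RegExp("(a)\\1" + regExpEscapeSketch("1"))` match `"aa1"`: the escaped `1` cannot be read as an extension of the preceding `\1`.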
