<emu-clause id="collator-objects">
<h1>Collator Objects</h1>
<emu-clause id="sec-the-intl-collator-constructor">
<h1>The Intl.Collator Constructor</h1>
<p>
The Intl.Collator constructor is the <dfn>%Intl.Collator%</dfn> intrinsic object and a standard built-in property of the Intl object. Behaviour common to all service constructor properties of the Intl object is specified in <emu-xref href="#sec-internal-slots"></emu-xref>.
</p>
<emu-clause id="sec-intl.collator" oldids="sec-initializecollator">
<h1>Intl.Collator ( [ _locales_ [ , _options_ ] ] )</h1>
<p>
When the `Intl.Collator` function is called with optional arguments _locales_ and _options_, the following steps are taken:
</p>
<emu-alg>
1. If NewTarget is *undefined*, let _newTarget_ be the active function object, else let _newTarget_ be NewTarget.
1. Let _internalSlotsList_ be « [[InitializedCollator]], [[Locale]], [[Usage]], [[Sensitivity]], [[IgnorePunctuation]], [[Collation]], [[BoundCompare]] ».
1. If %Intl.Collator%.[[RelevantExtensionKeys]] contains *"kn"*, then
  1. Append [[Numeric]] to _internalSlotsList_.
1. If %Intl.Collator%.[[RelevantExtensionKeys]] contains *"kf"*, then
  1. Append [[CaseFirst]] to _internalSlotsList_.
1. Let _collator_ be ? OrdinaryCreateFromConstructor(_newTarget_, *"%Intl.Collator.prototype%"*, _internalSlotsList_).
1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
1. Set _options_ to ? CoerceOptionsToObject(_options_).
1. Let _usage_ be ? GetOption(_options_, *"usage"*, ~string~, « *"sort"*, *"search"* », *"sort"*).
1. Set _collator_.[[Usage]] to _usage_.
1. If _usage_ is *"sort"*, then
  1. Let _localeData_ be %Intl.Collator%.[[SortLocaleData]].
1. Else,
  1. Let _localeData_ be %Intl.Collator%.[[SearchLocaleData]].
1. Let _opt_ be a new Record.
1. Let _matcher_ be ? GetOption(_options_, *"localeMatcher"*, ~string~, « *"lookup"*, *"best fit"* », *"best fit"*).
1. Set _opt_.[[localeMatcher]] to _matcher_.
1. Let _collation_ be ? GetOption(_options_, *"collation"*, ~string~, ~empty~, *undefined*).
1. If _collation_ is not *undefined*, then
  1. If _collation_ cannot be matched by the <code>type</code> Unicode locale nonterminal, throw a *RangeError* exception.
1. Set _opt_.[[co]] to _collation_.
1. Let _numeric_ be ? GetOption(_options_, *"numeric"*, ~boolean~, ~empty~, *undefined*).
1. If _numeric_ is not *undefined*, then
  1. Set _numeric_ to ! ToString(_numeric_).
1. Set _opt_.[[kn]] to _numeric_.
1. Let _caseFirst_ be ? GetOption(_options_, *"caseFirst"*, ~string~, « *"upper"*, *"lower"*, *"false"* », *undefined*).
1. Set _opt_.[[kf]] to _caseFirst_.
1. Let _relevantExtensionKeys_ be %Intl.Collator%.[[RelevantExtensionKeys]].
1. Let _r_ be ResolveLocale(%Intl.Collator%.[[AvailableLocales]], _requestedLocales_, _opt_, _relevantExtensionKeys_, _localeData_).
1. Set _collator_.[[Locale]] to _r_.[[Locale]].
1. Set _collation_ to _r_.[[co]].
1. If _collation_ is *null*, set _collation_ to *"default"*.
1. Set _collator_.[[Collation]] to _collation_.
1. If _relevantExtensionKeys_ contains *"kn"*, then
  1. Set _collator_.[[Numeric]] to SameValue(_r_.[[kn]], *"true"*).
1. If _relevantExtensionKeys_ contains *"kf"*, then
  1. Set _collator_.[[CaseFirst]] to _r_.[[kf]].
1. Let _resolvedLocaleData_ be _r_.[[LocaleData]].
1. Let _sensitivity_ be ? GetOption(_options_, *"sensitivity"*, ~string~, « *"base"*, *"accent"*, *"case"*, *"variant"* », *undefined*).
1. If _sensitivity_ is *undefined*, then
  1. If _usage_ is *"sort"*, then
    1. Set _sensitivity_ to *"variant"*.
  1. Else,
    1. Set _sensitivity_ to _resolvedLocaleData_.[[sensitivity]].
1. Set _collator_.[[Sensitivity]] to _sensitivity_.
1. Let _defaultIgnorePunctuation_ be _resolvedLocaleData_.[[ignorePunctuation]].
1. Let _ignorePunctuation_ be ? GetOption(_options_, *"ignorePunctuation"*, ~boolean~, ~empty~, _defaultIgnorePunctuation_).
1. Set _collator_.[[IgnorePunctuation]] to _ignorePunctuation_.
1. Return _collator_.
</emu-alg>
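<emu-note>
The following non-normative sketch illustrates three observable consequences of the steps above: option validation by GetOption, construction without `new`, and the defaults for *"usage"* and *"sensitivity"*. The locale *"en"* and the invalid value *"bogus"* are illustrative assumptions, not part of this specification.

```javascript
// An unsupported "usage" value is rejected by GetOption with a RangeError.
let rejected = false;
try {
  new Intl.Collator("en", { usage: "bogus" });
} catch (err) {
  rejected = err instanceof RangeError;
}

// Calling without "new" still creates a collator, because the active
// function object is used when NewTarget is undefined.
const viaCall = Intl.Collator("en");

// With no options, usage defaults to "sort", and a sort collator
// defaults to "variant" sensitivity.
const defaults = new Intl.Collator("en").resolvedOptions();

console.log(rejected, viaCall instanceof Intl.Collator, defaults.usage, defaults.sensitivity);
// logs: true true sort variant
```
</emu-note>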
</emu-clause>
<emu-note>
The collation associated with the *"search"* value for the *"usage"* option should only be used to find matching strings, since this collation is not guaranteed to be in any particular order. This behaviour is described in <a href="https://unicode.org/reports/tr35/#UnicodeCollationIdentifier">Unicode Technical Standard #35 Part 1 Core, Unicode Collation Identifier</a> and <a href="https://unicode.org/reports/tr10/#Searching">Unicode Technical Standard #10 Unicode Collation Algorithm, Searching and Matching</a>.
</emu-note>
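<emu-note>
As a hedged, non-normative sketch of the *"search"* usage, a collator can be used to test for matches rather than to order strings. The word list and query are hypothetical, and the result assumes an implementation with typical English collation data.

```javascript
// Hypothetical data; only equality (a zero result) is inspected, never order.
const matcher = new Intl.Collator("en", { usage: "search", sensitivity: "base" });
const words = ["resume", "Résumé", "RESUME", "result"];

// Keep the entries that match the query when case and diacritics are ignored.
const hits = words.filter((word) => matcher.compare(word, "resume") === 0);

console.log(hits);
```

Only the comparison against zero is meaningful here; the sign of a non-zero result from a search collator should not be used for sorting.
</emu-note>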
</emu-clause>
<emu-clause id="sec-properties-of-the-intl-collator-constructor">
<h1>Properties of the Intl.Collator Constructor</h1>
<p>
The Intl.Collator constructor has the following properties:
</p>
<emu-clause id="sec-intl.collator.prototype">
<h1>Intl.Collator.prototype</h1>
<p>
The value of `Intl.Collator.prototype` is %Intl.Collator.prototype%.
</p>
<p>
This property has the attributes { [[Writable]]: *false*, [[Enumerable]]: *false*, [[Configurable]]: *false* }.
</p>
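<emu-note>
The attributes above can be observed from script. This is a minimal, non-normative sketch; the variable name is an assumption made for illustration.

```javascript
// Read the property descriptor of the constructor's "prototype" property.
const descriptor = Object.getOwnPropertyDescriptor(Intl.Collator, "prototype");

console.log(descriptor.writable, descriptor.enumerable, descriptor.configurable);
// logs: false false false
```
</emu-note>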
</emu-clause>
<emu-clause id="sec-intl.collator.supportedlocalesof">
<h1>Intl.Collator.supportedLocalesOf ( _locales_ [ , _options_ ] )</h1>
<p>
When the `supportedLocalesOf` method is called with arguments _locales_ and _options_, the following steps are taken:
</p>
<emu-alg>
1. Let _availableLocales_ be %Intl.Collator%.[[AvailableLocales]].
1. Let _requestedLocales_ be ? CanonicalizeLocaleList(_locales_).
1. Return ? FilterLocales(_availableLocales_, _requestedLocales_, _options_).
</emu-alg>
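<emu-note>
A non-normative sketch of calling `supportedLocalesOf`. The requested tags are illustrative, and which of them are returned depends on the locale data of the implementation, so the example checks only properties that follow from the steps above.

```javascript
// Illustrative request list; the result is a subset chosen by the implementation.
const requested = ["en-US", "de", "sv"];
const supported = Intl.Collator.supportedLocalesOf(requested, { localeMatcher: "lookup" });

// Every returned tag comes from the canonicalized request list.
const isSubset = supported.every((tag) => requested.includes(tag));

// A structurally invalid tag is rejected by CanonicalizeLocaleList.
let invalidRejected = false;
try {
  Intl.Collator.supportedLocalesOf("en_US");
} catch (err) {
  invalidRejected = err instanceof RangeError;
}

console.log(supported, isSubset, invalidRejected);
```
</emu-note>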
</emu-clause>
<emu-clause id="sec-intl-collator-internal-slots">
<h1>Internal slots</h1>
<p>
The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in <emu-xref href="#sec-internal-slots"></emu-xref>. The value of the [[RelevantExtensionKeys]] internal slot is a List that must include the element *"co"*, may include any or all of the elements *"kf"* and *"kn"*, and must not include any other elements.
</p>
<emu-note>
<a href="https://unicode.org/reports/tr35/#Key_And_Type_Definitions_">Unicode Technical Standard #35 Part 1 Core, Section 3.6.1 Key and Type Definitions</a> describes eleven locale extension keys that are relevant to collation: *"co"* for collator usage and specializations, *"ka"* for alternate handling, *"kb"* for backward second level weight, *"kc"* for case level, *"kf"* for case first, *"kh"* for hiragana quaternary, *"kk"* for normalization, *"kn"* for numeric, *"kr"* for reordering, *"ks"* for collation strength, and *"vt"* for variable top. Collator, however, requires that the usage is specified through the *"usage"* property of the options object, alternate handling through the *"ignorePunctuation"* property of the options object, and case level and the strength through the *"sensitivity"* property of the options object. The *"co"* key in the language tag is supported only for collator specializations, and the keys *"kb"*, *"kh"*, *"kk"*, *"kr"*, and *"vt"* are not allowed in this version of the Internationalization API. Support for the remaining keys is implementation dependent.
</emu-note>
<p>
The values of the [[SortLocaleData]] and [[SearchLocaleData]] internal slots are implementation-defined within the constraints described in <emu-xref href="#sec-internal-slots"></emu-xref> and the following additional constraints, for all locale values _locale_:
</p>
<ul>
<li>The first element of [[SortLocaleData]].[[<_locale_>]].[[co]] and [[SearchLocaleData]].[[<_locale_>]].[[co]] must be *null*.</li>
<li>The values *"standard"* and *"search"* must not be used as elements in any [[SortLocaleData]].[[<_locale_>]].[[co]] and [[SearchLocaleData]].[[<_locale_>]].[[co]] List.</li>
<li>[[SearchLocaleData]].[[<_locale_>]] must have a [[sensitivity]] field with one of the String values *"base"*, *"accent"*, *"case"*, or *"variant"*.</li>
<li>[[SearchLocaleData]].[[<_locale_>]] and [[SortLocaleData]].[[<_locale_>]] must have an [[ignorePunctuation]] field with a Boolean value.</li>
</ul>
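<emu-note>
The interaction between locale extension keys and options can be sketched as follows. This is non-normative; the tags are illustrative, and whether *"phonebk"* is honoured depends on the locale data of the implementation.

```javascript
// The "co" key requests a collator specialization. If the implementation has
// no phonebook data for German, the resolved collation is "default" instead.
const phonebook = new Intl.Collator("de-u-co-phonebk").resolvedOptions();

// "search" cannot be selected through the language tag, so the resolved
// collation is never reported as "search".
const viaTag = new Intl.Collator("en-u-co-search").resolvedOptions();

// An option takes precedence over the same setting in the language tag.
const overridden = new Intl.Collator("en-u-kn-true", { numeric: false }).resolvedOptions();

console.log(phonebook.collation, viaTag.collation, overridden.numeric);
```
</emu-note>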
</emu-clause>
</emu-clause>
<emu-clause id="sec-properties-of-the-intl-collator-prototype-object">
<h1>Properties of the Intl.Collator Prototype Object</h1>
<p>
The Intl.Collator prototype object is itself an ordinary object. <dfn>%Intl.Collator.prototype%</dfn> is not an Intl.Collator instance and does not have an [[InitializedCollator]] internal slot or any of the other internal slots of Intl.Collator instance objects.
</p>
<emu-clause id="sec-intl.collator.prototype.constructor">
<h1>Intl.Collator.prototype.constructor</h1>
<p>
The initial value of `Intl.Collator.prototype.constructor` is %Intl.Collator%.
</p>
</emu-clause>
<emu-clause id="sec-intl.collator.prototype.resolvedoptions">
<h1>Intl.Collator.prototype.resolvedOptions ( )</h1>
<p>
This function provides access to the locale and options computed during initialization of the object.
</p>
<emu-alg>
1. Let _collator_ be the *this* value.
1. Perform ? RequireInternalSlot(_collator_, [[InitializedCollator]]).
1. Let _options_ be OrdinaryObjectCreate(%Object.prototype%).
1. For each row of <emu-xref href="#table-collator-resolvedoptions-properties"></emu-xref>, except the header row, in table order, do
  1. Let _p_ be the Property value of the current row.
  1. Let _v_ be the value of _collator_'s internal slot whose name is the Internal Slot value of the current row.
  1. If the current row has an Extension Key value, then
    1. Let _extensionKey_ be the Extension Key value of the current row.
    1. If %Intl.Collator%.[[RelevantExtensionKeys]] does not contain _extensionKey_, then
      1. Set _v_ to *undefined*.
  1. If _v_ is not *undefined*, then
    1. Perform ! CreateDataPropertyOrThrow(_options_, _p_, _v_).
1. Return _options_.
</emu-alg>
<emu-table id="table-collator-resolvedoptions-properties">
<emu-caption>Resolved Options of Collator Instances</emu-caption>
<table class="real-table">
<thead>
<tr>
<th>Internal Slot</th>
<th>Property</th>
<th>Extension Key</th>
</tr>
</thead>
<tr>
<td>[[Locale]]</td>
<td>*"locale"*</td>
<td></td>
</tr>
<tr>
<td>[[Usage]]</td>
<td>*"usage"*</td>
<td></td>
</tr>
<tr>
<td>[[Sensitivity]]</td>
<td>*"sensitivity"*</td>
<td></td>
</tr>
<tr>
<td>[[IgnorePunctuation]]</td>
<td>*"ignorePunctuation"*</td>
<td></td>
</tr>
<tr>
<td>[[Collation]]</td>
<td>*"collation"*</td>
<td></td>
</tr>
<tr>
<td>[[Numeric]]</td>
<td>*"numeric"*</td>
<td>*"kn"*</td>
</tr>
<tr>
<td>[[CaseFirst]]</td>
<td>*"caseFirst"*</td>
<td>*"kf"*</td>
</tr>
</table>
</emu-table>
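<emu-note>
A non-normative sketch of `resolvedOptions`, showing that properties are created in table order, that each call returns a new object, and that the method requires an initialized collator. The locale and option values are illustrative assumptions.

```javascript
const collator = new Intl.Collator("en", { sensitivity: "accent" });

// Properties follow the table order; "numeric" and "caseFirst" are present
// only when the implementation supports the "kn" and "kf" keys.
const first = collator.resolvedOptions();
const keys = Object.keys(first);

// Each call creates a new object, so mutating one result does not affect
// the collator or later results.
first.sensitivity = "base";
const second = collator.resolvedOptions();

// The method throws a TypeError for a receiver without [[InitializedCollator]].
let receiverRejected = false;
try {
  Intl.Collator.prototype.resolvedOptions.call({});
} catch (err) {
  receiverRejected = err instanceof TypeError;
}

console.log(keys, second.sensitivity, receiverRejected);
```
</emu-note>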
</emu-clause>
<emu-clause id="sec-intl.collator.prototype.compare">
<h1>get Intl.Collator.prototype.compare</h1>
<p>
This named accessor property returns a function that compares two strings according to the sort order of this Collator object.
</p>
<p>
Intl.Collator.prototype.compare is an accessor property whose set accessor function is *undefined*. Its get accessor function performs the following steps:
</p>
<emu-alg>
1. Let _collator_ be the *this* value.
1. Perform ? RequireInternalSlot(_collator_, [[InitializedCollator]]).
1. If _collator_.[[BoundCompare]] is *undefined*, then
  1. Let _F_ be a new built-in function object as defined in <emu-xref href="#sec-collator-compare-functions"></emu-xref>.
  1. Set _F_.[[Collator]] to _collator_.
  1. Set _collator_.[[BoundCompare]] to _F_.
1. Return _collator_.[[BoundCompare]].
</emu-alg>
<emu-note>
The returned function is bound to _collator_ so that it can be passed directly to `Array.prototype.sort` or other functions.
</emu-note>
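<emu-note>
The caching and binding behaviour can be sketched as follows. This is non-normative; the word list is hypothetical.

```javascript
const collator = new Intl.Collator("en");

// The getter caches the function in [[BoundCompare]], so repeated reads
// yield the same function object.
const sameFunction = collator.compare === collator.compare;

// Because the function is bound to the collator, it can be detached from the
// object and passed directly to Array.prototype.sort.
const { compare } = collator;
const sorted = ["pear", "Apple", "banana"].sort(compare);

console.log(sameFunction, sorted);
```
</emu-note>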
<emu-clause id="sec-collator-compare-functions">
<h1>Collator Compare Functions</h1>
<p>A Collator compare function is an anonymous built-in function that has a [[Collator]] internal slot.</p>
<p>When a Collator compare function _F_ is called with arguments _x_ and _y_, the following steps are taken:</p>
<emu-alg>
1. Let _collator_ be _F_.[[Collator]].
1. Assert: _collator_ is an Object and _collator_ has an [[InitializedCollator]] internal slot.
1. If _x_ is not provided, let _x_ be *undefined*.
1. If _y_ is not provided, let _y_ be *undefined*.
1. Let _X_ be ? ToString(_x_).
1. Let _Y_ be ? ToString(_y_).
1. Return CompareStrings(_collator_, _X_, _Y_).
</emu-alg>
<p>The *"length"* property of a Collator compare function is *2*<sub>𝔽</sub>.</p>
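<emu-note>
A non-normative sketch of the argument handling described above. The locale *"en"* is an illustrative assumption.

```javascript
const { compare } = new Intl.Collator("en");

// Arguments are converted with ToString before comparison.
const numberVsString = compare(1, "1");

// Missing arguments are treated as undefined, which converts to "undefined".
const noArguments = compare();
const oneArgument = compare("undefined");

// The function is anonymous and declares two parameters.
console.log(compare.name === "", compare.length, numberVsString, noArguments, oneArgument);
// logs: true 2 0 0 0
```
</emu-note>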
</emu-clause>
<emu-clause id="sec-collator-comparestrings" type="implementation-defined abstract operation">
<h1>
CompareStrings (
_collator_: an Intl.Collator,
_x_: a String,
_y_: a String,
): a Number, but not *NaN*
</h1>
<dl class="header">
<dt>description</dt>
<dd>
The returned Number represents the result of an implementation-defined locale-sensitive String comparison of _x_ with _y_.
The result is intended to correspond with a sort order of String values according to the effective locale and collation options of _collator_, and will be negative when _x_ is ordered before _y_, positive when _x_ is ordered after _y_, and zero in all other cases (representing no relative ordering between _x_ and _y_).
String values must be interpreted as UTF-16 code unit sequences as described in es2025, <emu-xref href="#sec-ecmascript-language-types-string-type"></emu-xref>, and a surrogate pair (a code unit in the range 0xD800 to 0xDBFF followed by a code unit in the range 0xDC00 to 0xDFFF) within a string must be interpreted as the corresponding code point.
</dd>
</dl>
<p>
Behaviour as described below depends upon locale-sensitive identification of the sequence of collation elements for a string, in particular "base letters", and different base letters always compare as unequal (causing the strings containing them to also compare as unequal). Results of comparing variations of the same base letter with different case, diacritic marks, or potentially other aspects further depend upon _collator_.[[Sensitivity]] as follows:
</p>
<emu-table id="table-collator-comparestrings-sensitivity">
<emu-caption>Effects of Collator Sensitivity</emu-caption>
<table class="real-table">
<thead style="white-space: nowrap">
<tr>
<th>[[Sensitivity]]</th>
<th>Description</th>
<th>*"a"* vs. *"á"*</th>
<th>*"a"* vs. *"A"*</th>
</tr>
</thead>
<tr>
<td>*"base"*</td>
<td>Characters with the same base letter do not compare as unequal, regardless of differences in case and/or diacritic marks.</td>
<td>equal</td>
<td>equal</td>
</tr>
<tr>
<td>*"accent"*</td>
<td>Characters with the same base letter compare as unequal only if they differ in accents and/or other diacritic marks, regardless of differences in case.</td>
<td>not equal</td>
<td>equal</td>
</tr>
<tr>
<td>*"case"*</td>
<td>Characters with the same base letter compare as unequal only if they differ in case, regardless of differences in accents and/or other diacritic marks.</td>
<td>equal</td>
<td>not equal</td>
</tr>
<tr>
<td>*"variant"*</td>
<td>Characters with the same base letter compare as unequal if they differ in case, diacritic marks, and/or potentially other differences.</td>
<td>not equal</td>
<td>not equal</td>
</tr>
</table>
</emu-table>
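<emu-note>
The table can be reproduced with the following non-normative sketch. The helper `relation` and the locale *"en"* are assumptions made for illustration, and the results assume an implementation with typical English collation data.

```javascript
// Hypothetical helper: reports whether a pair compares as equal under the
// given sensitivity.
function relation(sensitivity, x, y) {
  const collator = new Intl.Collator("en", { sensitivity });
  return collator.compare(x, y) === 0 ? "equal" : "not equal";
}

// One row per sensitivity value, mirroring the two example columns.
const rows = ["base", "accent", "case", "variant"].map((sensitivity) => ({
  sensitivity,
  accentPair: relation(sensitivity, "a", "á"),
  casePair: relation(sensitivity, "a", "A"),
}));

console.log(rows);
```
</emu-note>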
<emu-note>
The mapping from input code points to base letters can include arbitrary contractions, expansions, and collisions, including those that apply special treatment to certain characters with diacritic marks. For example, in Swedish, "ö" is a base letter that differs from "o", and "v" and "w" are considered to be the same base letter. In Slovak, "ch" is a single base letter, and in English, "æ" is a sequence of base letters starting with "a" and ending with "e".
</emu-note>
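<emu-note>
The locale dependence of base letters can be sketched as follows. This is non-normative and assumes an implementation that ships CLDR-style tailorings for Swedish (*"sv"*) and German (*"de"*); without that data the results may differ.

```javascript
const swedish = new Intl.Collator("sv", { sensitivity: "base" });
const german = new Intl.Collator("de", { sensitivity: "base" });

// In Swedish, "ö" is its own base letter, so it differs from "o" even at
// "base" sensitivity. In German it is treated as a variant of "o".
const swedishDistinct = swedish.compare("o", "ö") !== 0;
const germanDistinct = german.compare("o", "ö") !== 0;

// Swedish orders "ö" after "z", whereas German orders it before "z".
const swedishSign = Math.sign(swedish.compare("ö", "z"));
const germanSign = Math.sign(german.compare("ö", "z"));

console.log(swedishDistinct, germanDistinct, swedishSign, germanSign);
```
</emu-note>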
<p>
If _collator_.[[IgnorePunctuation]] is *true*, then punctuation is ignored (e.g., strings that differ only in punctuation compare as equal).
</p>
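<emu-note>
A non-normative sketch of the effect of [[IgnorePunctuation]]. The sample strings are hypothetical, and the results assume an implementation whose English data does not ignore punctuation by default.

```javascript
const strict = new Intl.Collator("en", { ignorePunctuation: false });
const lenient = new Intl.Collator("en", { ignorePunctuation: true });

// The two strings differ only by a hyphen.
const strictEqual = strict.compare("co-operate", "cooperate") === 0;
const lenientEqual = lenient.compare("co-operate", "cooperate") === 0;

console.log(strictEqual, lenientEqual);
```
</emu-note>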
<p>
For the interpretation of options settable through locale extension keys, see <a href="https://unicode.org/reports/tr35/#Key_And_Type_Definitions_">Unicode Technical Standard #35 Part 1 Core, Section 3.6.1 Key and Type Definitions</a>.
</p>
<p>
The actual return values are implementation-defined to permit encoding additional information in them, but this operation for any given _collator_, when considered as a function of _x_ and _y_, is required to be a consistent comparator defining a total ordering on the set of all Strings. This operation is also required to recognize and honour canonical equivalence according to the Unicode Standard, including returning *+0*<sub>𝔽</sub> when comparing distinguishable Strings that are canonically equivalent.
</p>
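<emu-note>
The two requirements above can be spot-checked with the following non-normative sketch. It inspects only signs and zero results, since the actual return values are implementation-defined; the sample strings are illustrative.

```javascript
const { compare } = new Intl.Collator("en");

// "é" as one precomposed code point versus "e" followed by U+0301.
const precomposed = "\u00E9";
const decomposed = "e\u0301";
const distinguishable = precomposed !== decomposed;
const equivalent = compare(precomposed, decomposed) === 0;

// A consistent comparator flips its sign when the arguments are swapped.
const antisymmetric = Math.sign(compare("a", "b")) === -Math.sign(compare("b", "a"));

console.log(distinguishable, equivalent, antisymmetric);
// logs: true true true
```
</emu-note>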
<emu-note>
It is recommended that the CompareStrings abstract operation be implemented following <a href="https://unicode.org/reports/tr10/">Unicode Technical Standard #10: Unicode Collation Algorithm</a>, using tailorings for the effective locale and collation options of _collator_. It is recommended that implementations use the tailorings provided by the Common Locale Data Repository (available at <a href="https://cldr.unicode.org/">https://cldr.unicode.org/</a>).
</emu-note>
<emu-note>
Applications should not assume that the behaviour of the CompareStrings abstract operation for Collator instances with the same resolved options will remain the same for different versions of the same implementation.
</emu-note>
</emu-clause>
</emu-clause>
<emu-clause id="sec-intl.collator.prototype-%symbol.tostringtag%" oldids="sec-intl.collator.prototype-@@tostringtag">
<h1>Intl.Collator.prototype [ %Symbol.toStringTag% ]</h1>
<p>
The initial value of the %Symbol.toStringTag% property is the String value *"Intl.Collator"*.
</p>
<p>
This property has the attributes { [[Writable]]: *false*, [[Enumerable]]: *false*, [[Configurable]]: *true* }.
</p>
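<emu-note>
A minimal, non-normative sketch of how this property is observed from script. The variable names are assumptions made for illustration.

```javascript
// Object.prototype.toString consults the toStringTag property.
const tag = Object.prototype.toString.call(new Intl.Collator("en"));

const tagDescriptor = Object.getOwnPropertyDescriptor(
  Intl.Collator.prototype,
  Symbol.toStringTag
);

console.log(tag, tagDescriptor.value, tagDescriptor.writable, tagDescriptor.configurable);
// logs: [object Intl.Collator] Intl.Collator false true
```
</emu-note>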
</emu-clause>
</emu-clause>
<emu-clause id="sec-properties-of-intl-collator-instances">
<h1>Properties of Intl.Collator Instances</h1>
<p>
Intl.Collator instances are ordinary objects that inherit properties from %Intl.Collator.prototype%.
</p>
<p>
Intl.Collator instances have an [[InitializedCollator]] internal slot.
</p>
<p>
Intl.Collator instances also have several internal slots that are computed by the constructor:
</p>
<ul>
<li>[[Locale]] is a String value with the language tag of the locale whose localization is used for collation.</li>
<li>[[Usage]] is one of the String values *"sort"* or *"search"*, identifying the collator usage.</li>
<li>[[Sensitivity]] is one of the String values *"base"*, *"accent"*, *"case"*, or *"variant"*, identifying the collator's sensitivity.</li>
<li>[[IgnorePunctuation]] is a Boolean value, specifying whether punctuation should be ignored in comparisons.</li>
<li>[[Collation]] is a String value with the *"type"* given in <a href="https://unicode.org/reports/tr35/#Key_And_Type_Definitions_">Unicode Technical Standard #35 Part 1 Core, Section 3.6.1 Key and Type Definitions</a> for the collation, except that the values *"standard"* and *"search"* are not allowed, while the value *"default"* is allowed.</li>
</ul>
<p>
Intl.Collator instances also have the following internal slots if the key corresponding to the name of the internal slot in <emu-xref href="#table-collator-resolvedoptions-properties"></emu-xref> is included in the [[RelevantExtensionKeys]] internal slot of Intl.Collator:
</p>
<ul>
<li>[[Numeric]] is a Boolean value, specifying whether numeric sorting is used.</li>
<li>[[CaseFirst]] is one of the String values *"upper"*, *"lower"*, or *"false"*.</li>
</ul>
<p>
Finally, Intl.Collator instances have a [[BoundCompare]] internal slot that caches the function returned by the compare accessor (<emu-xref href="#sec-intl.collator.prototype.compare"></emu-xref>).
</p>
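<emu-note>
The effect of [[Numeric]] and [[CaseFirst]] can be sketched as follows. This is non-normative and assumes an implementation that includes *"kn"* and *"kf"* in [[RelevantExtensionKeys]]; the sample strings are hypothetical.

```javascript
const plain = new Intl.Collator("en");
const numeric = new Intl.Collator("en", { numeric: true });

// Without numeric sorting, "10" precedes "2" because "1" precedes "2".
// With numeric sorting, digit sequences are compared by numeric value.
const plainOrder = ["10", "2", "1"].sort(plain.compare);
const numericOrder = ["10", "2", "1"].sort(numeric.compare);

// caseFirst decides which case is ordered first among otherwise equal letters.
const upperFirst = new Intl.Collator("en", { caseFirst: "upper" });
const lowerFirst = new Intl.Collator("en", { caseFirst: "lower" });
const upperOrder = ["a", "A"].sort(upperFirst.compare);
const lowerOrder = ["A", "a"].sort(lowerFirst.compare);

console.log(plainOrder, numericOrder, upperOrder, lowerOrder);
```
</emu-note>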
</emu-clause>
</emu-clause>