Shifted option for variable collation elements #19

Open
wants to merge 8 commits into
base: master


@lucafavatella lucafavatella commented Oct 25, 2017

Depends on #18 (included in this PR).
Supersedes #15 #16 #17.

Notes:

  1. I am a bit concerned that the conformance tests doc says

    Compare that string with the string on the previous line, according to the UCA implementation, with strength = identical level (using S3.10).

    but the tests succeed even though the identical level is not implemented at all... so the conformance tests currently do not really need the identical level.

  2. What is more, our tests that call the conformance tests keep strength at the default, that is tertiary and not quaternary. And the conformance tests pass, even the shifted ones. But the common settings combinations doc says

    “Ignore punctuation” (completely): strength=tertiary alternate=shifted

    which suggests that either the conformance tests for shifted do not test the punctuation, or the code actually handles shifted tertiary as quaternary (i.e. keeping L4).

  3. I am pretty confident that some other language or library already has Unicode collation tests beyond the UCA conformance tests published by Unicode. We should find and import such extra tests.
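
To make note 2 concrete, here is a minimal sketch of how the shifted option interacts with strength. It is not the code in this PR: the `TABLE` weights, and the `shifted`, `sort_key` and `key` helpers, are hypothetical names with illustrative values, following my reading of the variable weighting rules in UTS #10.

```python
# Illustrative weights only; real values come from the allkeys data file.
# Each entry maps a character to (is_variable, (primary, secondary, tertiary)).
TABLE = {
    " ": (True, (0x0209, 0x0020, 0x0002)),
    "d": (False, (0x1000, 0x0020, 0x0002)),
    "e": (False, (0x1001, 0x0020, 0x0002)),
    "g": (False, (0x1002, 0x0020, 0x0002)),
    "l": (False, (0x1003, 0x0020, 0x0002)),
    "u": (False, (0x1004, 0x0020, 0x0002)),
}


def shifted(elements):
    """Apply the shifted option, producing four-level weights."""
    out = []
    after_variable = False
    for is_variable, (l1, l2, l3) in elements:
        if is_variable:
            # Variable: L1-L3 are zeroed, the old primary moves to L4.
            out.append((0, 0, 0, l1))
            after_variable = True
        elif l1 == 0 and (l3 == 0 or after_variable):
            # Completely ignorable, or ignorable following a variable.
            out.append((0, 0, 0, 0))
        elif l1 == 0:
            # Ignorable not following a variable.
            out.append((0, l2, l3, 0xFFFF))
        else:
            # Ordinary non-variable element.
            out.append((l1, l2, l3, 0xFFFF))
            after_variable = False
    return out


def sort_key(weights, strength):
    """Concatenate non-zero weights level by level, up to `strength`."""
    result = []
    for level in range(strength):
        if level:
            result.append(0)  # level separator, never trailing
        result.extend(w[level] for w in weights if w[level])
    return tuple(result)


def key(text, strength):
    return sort_key(shifted([TABLE[c] for c in text]), strength)
```

With these assumptions, `key("de luge", 3) == key("deluge", 3)` while `key("de luge", 4) < key("deluge", 4)`: at tertiary strength the space is ignored completely, and only L4 tells the two strings apart. A pair of conformance test lines that differs only in variable elements would therefore need the quaternary level to be ordered.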

lucafavatella and others added 8 commits October 24, 2017 10:22
Extracted from [`settings`
branch](jtauber@cf393b8)
by James Tauber.
Extracted - with amendments and integrations - from `settings` branch
by James Tauber:
* Settings initialization:
  * Commit ["beginnings of settings capability"](jtauber@cf393b8);
  * Commit ["support for strength setting"](jtauber@fb87e92).
* Documentation:
  * Commit ["initial settings documentation"](jtauber@8bd6a8c).
* `normalization` setting:
  * Commit ["support for normalization setting"](jtauber@6187dd5).

Normalization test taken from UCA 5.2.0 conformance tests:
```
CollationTest/5.2.0/CollationTest_NON_IGNORABLE.txt:312:0332 0334;      # ...
```
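
As a rough illustration of why that test line exercises normalization (a sketch using only the standard library, not the test code in this commit): the two combining marks are not in canonical order, so NFD reorders them before any collation elements are looked up.

```python
import unicodedata

# U+0332 COMBINING LOW LINE followed by U+0334 COMBINING TILDE OVERLAY,
# as in the conformance test line quoted above.
text = "\u0332\u0334"

# Canonical combining classes decide the canonical order of the marks.
classes = [unicodedata.combining(ch) for ch in text]

# NFD sorts adjacent combining marks by ascending combining class.
normalized = unicodedata.normalize("NFD", text)
code_points = [f"{ord(ch):04X}" for ch in normalized]
```

Here `classes` is `[220, 1]` and `code_points` is `['0334', '0332']`, so a collator that skips normalization sees the marks in a different order than one that normalizes first.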
Extracted - with minor amendments in tests - from `settings` branch by
James Tauber:
* Code:
  * Commit ["support for strength setting"](jtauber@fb87e92).
* Documentation:
  * Commit ["initial settings documentation"](jtauber@8bd6a8c).

----

Wrt the amended tests:

According to [the algorithm for computing the sort
key](https://www.unicode.org/reports/tr10/tr10-36.html#Step_3), the
sort key does not have to end with a level separator.  See also the
following examples:
* https://www.unicode.org/reports/tr10/tr10-36.html#Array_To_Sort_Key_Table
* https://www.unicode.org/reports/tr10/tr10-36.html#Comparison_Of_Sort_Keys_Table

However [the conformance
tests](https://www.unicode.org/Public/UCA/10.0.0/CollationTest.html)
include in (non-normative) comments "a representation of the sort key"
always ending with "|]" where the vertical bar stands for "the ZERO
separator".
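
A minimal sketch of sort key formation as I read Step 3, with the separator placed only between levels. The `form_sort_key` name and the weights are illustrative assumptions, not the code in this commit.

```python
LEVEL_SEPARATOR = 0x0000


def form_sort_key(collation_elements, levels=3):
    """Build a sort key from (primary, secondary, tertiary) tuples.

    The separator goes between levels only, so the key ends with the
    last non-zero weight rather than with a separator.
    """
    key = []
    for level in range(levels):
        if level > 0:
            key.append(LEVEL_SEPARATOR)
        for element in collation_elements:
            weight = element[level]
            if weight != 0:  # zero weights are skipped
                key.append(weight)
    return key


# Three illustrative collation elements.
elements = [
    (0x0706, 0x0020, 0x0002),
    (0x06D9, 0x0020, 0x0002),
    (0x06EE, 0x0020, 0x0002),
]
key = form_sort_key(elements)
```

The resulting key is `0706 06D9 06EE 0000 0020 0020 0020 0000 0002 0002 0002`. Appending one more `0000` to every key would not change how two keys compare, which is why the trailing bar in the test comments can be treated as presentation only.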

Do not append a final unnecessary level separator.
... i.e. algorithm step
http://www.unicode.org/reports/tr10/tr10-36.html#S2.3

In order to achieve that, this commit extracts info on variable
collation elements from allkeys data file as specified at
https://www.unicode.org/reports/tr10/tr10-36.html#File_Format
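
For illustration, a sketch of reading the variable marker from one allkeys line, assuming the three-weight `[*pppp.ssss.tttt]` / `[.pppp.ssss.tttt]` form described in the file format section. `parse_allkeys_line` is a hypothetical helper, not the loader in this commit, and the sample weights are illustrative.

```python
import re

# "*" marks a variable collation element, "." a non-variable one.
ELEMENT = re.compile(
    r"\[([.*])([0-9A-Fa-f]{4,})\.([0-9A-Fa-f]{4,})\.([0-9A-Fa-f]{4,})\]"
)


def parse_allkeys_line(line):
    """Return (code_points, [(is_variable, weights), ...]) or None."""
    line = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not line or line.startswith("@"):  # blank or directive line
        return None
    chars, _, elements = line.partition(";")
    code_points = tuple(int(cp, 16) for cp in chars.split())
    parsed = [
        (marker == "*", tuple(int(w, 16) for w in weights))
        for marker, *weights in ELEMENT.findall(elements)
    ]
    return code_points, parsed


space = parse_allkeys_line("0020  ; [*0209.0020.0002] # SPACE")
letter = parse_allkeys_line("0061  ; [.1CAD.0020.0002] # LATIN SMALL LETTER A")
```

With this sketch, `space` carries `True` for its single element and `letter` carries `False`; that flag is all the shifted option needs beyond the weights themselves.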

Description of conformance test is at
https://www.unicode.org/Public/UCA/10.0.0/CollationTest.html

Sample comparison of non-ignorable vs. shifted ordering is at
http://www.unicode.org/reports/tr10/tr10-36.html#Variable_Weighting_Examples

Intended meaning of combination of variable weighting (e.g. shifted)
and strength is described in the following locations:
* https://www.unicode.org/reports/tr10/tr10-36.html#Parametic_Tailoring
* https://www.unicode.org/reports/tr35/tr35-collation.html#Common_Settings
@lucafavatella changed the title from "Shifted" to "Shifted option for variable collation elements" on Oct 25, 2017

2 participants