Handle the sr-Cyrl-ME -> sr-ME collation-only fallback #7867

Merged
robertbastian merged 1 commit into unicode-org:main from robertbastian:sr-ME
Apr 14, 2026

@robertbastian
Member

@robertbastian robertbastian commented Apr 13, 2026

Fixes #3287

This handles the sr-Cyrl-ME -> sr-ME fallback the same way we handle the other collation-only fallbacks: by explicitly adding the data for sr-Cyrl-ME so we don't go through the default fallback mechanism for that locale.
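
To illustrate the idea (this is not the ICU4X implementation), here is a minimal sketch in plain Rust. It assumes a toy fallback that only strips trailing subtags, and the names `Tailoring`, `toy_data`, `lookup`, and `add_explicit_entry` as well as the payload strings are hypothetical. It shows why an explicit entry for sr-Cyrl-ME makes the lookup succeed before any default fallback step is taken.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a collation data payload.
#[derive(Debug, Clone, PartialEq)]
struct Tailoring(&'static str);

/// Toy data set; the keys and payload names are illustrative assumptions.
fn toy_data() -> HashMap<&'static str, Tailoring> {
    let mut data = HashMap::new();
    data.insert("und", Tailoring("root"));
    data.insert("sr", Tailoring("sr tailoring"));
    data.insert("sr-ME", Tailoring("sr-ME tailoring"));
    data
}

/// Simplified default fallback (an assumption for this sketch): try the
/// locale as given, then keep dropping the last subtag, ending at "und".
/// Returns the key that matched together with its payload.
fn lookup<'a>(
    data: &'a HashMap<&'static str, Tailoring>,
    locale: &str,
) -> (String, &'a Tailoring) {
    let mut current = locale.to_string();
    loop {
        if let Some(payload) = data.get(current.as_str()) {
            return (current, payload);
        }
        match current.rfind('-') {
            Some(idx) => current.truncate(idx),
            None => return ("und".to_string(), &data["und"]),
        }
    }
}

/// Copy the payload stored under `target` to `alias`, so that a lookup of
/// `alias` matches directly and never enters the fallback loop.
fn add_explicit_entry(
    data: &mut HashMap<&'static str, Tailoring>,
    alias: &'static str,
    target: &str,
) {
    let payload = data[target].clone();
    data.insert(alias, payload);
}

fn main() {
    let mut data = toy_data();

    // Subtag stripping visits sr-Cyrl-ME, sr-Cyrl, sr and never sr-ME.
    let (hit, payload) = lookup(&data, "sr-Cyrl-ME");
    println!("before: matched {hit} with {payload:?}");

    // With the explicit entry, the first probe already matches.
    add_explicit_entry(&mut data, "sr-Cyrl-ME", "sr-ME");
    let (hit, payload) = lookup(&data, "sr-Cyrl-ME");
    println!("after: matched {hit} with {payload:?}");
}
```

The trade-off in this sketch is a duplicated payload in exchange for not teaching the generic fallback about a one-off alias.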

Changelog

N/A

@robertbastian robertbastian requested review from a team, Manishearth and sffc as code owners April 13, 2026 09:23
Member

@sffc sffc left a comment

Would be nice for the test to actually run the Collator constructor but it is a bit tricky... good for a follow-up.

@robertbastian robertbastian merged commit 0f348cc into unicode-org:main Apr 14, 2026
34 checks passed
@robertbastian robertbastian deleted the sr-ME branch April 14, 2026 15:37
@hsivonen
Member

Do I understand correctly that explicitly asking for Cyrillic collation for Montenegrin now results in the Latin collation, just as asking for Cyrillic collation for Croatian does, but unlike Bosnian and Serbian, which do allow an explicit request for the non-default script?

Why? What's the upstream CLDR issue motivating this change?

For reference, here's the Firefox/SpiderMonkey test case that documents the situation before this PR: https://searchfox.org/firefox-main/rev/23974e2d947e31e4ae42ae2758a4416c9a6d8671/js/src/tests/non262/Intl/Collator/bcms.js

Also, AFAICT, there is no technical reason why we couldn't merge the Latin and Cyrillic collation data for Bosnian-Croatian-Montenegrin-Serbian and make Latn vs. Cyrl a matter of script reordering on top.

Are users of Bosnian-Croatian-Montenegrin-Serbian actually better served by having the other script collate according to root as opposed to having the other script also collate according to language-specific rules?

@robertbastian
Member Author

robertbastian commented Apr 15, 2026

Apparently these come from upstreaming ICU behaviour to CLDR: unicode-org/cldr#2664, unicode-org/cldr#3504.

I believe it was initially added to ICU in icu4c/source/data/icu-coll-deprecates.xml, which was committed as "Merge CLDR25 data into trunk". However, I cannot find any reference to sr_Cyrl_ME in CLDR 25, so I believe that file was handwritten. sr_ME is listed there with the other sr variants (and other multi-script languages), and it's the only one where the added script tag is not the likely one (see footnote 1). It looks suspiciously like a typo.

The initial aliases from that file have since evolved through

and the ones that are just likely subtags have disappeared, leaving just sr-Cyrl-ME -> sr-ME. Along the way, helpful comments like

It is not at all clear why this is being done (we expect "sr_Latn_ME" normally).

have been added and removed. It still says

TODO: Find out and document this properly

today, but that work is not being tracked anywhere, and apparently wasn't enough to have someone look at this before upstreaming it into CLDR.

Footnotes

  1. Note that Cyrl was the likely script for sr-ME until CLDR-2203, but that was well before CLDR 25.

robertbastian added a commit to robertbastian/icu4x that referenced this pull request Apr 15, 2026
robertbastian added a commit that referenced this pull request Apr 15, 2026
Development

Successfully merging this pull request may close these issues.

Consume the new component parents from CLDR JSON