
@sffc (Member) commented Dec 9, 2025

CLDR-17223

  • This PR completes the ticket.

I'm starting with just English. I've already found an odd case: we have "Cocos (Keeling) Islands", which isn't easily constructible from a menu glue pattern.

Please give feedback and suggestions. I would like to land this change relatively quickly because it blocks ICU4X.

@sffc requested a review from macchiati on December 9, 2025 23:29

@conradarcturus (Contributor) left a comment

My preference would be to just drop the extensions: "English (Myanmar)", not "English (Myanmar [Burma])" or "English (Myanmar (Burma))". So I definitely favor adding menu core/extension values for Myanmar and the Cocos Islands. Nonetheless, I'm happy to accept your solution since you are closer to the problem space.

I commented on an error you'll need to fix to run the CLDR modify script.

Btw, I'm a bit confused, because the test data already shows "en-MM; English (Myanmar [Burma])", even before this change. I suspect something hard-coded is already producing the nested brackets.

https://github.com/unicode-org/cldr/blob/main/common/testData/localeIdentifiers/localeDisplayName.txt#L923-L933

Make sure to update the display name documentation:

https://github.com/unicode-org/cldr/blob/main/docs/ldml/tr35-general.md?plain=1#L135-L163

Comment on lines +1014 to +1015
```xml
<territory type="CC" menu="core">Cocos Islands</territory>
<territory type="CC" menu="extension">Keeling</territory>
```

I think this is a good solution. Wikipedia seems to generally follow this pattern too.

https://en.wikipedia.org/wiki/Cocos_(Keeling)_Islands
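To see what the core/extension split implies, here is a minimal JavaScript sketch. It assumes a menu glue pattern of `{0} ({1})`; the actual CLDR pattern text, and the helper names `formatPattern` and `composeMenuName`, are hypothetical. Because a pattern like this can only append the extension, it yields "Cocos Islands (Keeling)" rather than "Cocos (Keeling) Islands".

```javascript
// Hypothetical menu glue pattern; the real CLDR pattern may differ.
const MENU_GLUE_PATTERN = "{0} ({1})";

// Replace each {n} placeholder with the n-th argument.
function formatPattern(pattern, ...args) {
  return pattern.replace(/\{(\d+)\}/g, (_, i) => args[Number(i)]);
}

// Combine menu="core" and menu="extension" values (as in the diff above).
// With no extension, the core value is used on its own.
function composeMenuName(core, extension) {
  return extension === undefined
    ? core
    : formatPattern(MENU_GLUE_PATTERN, core, extension);
}

console.log(composeMenuName("Cocos Islands", "Keeling"));
// prints "Cocos Islands (Keeling)"
```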

```xml
<!--@METADATA-->
<!--@DEPRECATED-->

<!ELEMENT localeDisplayPattern ( alias | ( localePattern*, localeSeparator*, localeKeyTypePattern*, special* ) ) >
```

Make sure to add `localeNestedPattern` as a valid child here.

Suggested change
```xml
<!ELEMENT localeDisplayPattern ( alias | ( localePattern*, localeNestedPattern*, localeSeparator*, localeKeyTypePattern*, special* ) ) >
```

@sffc (Member, Author) commented Dec 11, 2025

> My preference would be to just drop the extensions: English (Myanmar) not English (Myanmar [Burma]) or English (Myanmar (Burma)). So I definitely favor the addition of menu core/extension values for Myanmar and Cocos Islands. Nonetheless, I'm happy to accept your solution since you are closer to the problem space.

So, in the specific case of Myanmar (Burma), maybe it's time to just drop the parenthetical. But there are hundreds of other cases, and I'm trying to find the correct general solution. Here are more examples:

Alternative names for the same territory:

  • Falkland Islands (Islas Malvinas)
  • Aotearoa (Nouvelle-Zélande), i.e. "Aotearoa (New Zealand)"
  • Congo (RDC), i.e. "Congo (DRC)"

Clarification about who owns a particular territory:

  • Виргинские о-ва (США), i.e. "Virgin Islands (US)"
  • Макао (САР), i.e. "Macao (SAR)"

So unless we want to use a pattern that avoids parentheses entirely (which I'm open to exploring), we need to decide what happens with nested parentheses.
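To make the nested-parentheses problem concrete, the JavaScript sketch below inserts region names into an assumed English locale pattern of `{0} ({1})` with no escaping at all. The helper `naiveLocaleName` is hypothetical; the region names are the English ones from this thread.

```javascript
// Hypothetical helper: insert a region name into an assumed English
// locale pattern "{0} ({1})" without any escaping.
function naiveLocaleName(languageName, regionName) {
  return `${languageName} (${regionName})`;
}

for (const region of [
  "Myanmar (Burma)",
  "Falkland Islands (Islas Malvinas)",
  "Cocos (Keeling) Islands",
]) {
  console.log(naiveLocaleName("English", region));
}
// English (Myanmar (Burma))
// English (Falkland Islands (Islas Malvinas))
// English (Cocos (Keeling) Islands)
```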

> I commented on an error you'll need to fix to run the CLDR modify script.

Fixed, thanks!

> Btw I'm a bit confused, because in the test data it shows en-MM; English (Myanmar [Burma]) already -- before this change. I suspect there may be something hard-coded that is already doing the nested brackets.
>
> https://github.com/unicode-org/cldr/blob/main/common/testData/localeIdentifiers/localeDisplayName.txt#L923-L933

There's hacky code somewhere that replaces '(' with '['. UTS 35 says:

> When the display name contains "(" or ")" characters (or full-width equivalents), replace them by "[", "]" (or full-width equivalents) before adding.

https://unicode.org/reports/tr35/tr35-general.html#locale_display_name_algorithm

I claim that this is terrible for both quality and implementability, and I want to improve it.
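For reference, the UTS 35 rule quoted above amounts to roughly the following minimal JavaScript sketch. It assumes an English locale pattern of `{0} ({1})` and handles only a language plus a region; the real algorithm also covers scripts, variants, and keywords. The helper names are hypothetical. It reproduces the "English (Myanmar [Burma])" output seen in the test data.

```javascript
// ASCII and full-width parentheses mapped to the matching brackets:
// U+FF08/U+FF09 (full-width parens) -> U+FF3B/U+FF3D (full-width brackets).
const BRACKET_MAP = {
  "(": "[",
  ")": "]",
  "\uFF08": "\uFF3B",
  "\uFF09": "\uFF3D",
};

// Replace every parenthesis in a subtag display name before it is
// inserted into the locale pattern.
function escapeParens(name) {
  return name.replace(/[()\uFF08\uFF09]/g, (ch) => BRACKET_MAP[ch]);
}

// Assumed English locale pattern "{0} ({1})".
function bracketedLocaleName(languageName, regionName) {
  return `${languageName} (${escapeParens(regionName)})`;
}

console.log(bracketedLocaleName("English", "Myanmar (Burma)"));
// prints "English (Myanmar [Burma])"
```

Note that the substitution is purely character-based, so it also turns "Cocos (Keeling) Islands" into "Cocos [Keeling] Islands" inside a locale display name.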

> Make sure to update the display name documentation:
>
> https://github.com/unicode-org/cldr/blob/main/docs/ldml/tr35-general.md?plain=1#L135-L163

Yep, I'll work on that once we have alignment on the approach.

@sffc (Member, Author) commented Dec 11, 2025

If we consider the contents of the parenthetical to be "optional", another approach could be to include the parenthetical when formatting a region display name, but drop it when formatting a locale display name.

```js
new Intl.DisplayNames("en", { type: "region" }).of("MM")
// => "Myanmar (Burma)"

new Intl.DisplayNames("en", { type: "language" }).of("en-MM")
// => "English (Myanmar)"
//    NOT "English (Myanmar [Burma])" ?
```
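A minimal JavaScript sketch of this "optional parenthetical" idea, assuming that only a single trailing "(...)" (or full-width "（...）") counts as optional and that the English locale pattern is `{0} ({1})`. The regex and the helper names are hypothetical, not an existing ICU or CLDR API.

```javascript
// Matches one trailing parenthetical, ASCII or full-width, plus any
// whitespace before it. Parentheses in the middle of a name are not matched.
const TRAILING_PAREN = /\s*(?:\([^()]*\)|\uFF08[^\uFF08\uFF09]*\uFF09)$/u;

function stripTrailingParenthetical(name) {
  return name.replace(TRAILING_PAREN, "");
}

// Region display names keep the full name; locale display names use the
// stripped form inside the assumed pattern "{0} ({1})".
function formatLocaleName(languageName, regionName) {
  return `${languageName} (${stripTrailingParenthetical(regionName)})`;
}

console.log(formatLocaleName("English", "Myanmar (Burma)"));
// prints "English (Myanmar)"
```

A trailing-parenthetical heuristic like this leaves mid-name cases such as "Cocos (Keeling) Islands" untouched, so it would likely still need explicit data, such as the menu core/extension values proposed in this PR.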
